April 1, 2020


Paper Group NANR 45


Recurrent Event Network : Global Structure Inference Over Temporal Knowledge Graph


Title	Recurrent Event Network : Global Structure Inference Over Temporal Knowledge Graph
Authors	Anonymous
Abstract	Modeling dynamically-evolving, multi-relational graph data has received a surge of interest with the rapid growth of heterogeneous event data. However, predicting future events on such data requires global structure inference over time and the ability to integrate temporal and structural information, which are not yet well understood. We present Recurrent Event Network (RE-Net), a novel autoregressive architecture for modeling temporal sequences of multi-relational graphs (e.g., temporal knowledge graphs), which can perform sequential, global structure inference over future time stamps to predict new events. RE-Net employs a recurrent event encoder to model the temporally conditioned joint probability distribution for the event sequences, and equips the event encoder with a neighborhood aggregator for modeling the concurrent events within a time window associated with each entity. We apply teacher forcing for model training over historical data, and infer graph sequences over future time stamps by sampling from the learned joint distribution in a sequential manner. We evaluate the proposed method via temporal link prediction on five public datasets. Extensive experiments demonstrate the strength of RE-Net, especially on multi-step inference over future time stamps.
Tasks	Link Prediction
Published	2020-01-01
URL	https://openreview.net/forum?id=SyeyF0VtDr
PDF	https://openreview.net/pdf?id=SyeyF0VtDr
PWC	https://paperswithcode.com/paper/recurrent-event-network-global-structure
Repo
Framework

IsoNN: Isomorphic Neural Network for Graph Representation Learning and Classification


Title	IsoNN: Isomorphic Neural Network for Graph Representation Learning and Classification
Authors	Anonymous
Abstract	Deep learning models have achieved huge success in numerous fields, such as computer vision and natural language processing. However, unlike in such fields, it is hard to apply traditional deep learning models to graph data due to the ‘node-orderless’ property. Normally, adjacency matrices cast an artificial and random node-order on the graphs, which renders the performance of deep models on graph classification tasks extremely erratic, and the representations learned by such models lack clear interpretability. To eliminate the unnecessary node-order constraint, we propose a novel model named Isomorphic Neural Network (ISONN), which learns the graph representation by extracting its isomorphic features via graph matching between the input graph and templates. ISONN has two main components: a graph isomorphic feature extraction component and a classification component. The graph isomorphic feature extraction component utilizes a set of subgraph templates as the kernel variables to learn the possible subgraph patterns existing in the input graph and then computes the isomorphic features. A set of permutation matrices is used in the component to break the node-order brought by the matrix representation. Three fully-connected layers are used as the classification component in ISONN. Extensive experiments are conducted on benchmark datasets, and the experimental results demonstrate the effectiveness of ISONN compared with both classic and state-of-the-art graph classification methods.
Tasks	Graph Classification, Graph Matching, Graph Representation Learning, Representation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=rylvAA4YDB
PDF	https://openreview.net/pdf?id=rylvAA4YDB
PWC	https://paperswithcode.com/paper/isonn-isomorphic-neural-network-for-graph-1
Repo
Framework

Subgraph Attention for Node Classification and Hierarchical Graph Pooling


Title	Subgraph Attention for Node Classification and Hierarchical Graph Pooling
Authors	Anonymous
Abstract	Graph neural networks have gained significant interest from the research community for both node classification within a graph and graph classification within a set of graphs. An attention mechanism applied on the neighborhood of a node improves the performance of graph neural networks. Typically, it helps to identify a neighbor node which plays a more important role in determining the label of the node under consideration. But in real-world scenarios, a particular subset of nodes together, rather than the individual nodes in the subset, may be important for determining the label of a node. To address this problem, we introduce the concept of subgraph attention for graphs. To show the efficiency of this, we use subgraph attention with graph convolution for node classification. We further use subgraph attention for classification of entire graphs by proposing a novel hierarchical neural graph pooling architecture. Along with attention over the subgraphs, our pooling architecture also uses attention to determine the important nodes within a level graph and attention to determine the important levels in the whole hierarchy. Competitive performance against the state of the art for both node and graph classification shows the efficiency of the algorithms proposed in this paper.
Tasks	Graph Classification, Node Classification
Published	2020-01-01
URL	https://openreview.net/forum?id=H1e552VKPr
PDF	https://openreview.net/pdf?id=H1e552VKPr
PWC	https://paperswithcode.com/paper/subgraph-attention-for-node-classification
Repo
Framework

Empowering Graph Representation Learning with Paired Training and Graph Co-Attention


Title	Empowering Graph Representation Learning with Paired Training and Graph Co-Attention
Authors	Anonymous
Abstract	Through many recent advances in graph representation learning, performance achieved on tasks involving graph-structured data has substantially increased in recent years—mostly on tasks involving node-level predictions. The setup of prediction tasks over entire graphs (such as property prediction for a molecule, or side-effect prediction for a drug), however, proves to be more challenging, as the algorithm must combine evidence about several structurally relevant patches of the graph into a single prediction. Most prior work attempts to predict these graph-level properties while considering only one graph at a time—not allowing the learner to directly leverage structural similarities and motifs across graphs. Here we propose a setup in which a graph neural network receives pairs of graphs at once, and extend it with a co-attentional layer that allows node representations to easily exchange structural information across them. We first show that such a setup provides natural benefits on a pairwise graph classification task (drug-drug interaction prediction), and then expand to a more generic graph regression setup: enhancing predictions over QM9, a standard molecular prediction benchmark. Our setup is flexible, powerful and makes no assumptions about the underlying dataset properties, beyond anticipating the existence of multiple training graphs.
Tasks	Graph Classification, Graph Regression, Graph Representation Learning, Representation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=BJeRykBKDH
PDF	https://openreview.net/pdf?id=BJeRykBKDH
PWC	https://paperswithcode.com/paper/empowering-graph-representation-learning-with
Repo
Framework

MxPool: Multiplex Pooling for Hierarchical Graph Representation Learning


Title	MxPool: Multiplex Pooling for Hierarchical Graph Representation Learning
Authors	Yanyan Liang, Yanfeng Zhang, Fangjing Wang, Qian Xu
Abstract	Graphs are known to have complicated structures and have myriad applications. How to utilize deep learning methods for graph classification tasks has attracted considerable research attention in the past few years. Two properties of graph data have imposed significant challenges on existing graph learning techniques. (1) Diversity: each graph has a variable number of unordered nodes and diverse node/edge types. (2) Complexity: graphs have not only node/edge features but also complex topological features. These two properties motivate us to use a multiplex structure to learn graph features in a diverse way. In this paper, we propose a simple but effective approach, MxPool, which concurrently uses multiple graph convolution networks and graph pooling networks to build a hierarchical learning structure for graph representation learning tasks. Our experiments on numerous graph classification benchmarks show that MxPool has marked superiority over other state-of-the-art graph representation learning methods. For example, MxPool achieves 92.1% accuracy on the D&D dataset while the second best method, DiffPool, only achieves 80.64% accuracy.
Tasks	Graph Classification, Graph Representation Learning, Representation Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=rke3U6NtwH
PDF	https://openreview.net/pdf?id=rke3U6NtwH
PWC	https://paperswithcode.com/paper/mxpool-multiplex-pooling-for-hierarchical
Repo
Framework

Contextual Text Style Transfer


Title	Contextual Text Style Transfer
Authors	Anonymous
Abstract	In this paper, we introduce a new task, Contextual Text Style Transfer, to translate a sentence within a paragraph context into the desired style (e.g., informal to formal, offensive to non-offensive). Two new datasets, Enron-Context and Reddit-Context, are introduced for this new task, focusing on formality and offensiveness, respectively. Two key challenges exist in contextual text style transfer: 1) how to preserve the semantic meaning of the target sentence and its consistency with the surrounding context when generating an alternative sentence with a specific style; 2) how to deal with the lack of labeled parallel data. To address these challenges, we propose a Context-Aware Style Transfer (CAST) model, which leverages both parallel and non-parallel data for joint model training. For parallel training data, CAST uses two separate encoders to encode each input sentence and its surrounding context, respectively. The encoded feature vector, together with the target style information, is then used to generate the target sentence. A classifier is further used to ensure contextual consistency of the generated sentence. In order to leverage a massive non-parallel corpus and to enhance sentence encoder and decoder training, additional self-reconstruction and back-translation losses are introduced. Experimental results on Enron-Context and Reddit-Context demonstrate the effectiveness of the proposed model over state-of-the-art style transfer methods, across style accuracy, content preservation, and contextual consistency metrics.
Tasks	Style Transfer, Text Style Transfer
Published	2020-01-01
URL	https://openreview.net/forum?id=HkeJzANFwS
PDF	https://openreview.net/pdf?id=HkeJzANFwS
PWC	https://paperswithcode.com/paper/contextual-text-style-transfer
Repo
Framework

Impact of the latent space on the ability of GANs to fit the distribution


Title	Impact of the latent space on the ability of GANs to fit the distribution
Authors	Anonymous
Abstract	The goal of generative models is to model the underlying data distribution of a sample based dataset. Our intuition is that an accurate model should in principle also include the sample based dataset as part of its induced probability distribution. To investigate this, we look at fully trained generative models using the Generative Adversarial Networks (GAN) framework and analyze the resulting generator on its ability to memorize the dataset. Further, we show that the size of the initial latent space is paramount to allow for an accurate reconstruction of the training data. This gives us a link to compression theory, where Autoencoders (AE) are used to lower bound the reconstruction capabilities of our generative model. Here, we observe similar results to the perception-distortion tradeoff (Blau & Michaeli (2018)). Given a small latent space, the AE produces low quality and the GAN produces high quality outputs from a perceptual viewpoint. In contrast, the distortion error is smaller for the AE. By increasing the dimensionality of the latent space the distortion decreases for both models, but the perceptual quality only increases for the AE.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=Hygy01StvH
PDF	https://openreview.net/pdf?id=Hygy01StvH
PWC	https://paperswithcode.com/paper/impact-of-the-latent-space-on-the-ability-of
Repo
Framework

GOING BEYOND TOKEN-LEVEL PRE-TRAINING FOR EMBEDDING-BASED LARGE-SCALE RETRIEVAL


Title	GOING BEYOND TOKEN-LEVEL PRE-TRAINING FOR EMBEDDING-BASED LARGE-SCALE RETRIEVAL
Authors	Anonymous
Abstract	We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents (e.g., paragraphs containing the answer) from a large document corpus. This problem is often solved in two steps. The retrieval phase first reduces the solution space, returning a subset of candidate documents. The scoring phase then scores and re-ranks the documents. The algorithm used in the retrieval phase is critical. On the one hand, it needs to have high recall – otherwise some relevant documents won’t even be considered in the scoring phase. On the other hand, it needs to be highly efficient, returning the candidate documents in time sublinear to the total number of documents. Unlike the scoring phase which witnessed significant advances recently due to the BERT-style cross-attention models, the retrieval phase remains less well studied: most previous works rely on the classic Information Retrieval (IR) methods such as BM-25 (token matching + TF-IDF weights). In this paper, we conduct a comprehensive study on different retrieval algorithms and show that the two-tower Transformer models with properly designed pre-training tasks can largely improve over the widely used BM-25 algorithm. The pre-training tasks we studied are Inverse Cloze Task (ICT), Body First Selection (BFS), Wiki Link Prediction (WLP) and the combination of them.
Tasks	Information Retrieval, Link Prediction
Published	2020-01-01
URL	https://openreview.net/forum?id=rkg-mA4FDr
PDF	https://openreview.net/pdf?id=rkg-mA4FDr
PWC	https://paperswithcode.com/paper/going-beyond-token-level-pre-training-for
Repo
Framework
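
The two-tower setup described in the abstract above encodes queries and documents independently and scores each pair with an inner product, so document vectors can be computed ahead of time. The following is a minimal sketch of that scoring pattern only. The `encode` function is a hypothetical bag-of-words stand-in for the Transformer towers, and the exhaustive scan in `retrieve` is linear in the number of documents; a sublinear system would presumably replace it with an approximate nearest-neighbor index.

```python
import math
from collections import Counter
from typing import Dict, List


def encode(text: str) -> Dict[str, float]:
    """Toy encoder (assumption): L2-normalized token counts instead of a Transformer tower."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {tok: c / norm for tok, c in counts.items()}


def score(query_vec: Dict[str, float], doc_vec: Dict[str, float]) -> float:
    """Inner product between a query embedding and a document embedding."""
    return sum(w * doc_vec.get(tok, 0.0) for tok, w in query_vec.items())


def retrieve(query: str, docs: List[str], k: int) -> List[int]:
    """Return the indices of the k highest-scoring documents."""
    query_vec = encode(query)
    # Document vectors do not depend on the query, so they can be cached offline.
    doc_vecs = [encode(d) for d in docs]
    ranked = sorted(
        range(len(docs)),
        key=lambda i: score(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


docs = ["Paris is the capital of France", "Bananas are yellow"]
top = retrieve("capital of france", docs, k=1)
```

Because the two encoders never see each other's input, the expensive document side is paid once, which is what makes this design a candidate for the retrieval phase rather than the scoring phase.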

The intriguing role of module criticality in the generalization of deep networks


Title	The intriguing role of module criticality in the generalization of deep networks
Authors	Anonymous
Abstract	We study the phenomenon that some modules of deep neural networks (DNNs) are more critical than others, meaning that rewinding their parameter values back to initialization, while keeping other modules fixed at the trained parameters, results in a large drop in the network’s performance. Our analysis reveals interesting properties of the loss landscape which lead us to propose a complexity measure, called module criticality, based on the shape of the valleys that connect the initial and final values of the module parameters. We formulate how generalization relates to the module criticality, and show that this measure is able to explain the superior generalization performance of some architectures over others, whereas earlier measures fail to do so.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=S1e4jkSKvB
PDF	https://openreview.net/pdf?id=S1e4jkSKvB
PWC	https://paperswithcode.com/paper/the-intriguing-role-of-module-criticality-in
Repo
Framework
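
The rewinding experiment described in the abstract above can be sketched as follows. This is an illustration of the rewinding step only, not the paper's criticality measure, which is based on the shape of the loss valley between initial and final parameters. The parameter layout (a dict of module name to a list of floats), the helper names, and the toy scoring function are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Params = Dict[str, List[float]]


def rewind_module(trained: Params, initial: Params, module: str) -> Params:
    """Copy `trained`, resetting only `module` to its values at initialization."""
    if module not in trained or module not in initial:
        raise KeyError(f"unknown module: {module}")
    rewound = {name: list(values) for name, values in trained.items()}
    rewound[module] = list(initial[module])
    return rewound


def rewind_drops(
    trained: Params, initial: Params, evaluate: Callable[[Params], float]
) -> Dict[str, float]:
    """Performance drop caused by rewinding each module on its own."""
    baseline = evaluate(trained)
    return {
        name: baseline - evaluate(rewind_module(trained, initial, name))
        for name in trained
    }


# Toy setup (hypothetical): the score depends far more on layer1 than on layer2.
initial = {"layer1": [0.0, 0.0], "layer2": [0.0]}
trained = {"layer1": [1.0, 1.0], "layer2": [0.5]}


def toy_score(params: Params) -> float:
    return 2.0 * sum(params["layer1"]) + 0.5 * sum(params["layer2"])


drops = rewind_drops(trained, initial, toy_score)
```

In this toy case rewinding `layer1` costs far more than rewinding `layer2`, which is the sense in which one module would be called more critical than another.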

Learning Human Postural Control with Hierarchical Acquisition Functions


Title	Learning Human Postural Control with Hierarchical Acquisition Functions
Authors	Nils Rottmann, Tjasa Kunavar, Jan Babic, Jan Peters, Elmar Rueckert
Abstract	Learning control policies in robotic tasks requires a large number of interactions due to small learning rates, bounds on the updates or unknown constraints. In contrast, humans can infer protective and safe solutions after a single failure or unexpected observation. In order to reach similar performance, we developed a hierarchical Bayesian optimization algorithm that replicates the cognitive inference and memorization process for avoiding failures in motor control tasks. A Gaussian Process implements the modeling and the sampling of the acquisition function. This enables rapid learning with large learning rates while a mental replay phase ensures that policy regions that led to failures are inhibited during the sampling process. The features of the hierarchical Bayesian optimization method are evaluated in a simulated and physiological humanoid postural balancing task. We quantitatively compare the human learning performance to our learning approach by evaluating the deviations of the center of mass during training. Our results show that we can reproduce the efficient learning of human subjects in postural control tasks, which provides a testable model for future physiological motor control tasks. In these postural control tasks, our method outperforms standard Bayesian Optimization in the number of interactions to solve the task, in the computational demands and in the frequency of observed failures.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=S1eYchEtwH
PDF	https://openreview.net/pdf?id=S1eYchEtwH
PWC	https://paperswithcode.com/paper/learning-human-postural-control-with
Repo
Framework

Faster Neural Network Training with Data Echoing


Title	Faster Neural Network Training with Data Echoing
Authors	Anonymous
Abstract	In the twilight of Moore’s law, GPUs and other specialized hardware accelerators have dramatically sped up neural network training. However, earlier stages of the training pipeline, such as disk I/O and data preprocessing, do not run on accelerators. As accelerators continue to improve, these earlier stages will increasingly become the bottleneck. In this paper, we introduce “data echoing,” which reduces the total computation used by earlier pipeline stages and speeds up training whenever computation upstream from accelerators dominates the training time. Data echoing reuses (or “echoes”) intermediate outputs from earlier pipeline stages in order to reclaim idle capacity. We investigate the behavior of different data echoing algorithms on various workloads, for various amounts of echoing, and for various batch sizes. We find that in all settings, at least one data echoing algorithm can match the baseline’s predictive performance using less upstream computation. We measured a factor of 3.25 decrease in wall-clock time for ResNet-50 on ImageNet when reading training data over a network.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=rJeO3aVKPB
PDF	https://openreview.net/pdf?id=rJeO3aVKPB
PWC	https://paperswithcode.com/paper/faster-neural-network-training-with-data-1
Repo
Framework
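
The core idea in the abstract above, reusing outputs of a slow upstream stage so the accelerator is not left idle, can be sketched as a generator wrapper. This shows only the simplest form of echoing (repeat each upstream item a fixed number of times); the abstract mentions several echoing algorithms, and this sketch is not claimed to match any one of them. The names `echo`, `echo_factor`, and `slow_loader` are hypothetical.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def echo(upstream: Iterable[T], echo_factor: int) -> Iterator[T]:
    """Yield each upstream item `echo_factor` times before pulling the next one."""
    if echo_factor < 1:
        raise ValueError("echo_factor must be at least 1")
    for item in upstream:
        for _ in range(echo_factor):
            yield item


def slow_loader(n: int) -> Iterator[int]:
    """Stand-in (assumption) for an expensive read-and-preprocess stage."""
    for i in range(n):
        yield i


# Three upstream reads feed six training steps.
batches = list(echo(slow_loader(3), echo_factor=2))
```

With an echo factor of 2, the downstream consumer takes twice as many steps per upstream read, which is where the reduction in upstream computation would come from.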

Sticking to the Facts: Confident Decoding for Faithful Data-to-Text Generation


Title	Sticking to the Facts: Confident Decoding for Faithful Data-to-Text Generation
Authors	Anonymous
Abstract	Neural conditional text generation systems have achieved significant progress in recent years, showing the ability to produce highly fluent text. However, the inherent lack of controllability in these systems allows them to hallucinate factually incorrect phrases that are unfaithful to the source, making them often unsuitable for many real world systems that require high degrees of precision. In this work, we propose a novel confidence oriented decoder that assigns a confidence score to each target position. This score is learned in training using a variational Bayes objective, and can be leveraged at inference time using a calibration technique to promote more faithful generation. Experiments on a structured data-to-text dataset – WikiBio – show that our approach is more faithful to the source than existing state-of-the-art approaches, according to both automatic metrics and human evaluation.
Tasks	Calibration, Data-to-Text Generation, Text Generation
Published	2020-01-01
URL	https://openreview.net/forum?id=HkxU2pNYPH
PDF	https://openreview.net/pdf?id=HkxU2pNYPH
PWC	https://paperswithcode.com/paper/sticking-to-the-facts-confident-decoding-for-1
Repo
Framework

A Theoretical Analysis of Deep Q-Learning


Title	A Theoretical Analysis of Deep Q-Learning
Authors	Anonymous
Abstract	Despite the great empirical success of deep reinforcement learning, its theoretical foundation is less well understood. In this work, we make the first attempt to theoretically understand the deep Q-network (DQN) algorithm (Mnih et al., 2015) from both algorithmic and statistical perspectives. Specifically, we focus on a slight simplification of DQN that fully captures its key features. Under mild assumptions, we establish the algorithmic and statistical rates of convergence for the action-value functions of the iterative policy sequence obtained by DQN. In particular, the statistical error characterizes the bias and variance that arise from approximating the action-value function using a deep neural network, while the algorithmic error converges to zero at a geometric rate. As a byproduct, our analysis provides justifications for the techniques of experience replay and target network, which are crucial to the empirical success of DQN. Furthermore, as a simple extension of DQN, we propose the Minimax-DQN algorithm for zero-sum Markov games with two players, which is deferred to the appendix due to space limitations.
Tasks	Q-Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=SJlM0JSFDr
PDF	https://openreview.net/pdf?id=SJlM0JSFDr
PWC	https://paperswithcode.com/paper/a-theoretical-analysis-of-deep-q-learning-1
Repo
Framework

Dynamically Balanced Value Estimates for Actor-Critic Methods


Title	Dynamically Balanced Value Estimates for Actor-Critic Methods
Authors	Anonymous
Abstract	Reinforcement learning in an actor-critic setting relies on accurate value estimates of the critic. However, the combination of function approximation, temporal difference (TD) learning and off-policy training can lead to an overestimating value function. A solution is to use Clipped Double Q-learning (CDQ), which is used in the TD3 algorithm and computes the minimum of two critics in the TD-target. We show that CDQ induces an underestimation bias and propose a new algorithm that accounts for this by using a weighted average of the target from CDQ and the target coming from a single critic. The weighting parameter is adjusted during training such that the value estimates match the actual discounted return on the most recent episodes, thereby balancing over- and underestimation. Empirically, we obtain more accurate value estimates and demonstrate state-of-the-art results on several OpenAI Gym tasks.
Tasks	Q-Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=r1xyayrtDS
PDF	https://openreview.net/pdf?id=r1xyayrtDS
PWC	https://paperswithcode.com/paper/dynamically-balanced-value-estimates-for
Repo
Framework
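
The weighted target described in the abstract above can be sketched with scalar values. The abstract states only that the target is a weighted average of the CDQ target and a single-critic target, and that the weight is adjusted so value estimates match observed discounted returns. The weight convention (`beta = 1` means pure CDQ), the fixed-step update in `adjust_beta`, and all function and parameter names are assumptions made for illustration.

```python
def balanced_td_target(
    reward: float,
    discount: float,
    q1_next: float,
    q2_next: float,
    beta: float,
    done: bool = False,
) -> float:
    """Blend the clipped double-Q target with a single-critic target.

    beta = 1.0 gives the CDQ target (minimum of two critics);
    beta = 0.0 gives the target from the first critic alone.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    bootstrap = 0.0 if done else 1.0
    cdq_target = reward + discount * bootstrap * min(q1_next, q2_next)
    single_target = reward + discount * bootstrap * q1_next
    return beta * cdq_target + (1.0 - beta) * single_target


def adjust_beta(
    beta: float, mean_estimate: float, mean_return: float, step: float = 0.05
) -> float:
    """Hypothetical update rule: move beta so estimates track observed returns."""
    if mean_estimate > mean_return:
        beta += step  # overestimating: lean more on the pessimistic CDQ target
    elif mean_estimate < mean_return:
        beta -= step  # underestimating: lean more on the single critic
    return min(1.0, max(0.0, beta))


target = balanced_td_target(reward=1.0, discount=0.5, q1_next=4.0, q2_next=2.0, beta=0.5)
```

In this example the CDQ target is 2.0 and the single-critic target is 3.0, so an equal weighting lands halfway between the pessimistic and the optimistic estimate.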

Information Theoretic Model Predictive Q-Learning


Title	Information Theoretic Model Predictive Q-Learning
Authors	Anonymous
Abstract	Model-free Reinforcement Learning (RL) algorithms work well in sequential decision-making problems when experience can be collected cheaply and model-based RL is effective when system dynamics can be modeled accurately. However, both of these assumptions can be violated in real world problems such as robotics, where querying the system can be prohibitively expensive and real-world dynamics can be difficult to model accurately. Although sim-to-real approaches such as domain randomization attempt to mitigate the effects of biased simulation, they can still suffer from optimization challenges such as local minima and hand-designed distributions for randomization, making it difficult to learn an accurate global value function or policy that directly transfers to the real world. In contrast to RL, Model Predictive Control (MPC) algorithms use a simulator to optimize a simple policy class online, constructing a closed-loop controller that can effectively contend with real-world dynamics. MPC performance is usually limited by factors such as model bias and the limited horizon of optimization. In this work, we present a novel theoretical connection between information theoretic MPC and entropy regularized RL and develop a Q-learning algorithm that can leverage biased models. We validate the proposed algorithm on sim-to-sim control tasks to demonstrate the improvements over optimal control and reinforcement learning from scratch. Our approach paves the way for deploying reinforcement learning algorithms on real-robots in a systematic manner.
Tasks	Decision Making, Q-Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=rkliHyrFDB
PDF	https://openreview.net/pdf?id=rkliHyrFDB
PWC	https://paperswithcode.com/paper/information-theoretic-model-predictive-q
Repo
Framework