April 1, 2020

3082 words 15 mins read

Paper Group NANR 49

Diving into Optimization of Topology in Neural Networks. Stabilizing Transformers for Reinforcement Learning. Aggregating explanation methods for neural networks stabilizes explanations. Accelerated Variance Reduced Stochastic Extragradient Method for Sparse Machine Learning Problems. Attributed Graph Learning with 2-D Graph Convolution. Chordal-GC …

Diving into Optimization of Topology in Neural Networks

Title Diving into Optimization of Topology in Neural Networks
Authors Anonymous
Abstract Seeking effective networks has become one of the most crucial and practical areas in deep learning. The architecture of a neural network can be represented as a directed acyclic graph, whose nodes denote layer transformations and whose edges represent information flow. Beyond the selection of micro node operations, the macro connections across the whole network, referred to as its topology, largely affect the optimization process. We first rethink residual connections from a new topological view and observe the benefits that dense connections provide to optimization. Motivated by this, we propose a novel method to optimize the topology of a neural network. The optimization space is defined as a complete graph; by assigning learnable weights that reflect the importance of connections, the optimization of topology is transformed into learning a set of continuous edge variables. To extend the optimization to larger search spaces, a new series of networks, named TopoNet, is designed. To further focus on critical edges and promote generalization ability in dense topologies, an auxiliary sparsity constraint is adopted to constrain the distribution of edges. Experiments on classical networks prove the effectiveness of topology optimization. Experiments with TopoNets further verify both the availability and transferability of the proposed method in different tasks, e.g., image classification, object detection, and face recognition.
Tasks Face Recognition, Image Classification, Object Detection
Published 2020-01-01
URL https://openreview.net/forum?id=HyetFnEFDS
PDF https://openreview.net/pdf?id=HyetFnEFDS
PWC https://paperswithcode.com/paper/diving-into-optimization-of-topology-in
Repo
Framework
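
To make the "learnable edge weights on a complete graph" idea concrete, here is a minimal PyTorch sketch, not the authors' code: each node aggregates the outputs of all preceding nodes through sigmoid-gated learnable edge logits, and an L1-style penalty on the gates plays the role of the auxiliary sparsity constraint. The operation choice (conv-BN-ReLU) and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LearnableTopologyCell(nn.Module):
    """Nodes form a complete DAG; the importance of each edge is learned."""

    def __init__(self, num_nodes, channels):
        super().__init__()
        self.num_nodes = num_nodes
        # One "micro" operation per node (here: conv-BN-ReLU, purely illustrative).
        self.ops = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.BatchNorm2d(channels), nn.ReLU())
            for _ in range(num_nodes))
        # One learnable logit per directed edge j -> i with j < i.
        self.edge_logits = nn.ParameterDict({
            f"e{j}_{i}": nn.Parameter(torch.zeros(1))
            for i in range(1, num_nodes) for j in range(i)})

    def forward(self, x):
        outputs = [self.ops[0](x)]
        for i in range(1, self.num_nodes):
            agg = 0.0
            for j in range(i):
                gate = torch.sigmoid(self.edge_logits[f"e{j}_{i}"])
                agg = agg + gate * outputs[j]      # weighted sum over incoming edges
            outputs.append(self.ops[i](agg))
        return outputs[-1]

    def sparsity_penalty(self):
        # Auxiliary sparsity term pushing unimportant edges toward zero.
        return sum(torch.sigmoid(p).sum() for p in self.edge_logits.values())
```

During training one would add something like loss = task_loss + lambda_sparsity * cell.sparsity_penalty(), so that only the most useful connections survive.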

Stabilizing Transformers for Reinforcement Learning

Title Stabilizing Transformers for Reinforcement Learning
Authors Anonymous
Abstract Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in natural language processing (NLP), achieving state-of-the-art results in domains such as language modeling and machine translation. Harnessing the transformer’s ability to process long time horizons of information could provide a similar performance boost in partially-observable reinforcement learning (RL) domains, but the large-scale transformers used in NLP have yet to be successfully applied to the RL setting. In this work we demonstrate that the standard transformer architecture is difficult to optimize, which was previously observed in the supervised learning setting but becomes especially pronounced with RL objectives. We propose architectural modifications that substantially improve the stability and learning speed of the original Transformer and XL variant. The proposed architecture, the Gated Transformer-XL (GTrXL), surpasses LSTMs on challenging memory environments and achieves state-of-the-art results on the multi-task DMLab-30 benchmark suite, exceeding the performance of an external memory architecture. We show that the GTrXL, trained using the same losses, has stability and performance that consistently matches or exceeds a competitive LSTM baseline, including on more reactive tasks where memory is less critical. GTrXL offers an easy-to-train, simple-to-implement but substantially more expressive architectural alternative to the standard multi-layer LSTM ubiquitously used for RL agents in partially-observable environments.
Tasks Language Modelling, Machine Translation
Published 2020-01-01
URL https://openreview.net/forum?id=SyxKrySYPr
PDF https://openreview.net/pdf?id=SyxKrySYPr
PWC https://paperswithcode.com/paper/stabilizing-transformers-for-reinforcement
Repo
Framework
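
The central architectural change is replacing the transformer block's residual connections with a gating layer (together with moving layer normalization onto the sublayer inputs). Below is a hedged PyTorch sketch of the GRU-style gate, one of the gating variants discussed in the paper; dimensions, naming, and the bias initialization value are illustrative.

```python
import torch
import torch.nn as nn

class GRUGate(nn.Module):
    """GRU-style gate g(x, y) used in place of the residual connection x + y."""

    def __init__(self, dim, gate_bias=2.0):
        super().__init__()
        self.w_r, self.u_r = nn.Linear(dim, dim, bias=False), nn.Linear(dim, dim, bias=False)
        self.w_z, self.u_z = nn.Linear(dim, dim, bias=False), nn.Linear(dim, dim, bias=False)
        self.w_g, self.u_g = nn.Linear(dim, dim, bias=False), nn.Linear(dim, dim, bias=False)
        # A positive bias on the update gate biases the block toward the identity
        # map early in training, which is what aids optimization stability.
        self.bias_z = nn.Parameter(torch.full((dim,), gate_bias))

    def forward(self, x, y):
        # x: the stream entering the sublayer, y: the sublayer output (e.g. attention).
        r = torch.sigmoid(self.w_r(y) + self.u_r(x))
        z = torch.sigmoid(self.w_z(y) + self.u_z(x) - self.bias_z)
        h = torch.tanh(self.w_g(y) + self.u_g(r * x))
        return (1.0 - z) * x + z * h

# Sketch of usage inside a block: out = gate(x, attention(layer_norm(x)))
```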

Aggregating explanation methods for neural networks stabilizes explanations

Title Aggregating explanation methods for neural networks stabilizes explanations
Authors Anonymous
Abstract Despite a growing literature on explaining neural networks, no consensus has been reached on how to explain a neural network decision or how to evaluate an explanation. Our contributions in this paper are twofold. First, we investigate schemes to combine explanation methods and reduce model uncertainty to obtain a single aggregated explanation. The aggregation is more robust and aligns better with the neural network than any single explanation method. Second, we propose a new approach to evaluating explanation methods that circumvents the need for manual evaluation and does not rely on the alignment of neural network and human decision processes.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=B1xeZJHKPB
PDF https://openreview.net/pdf?id=B1xeZJHKPB
PWC https://paperswithcode.com/paper/aggregating-explanation-methods-for-neural
Repo
Framework
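
One aggregation scheme consistent with the abstract is to normalize the attribution map of each explanation method and average the results. The sketch below uses two simple gradient-based attributions (saliency and input x gradient) as stand-ins; the choice of methods and the unit-sum normalization are illustrative assumptions, not necessarily the combination evaluated in the paper.

```python
import torch

def saliency(model, x, target):
    x = x.clone().requires_grad_(True)
    model(x)[0, target].backward()
    return x.grad.abs()

def input_x_gradient(model, x, target):
    x = x.clone().requires_grad_(True)
    model(x)[0, target].backward()
    return (x * x.grad).abs()

def aggregate_explanations(model, x, target, methods):
    """Average attribution maps after normalizing each one to unit sum."""
    maps = []
    for method in methods:
        attr = method(model, x, target)
        maps.append(attr / (attr.sum() + 1e-12))
    return torch.stack(maps).mean(dim=0)

# e.g. agg = aggregate_explanations(net, image.unsqueeze(0), label, [saliency, input_x_gradient])
```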

Accelerated Variance Reduced Stochastic Extragradient Method for Sparse Machine Learning Problems

Title Accelerated Variance Reduced Stochastic Extragradient Method for Sparse Machine Learning Problems
Authors Anonymous
Abstract Recently, many stochastic gradient descent algorithms with variance reduction have been proposed. Moreover, their proximal variants, such as Prox-SVRG, can effectively solve non-smooth problems, which makes them widely applicable to many machine learning problems. However, the introduction of the proximal operator results in error in the optimal value. To address this issue, we introduce the idea of the extragradient and propose a novel accelerated variance reduced stochastic extragradient descent (AVR-SExtraGD) algorithm, which inherits the advantages of Prox-SVRG and momentum acceleration techniques. Moreover, our theoretical analysis shows that AVR-SExtraGD enjoys the best-known convergence rates and oracle complexities of stochastic first-order algorithms such as Katyusha for both strongly convex and non-strongly convex problems. Finally, our experimental results show that, for ERM problems and robust face recognition via sparse representation, AVR-SExtraGD yields improved performance compared with Prox-SVRG and Katyusha. The asynchronous variant of AVR-SExtraGD outperforms KroMagnon and ASAGA, the asynchronous variants of SVRG and SAGA, respectively.
Tasks Face Recognition, Robust Face Recognition
Published 2020-01-01
URL https://openreview.net/forum?id=BklDO1HYPS
PDF https://openreview.net/pdf?id=BklDO1HYPS
PWC https://paperswithcode.com/paper/accelerated-variance-reduced-stochastic
Repo
Framework
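
A hedged sketch of the core update: an SVRG-style variance-reduced gradient combined with an extragradient (predict-then-correct) proximal step, shown on L1-regularized least squares in NumPy. The step size, the omission of the Katyusha-style momentum term, and the problem instance are simplifications for illustration only.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def svrg_extragradient_lasso(A, b, lam, eta=0.05, epochs=20):
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = A.T @ (A @ snapshot - b) / n        # full gradient at the snapshot
        for _ in range(n):
            i = np.random.randint(n)
            def vr_grad(w):                             # variance-reduced stochastic gradient
                return (A[i] @ w - b[i]) * A[i] - (A[i] @ snapshot - b[i]) * A[i] + full_grad
            # Extragradient: a look-ahead (prediction) step, then a correction step.
            y = soft_threshold(x - eta * vr_grad(x), eta * lam)
            x = soft_threshold(x - eta * vr_grad(y), eta * lam)
    return x
```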

Attributed Graph Learning with 2-D Graph Convolution

Title Attributed Graph Learning with 2-D Graph Convolution
Authors Anonymous
Abstract Graph convolutional neural networks have demonstrated promising performance in attributed graph learning, thanks to the use of graph convolution that effectively combines graph structures and node features for learning node representations. However, one intrinsic limitation of the commonly adopted 1-D graph convolution is that it only exploits graph connectivity for feature smoothing, which may lead to inferior performance on sparse and noisy real-world attributed networks. To address this problem, we propose to explore relational information among node attributes to complement node relations for representation learning. In particular, we propose to use 2-D graph convolution to jointly model the two kinds of relations and develop a computationally efficient dimensionwise separable 2-D graph convolution (DSGC). Theoretically, we show that DSGC can reduce intra-class variance of node features on both the node dimension and the attribute dimension to facilitate learning. Empirically, we demonstrate that by incorporating attribute relations, DSGC achieves significant performance gain over state-of-the-art methods on node classification and clustering on several real-world attributed networks.
Tasks Node Classification, Representation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=B1gNKxrYPB
PDF https://openreview.net/pdf?id=B1gNKxrYPB
PWC https://paperswithcode.com/paper/attributed-graph-learning-with-2-d-graph
Repo
Framework
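
A hedged sketch of a dimension-wise separable 2-D graph convolution: the feature matrix is smoothed along the node dimension by a normalized node graph and along the attribute dimension by a normalized attribute-affinity graph, then projected by a weight matrix. How the attribute graph is built (e.g. kNN over attribute similarity) and the normalization used here are illustrative assumptions.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    A = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    return (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

def dsgc_layer(X, A_node, A_attr, W):
    """2-D graph convolution on an n x d feature matrix X.

    A_node: n x n graph over nodes, A_attr: d x d graph over attributes,
    W: d x k projection. Filtering is applied separately on each dimension.
    """
    S_n = normalize_adj(A_node)
    S_f = normalize_adj(A_attr)
    return S_n @ X @ S_f @ W
```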

Chordal-GCN: Exploiting sparsity in training large-scale graph convolutional networks

Title Chordal-GCN: Exploiting sparsity in training large-scale graph convolutional networks
Authors Anonymous
Abstract Despite the impressive success of graph convolutional networks (GCNs) on numerous applications, training on large-scale sparse networks remains challenging. Current algorithms require large memory space for storing GCN outputs as well as all the intermediate embeddings. Besides, most of these algorithms involve either random sampling or an approximation of the adjacency matrix, which may lose important structural information. In this paper, we propose Chordal-GCN for semi-supervised node classification. The proposed model utilizes the exact graph structure (i.e., without sampling or approximation), while requiring limited memory resources compared with the original GCN. Moreover, it leverages the sparsity pattern as well as the clustering structure of the graph. The proposed model first decomposes a large-scale sparse network into several small dense subgraphs (called cliques) and constructs a clique tree. By traversing the tree, GCN training is performed clique by clique, and connections between cliques are exploited via the tree hierarchy. Furthermore, we implement Chordal-GCN on large-scale datasets and demonstrate superior performance.
Tasks Node Classification
Published 2020-01-01
URL https://openreview.net/forum?id=rJl05AVtwB
PDF https://openreview.net/pdf?id=rJl05AVtwB
PWC https://paperswithcode.com/paper/chordal-gcn-exploiting-sparsity-in-training
Repo
Framework

Quaternion Equivariant Capsule Networks for 3D Point Clouds

Title Quaternion Equivariant Capsule Networks for 3D Point Clouds
Authors Anonymous
Abstract We present a 3D capsule architecture for processing of point clouds that is equivariant with respect to the SO(3) rotation group, translation and permutation of the unordered input sets. The network operates on a sparse set of local reference frames, computed from an input point cloud and establishes end-to-end equivariance through a novel 3D quaternion group capsule layer, including an equivariant dynamic routing procedure. The capsule layer enables us to disentangle geometry from pose, paving the way for more informative descriptions and a structured latent space. In the process, we theoretically connect the process of dynamic routing between capsules to the well-known Weiszfeld algorithm, a scheme for solving iterative re-weighted least squares (IRLS) problems with provable convergence properties, enabling robust pose estimation between capsule layers. Due to the sparse equivariant quaternion capsules, our architecture allows joint object classification and orientation estimation, which we validate empirically on common benchmark datasets.
Tasks Object Classification, Pose Estimation
Published 2020-01-01
URL https://openreview.net/forum?id=B1xtd1HtPS
PDF https://openreview.net/pdf?id=B1xtd1HtPS
PWC https://paperswithcode.com/paper/quaternion-equivariant-capsule-networks-for
Repo
Framework
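
The abstract ties dynamic routing to the Weiszfeld algorithm, an IRLS scheme for the geometric median. The following NumPy sketch shows that classical algorithm on points in R^d, included only to make the connection concrete; it is not the quaternion routing layer itself.

```python
import numpy as np

def weiszfeld(points, iters=100, eps=1e-8):
    """Geometric median of an (m, d) array of points via IRLS."""
    median = points.mean(axis=0)                    # initialize at the centroid
    for _ in range(iters):
        dists = np.linalg.norm(points - median, axis=1)
        weights = 1.0 / np.maximum(dists, eps)      # inverse-distance weights: outliers count less
        median = (weights[:, None] * points).sum(axis=0) / weights.sum()
    return median
```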

Lift-the-flap: what, where and when for context reasoning

Title Lift-the-flap: what, where and when for context reasoning
Authors Anonymous
Abstract Context reasoning is critical in a wide variety of applications where current inputs need to be interpreted in the light of previous experience and knowledge. Both spatial and temporal contextual information play a critical role in the domain of visual recognition. Here we investigate spatial constraints (what image features provide contextual information and where they are located), and temporal constraints (when different contextual cues matter) for visual recognition. The task is to reason about the scene context and infer what a target object hidden behind a flap is in a natural image. To tackle this problem, we first describe an online human psychophysics experiment recording active sampling via mouse clicks in lift-the-flap games and identify clicking patterns and features which are diagnostic for high contextual reasoning accuracy. As a proof of the usefulness of these clicking patterns and visual features, we extend a state-of-the-art recurrent model capable of attending to salient context regions, dynamically integrating useful information, making inferences, and predicting class label for the target object over multiple clicks. The proposed model achieves human-level contextual reasoning accuracy, shares human-like sampling behavior and learns interpretable features for contextual reasoning.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ryeFzT4YPr
PDF https://openreview.net/pdf?id=ryeFzT4YPr
PWC https://paperswithcode.com/paper/lift-the-flap-what-where-and-when-for-context
Repo
Framework

Meta-Learning by Hallucinating Useful Examples

Title Meta-Learning by Hallucinating Useful Examples
Authors Anonymous
Abstract Learning to hallucinate additional examples has recently been shown as a promising direction to address few-shot learning tasks, which aim to learn novel concepts from very few examples. The hallucination process, however, is still far from generating effective samples for learning. In this work, we investigate two important requirements for the hallucinator — (i) precision: the generated examples should lead to good classifier performance, and (ii) collaboration: both the hallucinator and the classification component need to be trained jointly. By integrating these requirements as novel loss functions into a general meta-learning with hallucination framework, our model-agnostic PrecisE Collaborative hAlluciNator (PECAN) facilitates data hallucination to improve the performance of new classification tasks. Extensive experiments demonstrate state-of-the-art performance on competitive miniImageNet and ImageNet based few-shot benchmarks in various scenarios.
Tasks Few-Shot Learning, Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=rJx8I1rFwr
PDF https://openreview.net/pdf?id=rJx8I1rFwr
PWC https://paperswithcode.com/paper/meta-learning-by-hallucinating-useful
Repo
Framework
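
A heavily hedged sketch of the generic "meta-learning with hallucination" loop this work builds on: a hallucinator generates extra support examples from the few real ones, and both the hallucinator and the classifier receive gradients from the episode loss, which is the collaboration requirement in its simplest form. PECAN's specific precision and collaboration losses are not reproduced here; every name and interface below is an assumption.

```python
import torch
import torch.nn as nn

class Hallucinator(nn.Module):
    """Maps a support feature plus noise to a hallucinated feature."""

    def __init__(self, feat_dim, noise_dim=64):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(nn.Linear(feat_dim + noise_dim, feat_dim),
                                 nn.ReLU(), nn.Linear(feat_dim, feat_dim))

    def forward(self, feats):
        noise = torch.randn(feats.size(0), self.noise_dim, device=feats.device)
        return self.net(torch.cat([feats, noise], dim=1))

def episode_loss(classifier, hallucinator, support_x, support_y, query_x, query_y):
    fake_x = hallucinator(support_x)                   # hallucinated support features
    aug_x = torch.cat([support_x, fake_x], dim=0)
    aug_y = torch.cat([support_y, support_y], dim=0)   # hallucinated examples inherit labels
    logits = classifier(aug_x, aug_y, query_x)         # e.g. a prototypical-network head
    return nn.functional.cross_entropy(logits, query_y)   # gradients reach both modules
```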

Deep probabilistic subsampling for task-adaptive compressed sensing

Title Deep probabilistic subsampling for task-adaptive compressed sensing
Authors Anonymous
Abstract The field of deep learning is commonly concerned with optimizing predictive models using large pre-acquired datasets of densely sampled datapoints or signals. In this work, we demonstrate that the deep learning paradigm can be extended to incorporate a subsampling scheme that is jointly optimized under a desired minimum sample rate. We present Deep Probabilistic Subsampling (DPS), a widely applicable framework for task-adaptive compressed sensing that enables end-to-end optimization of a subset of signal samples together with a subsequent model that performs the required task. We demonstrate strong performance on reconstruction and classification tasks of a toy dataset, MNIST, and CIFAR10 under stringent subsampling rates in both the pixel and the spatial frequency domain. Due to the task-agnostic nature of the framework, DPS is directly applicable to all real-world domains that benefit from sample rate reduction.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJeq9JBFvH
PDF https://openreview.net/pdf?id=SJeq9JBFvH
PWC https://paperswithcode.com/paper/deep-probabilistic-subsampling-for-task
Repo
Framework
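
A hedged sketch of learned subsampling trained end to end: a vector of logits over candidate sample positions is relaxed with the Gumbel-softmax trick so that an (approximately binary) mask can be drawn and optimized jointly with the downstream task network. The exact sampling scheme of DPS may differ; the relaxation and the way repeated picks are merged below are simplifying assumptions.

```python
import torch
import torch.nn as nn

class LearnedSubsampler(nn.Module):
    """Keeps roughly k of n signal entries through a trainable sampling mask."""

    def __init__(self, n, k, temperature=1.0):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n))
        self.k = k
        self.temperature = temperature

    def forward(self, x):
        # Draw k (soft) one-hot selections and merge them into one mask.
        samples = nn.functional.gumbel_softmax(
            self.logits.expand(self.k, -1), tau=self.temperature, hard=True)
        mask = samples.sum(dim=0).clamp(max=1.0)
        return x * mask       # subsampled signal, differentiable w.r.t. the logits

# e.g. sampler = LearnedSubsampler(n=784, k=78); y = sampler(flattened_images)
```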

Recurrent Layer Attention Network

Title Recurrent Layer Attention Network
Authors Anonymous
Abstract Capturing long-range feature relations has been a central issue for convolutional neural networks (CNNs). To tackle this, attempts to integrate end-to-end trainable attention modules into CNNs are widespread. The main goal of these works is to adjust feature maps considering spatial-channel correlation inside a convolution layer. In this paper, we focus on modeling relationships among layers and propose a novel structure, the ‘Recurrent Layer Attention network,’ which stores the hierarchy of features in recurrent neural networks (RNNs) that propagate concurrently with the CNN and adaptively scale the feature volumes of all layers. We further introduce several structural derivatives to demonstrate compatibility with recent attention modules and the expandability of the proposed network. For semantic understanding of the learned features, we also visualize intermediate layers and plot the curve of layer scaling coefficients (i.e., layer attention). The Recurrent Layer Attention network achieves significant performance gains with only a slight increase in parameters on image classification with the CIFAR and ImageNet-1K 2012 datasets and on object detection with the Microsoft COCO 2014 dataset.
Tasks Image Classification, Object Detection
Published 2020-01-01
URL https://openreview.net/forum?id=Bke5aJBKvH
PDF https://openreview.net/pdf?id=Bke5aJBKvH
PWC https://paperswithcode.com/paper/recurrent-layer-attention-network
Repo
Framework
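
A hedged sketch of the layer-attention mechanism as described: an RNN runs alongside the CNN, consumes a pooled summary of each stage's feature map, and emits per-channel scaling coefficients for that stage. The cell type, pooling, padding to a common input size, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentLayerAttention(nn.Module):
    """Scales each CNN stage's feature volume using an RNN over the layer hierarchy."""

    def __init__(self, channels_per_stage, hidden=128):
        super().__init__()
        self.hidden = hidden
        self.pad_to = max(channels_per_stage)
        self.rnn = nn.LSTMCell(input_size=self.pad_to, hidden_size=hidden)
        self.gates = nn.ModuleList(nn.Linear(hidden, c) for c in channels_per_stage)

    def forward(self, stage_features):
        # stage_features: list of tensors, each of shape (batch, C_i, H_i, W_i)
        b = stage_features[0].size(0)
        h = stage_features[0].new_zeros(b, self.hidden)
        c = stage_features[0].new_zeros(b, self.hidden)
        scaled = []
        for i, feat in enumerate(stage_features):
            summary = feat.mean(dim=(2, 3))                       # global average pooling
            summary = nn.functional.pad(summary, (0, self.pad_to - summary.size(1)))
            h, c = self.rnn(summary, (h, c))
            scale = torch.sigmoid(self.gates[i](h))               # layer attention coefficients
            scaled.append(feat * scale[:, :, None, None])
        return scaled
```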

On Empirical Comparisons of Optimizers for Deep Learning

Title On Empirical Comparisons of Optimizers for Deep Learning
Authors Anonymous
Abstract Selecting an optimizer is a central step in the contemporary deep learning pipeline. In this paper we demonstrate the sensitivity of optimizer comparisons to the metaparameter tuning protocol. Our findings suggest that the metaparameter search space may be the single most important factor explaining the rankings obtained by recent empirical comparisons in the literature. In fact, we show that these results can be contradicted when metaparameter search spaces are changed. As tuning effort grows without bound, more general update rules should never underperform the ones they can approximate (i.e., Adam should never perform worse than momentum), but the recent attempts to compare optimizers either assume these inclusion relationships are not relevant in practice or restrict the metaparameters they tune to break the inclusions. In our experiments, we find that the inclusion relationships between optimizers matter in practice and always predict optimizer comparisons. In particular, we find that the popular adaptive gradient methods never underperform momentum or gradient descent. We also report practical tips around tuning rarely-tuned metaparameters of adaptive gradient methods and raise concerns about fairly benchmarking optimizers for neural network training.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HygrAR4tPS
PDF https://openreview.net/pdf?id=HygrAR4tPS
PWC https://paperswithcode.com/paper/on-empirical-comparisons-of-optimizers-for-1
Repo
Framework
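
The inclusion argument can be made concrete with a little arithmetic: when Adam's epsilon dominates the second-moment term, Adam's update direction coincides (up to a constant rescaling) with that of heavy-ball momentum, so a search space broad enough in epsilon contains momentum as a special case. A small illustrative NumPy check of that limit, with arbitrary values:

```python
import numpy as np

np.random.seed(0)
g = np.random.randn(5)                  # one stochastic gradient
m, v = np.zeros(5), np.zeros(5)         # Adam's first and second moment estimates
beta1, beta2, eps = 0.9, 0.999, 1e8     # epsilon chosen absurdly large on purpose

m = beta1 * m + (1 - beta1) * g
v = beta2 * v + (1 - beta2) * g ** 2
adam_step = m / (np.sqrt(v) + eps)      # bias correction omitted for brevity
momentum_step = m                       # heavy-ball step with the same beta1

# With eps >> sqrt(v), adam_step is approximately m / eps: the same direction as momentum.
cosine = adam_step @ momentum_step / (np.linalg.norm(adam_step) * np.linalg.norm(momentum_step))
print(cosine)                           # prints a value indistinguishable from 1.0
```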

EgoMap: Projective mapping and structured egocentric memory for Deep RL

Title EgoMap: Projective mapping and structured egocentric memory for Deep RL
Authors Anonymous
Abstract Tasks involving localization, memorization and planning in partially observable 3D environments are an ongoing challenge in Deep Reinforcement Learning. We present EgoMap, a spatially structured neural memory architecture. EgoMap augments a deep reinforcement learning agent’s performance in 3D environments on challenging tasks with multi-step objectives. The EgoMap architecture incorporates several inductive biases including a differentiable inverse projection of CNN feature vectors onto a top-down spatially structured map. The map is updated with ego-motion measurements through a differentiable affine transform. We show this architecture outperforms both standard recurrent agents and state of the art agents with structured memory. We demonstrate that incorporating these inductive biases into an agent’s architecture allows for stable training with reward alone, circumventing the expense of acquiring and labelling expert trajectories. A detailed ablation study demonstrates the impact of key aspects of the architecture and through extensive qualitative analysis, we show how the agent exploits its structured internal memory to achieve higher performance.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1gfu3EtDr
PDF https://openreview.net/pdf?id=S1gfu3EtDr
PWC https://paperswithcode.com/paper/egomap-projective-mapping-and-structured
Repo
Framework
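
A hedged sketch of one ingredient, the differentiable ego-motion update of the spatial memory: the top-down map tensor is re-sampled through an affine transform built from the agent's rotation and translation, using torch's affine_grid and grid_sample. The coordinate and sign conventions here are assumptions, not the paper's exact ones.

```python
import torch
import torch.nn.functional as F

def egomotion_update(map_tensor, dtheta, dx, dy):
    """Rotate/translate an egocentric map of shape (B, C, H, W) by the agent's motion.

    dtheta, dx, dy: tensors of shape (B,) holding the rotation (radians) and the
    translation expressed in normalized [-1, 1] map coordinates.
    """
    cos, sin = torch.cos(dtheta), torch.sin(dtheta)
    theta = torch.stack([
        torch.stack([cos, -sin, dx], dim=-1),
        torch.stack([sin,  cos, dy], dim=-1),
    ], dim=1)                                                     # (B, 2, 3) affine matrices
    grid = F.affine_grid(theta, list(map_tensor.shape), align_corners=False)
    return F.grid_sample(map_tensor, grid, align_corners=False)
```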

NeurQuRI: Neural Question Requirement Inspector for Answerability Prediction in Machine Reading Comprehension

Title NeurQuRI: Neural Question Requirement Inspector for Answerability Prediction in Machine Reading Comprehension
Authors Anonymous
Abstract Real-world question answering systems often retrieve potentially relevant documents to a given question through a keyword search, followed by a machine reading comprehension (MRC) step to find the exact answer from them. In this process, it is essential to properly determine whether an answer to the question exists in a given document. This task often becomes complicated when the question involves multiple different conditions or requirements which are to be met in the answer. For example, in the question “What was the projection of sea level increases in the fourth assessment report?”, the answer should properly satisfy several conditions, such as “increases” (but not decreases) and “fourth” (but not third). To address this, we propose a neural question requirement inspection model called NeurQuRI that extracts a list of conditions from the question, each of which should be satisfied by the candidate answer generated by an MRC model. To check whether each condition is met, we propose a novel, attention-based loss function. We evaluate our approach on the SQuAD 2.0 dataset by integrating the proposed module with various MRC models, demonstrating consistent performance improvements across a wide range of state-of-the-art methods.
Tasks Machine Reading Comprehension, Question Answering, Reading Comprehension
Published 2020-01-01
URL https://openreview.net/forum?id=ryxgsCVYPr
PDF https://openreview.net/pdf?id=ryxgsCVYPr
PWC https://paperswithcode.com/paper/neurquri-neural-question-requirement
Repo
Framework

You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings

Title You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings
Authors Anonymous
Abstract Knowledge graph embedding (KGE) models learn algebraic representations of the entities and relations in a knowledge graph. A vast number of KGE techniques for multi-relational link prediction have been proposed in the recent literature, often with state-of-the-art performance. These approaches differ along a number of dimensions, including different model architectures, different training strategies, and different approaches to hyperparameter optimization. In this paper, we take a step back and aim to summarize and quantify empirically the impact of each of these dimensions on model performance. We report on the results of an extensive experimental study with popular model architectures and training strategies across a wide range of hyperparameter settings. We found that when trained appropriately, the relative performance differences between various model architectures often shrink and sometimes even reverse when compared to prior results. For example, RESCAL, one of the first KGE models, showed strong performance when trained with state-of-the-art techniques; it was competitive with or outperformed more recent architectures. We also found that good (and often superior to prior studies) model configurations can be found by exploring relatively few random samples from a large hyperparameter space. Our results suggest that many of the more advanced architectures and techniques proposed in the literature should be revisited to reassess their individual benefits.
Tasks Graph Embedding, Hyperparameter Optimization, Knowledge Graph Embedding, Knowledge Graph Embeddings, Link Prediction
Published 2020-01-01
URL https://openreview.net/forum?id=BkxSmlBFvr
PDF https://openreview.net/pdf?id=BkxSmlBFvr
PWC https://paperswithcode.com/paper/you-can-teach-an-old-dog-new-tricks-on
Repo
Framework
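
As a concrete instance of the kind of model the study re-evaluates, below is a hedged PyTorch sketch of RESCAL's bilinear scoring function with simple negative sampling and a logistic loss. The embedding size, loss, and corruption scheme are illustrative training choices, not the exact protocol of the paper.

```python
import torch
import torch.nn as nn

class RESCAL(nn.Module):
    """Scores a triple (s, r, o) as e_s^T R_r e_o with a full relation matrix R_r."""

    def __init__(self, num_entities, num_relations, dim=200):
        super().__init__()
        self.dim = dim
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim * dim)

    def score(self, s, r, o):
        e_s, e_o = self.ent(s), self.ent(o)                  # (batch, dim)
        R = self.rel(r).view(-1, self.dim, self.dim)         # (batch, dim, dim)
        return torch.einsum('bd,bde,be->b', e_s, R, e_o)

def training_step(model, s, r, o, num_entities, optimizer):
    neg_o = torch.randint(0, num_entities, o.shape)          # corrupt the object slot
    loss = (nn.functional.softplus(-model.score(s, r, o)).mean()
            + nn.functional.softplus(model.score(s, r, neg_o)).mean())
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```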