Paper Group NANR 49
Diving into Optimization of Topology in Neural Networks. Stabilizing Transformers for Reinforcement Learning. Aggregating explanation methods for neural networks stabilizes explanations. Accelerated Variance Reduced Stochastic Extragradient Method for Sparse Machine Learning Problems. Attributed Graph Learning with 2-D Graph Convolution. Chordal-GCN: Exploiting sparsity in training large-scale graph convolutional networks. Quaternion Equivariant Capsule Networks for 3D Point Clouds. Lift-the-flap: what, where and when for context reasoning. Meta-Learning by Hallucinating Useful Examples. Deep probabilistic subsampling for task-adaptive compressed sensing. Recurrent Layer Attention Network. On Empirical Comparisons of Optimizers for Deep Learning. EgoMap: Projective mapping and structured egocentric memory for Deep RL. NeurQuRI: Neural Question Requirement Inspector for Answerability Prediction in Machine Reading Comprehension. You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings.
Diving into Optimization of Topology in Neural Networks
Title | Diving into Optimization of Topology in Neural Networks |
Authors | Anonymous |
Abstract | Seeking effective networks has become one of the most crucial and practical areas in deep learning. The architecture of a neural network can be represented as a directed acyclic graph, whose nodes denote transformations of layers and whose edges represent information flow. Beyond the selection of \textit{micro} node operations, the \textit{macro} connections across the whole network, referred to as its \textit{topology}, largely affect the optimization process. We first rethink residual connections from a new \textit{topological view} and observe the benefits that dense connections provide to optimization. Motivated by this, we propose an innovative method to optimize the topology of a neural network. The optimization space is defined as a complete graph; by assigning learnable weights that reflect the importance of connections, the optimization of topology is transformed into learning a set of continuous variables over edges. To extend the optimization to larger search spaces, a new series of networks, named TopoNets, is designed. To further focus on critical edges and promote generalization ability in dense topologies, an auxiliary sparsity constraint is adopted to constrain the distribution of edges. Experiments on classical networks prove the effectiveness of the optimization of topology. Experiments with TopoNets further verify both the availability and the transferability of the proposed method in different tasks, e.g., image classification, object detection and face recognition. |
Tasks | Face Recognition, Image Classification, Object Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyetFnEFDS |
https://openreview.net/pdf?id=HyetFnEFDS | |
PWC | https://paperswithcode.com/paper/diving-into-optimization-of-topology-in |
Repo | |
Framework | |
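The abstract describes turning topology search into learning one continuous weight per edge of a complete DAG, with an auxiliary sparsity term on the edges. The snippet below is a minimal, hypothetical PyTorch sketch of that idea (not the authors' code): a stage whose nodes share a fixed convolutional operation, whose edges carry learnable importance weights, and whose sparsity penalty is a simple L1-style term on the gated edge weights.

```python
import torch
import torch.nn as nn

class LearnableTopologyStage(nn.Module):
    def __init__(self, num_nodes: int, channels: int):
        super().__init__()
        # One fixed "micro" operation per node; only the topology is learned here.
        self.ops = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
            for _ in range(num_nodes)])
        # One learnable logit per edge (i -> j, i < j) of the complete DAG; node 0 is the input.
        self.edge_logits = nn.Parameter(torch.zeros(num_nodes + 1, num_nodes + 1))

    def forward(self, x):
        outputs = [x]                                        # node 0: the stage input
        for j, op in enumerate(self.ops, start=1):
            w = torch.sigmoid(self.edge_logits[:j, j])       # importance of edges into node j
            agg = sum(w[i] * outputs[i] for i in range(j))   # weighted sum of earlier nodes
            outputs.append(op(agg))
        return outputs[-1]

    def sparsity_penalty(self):
        # Auxiliary L1-style constraint on edge importances (assumed form).
        return torch.sigmoid(self.edge_logits).triu(1).sum()

stage = LearnableTopologyStage(num_nodes=4, channels=16)
y = stage(torch.randn(2, 16, 32, 32))
loss = y.mean() + 1e-3 * stage.sparsity_penalty()
loss.backward()
```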
Stabilizing Transformers for Reinforcement Learning
Title | Stabilizing Transformers for Reinforcement Learning |
Authors | Anonymous |
Abstract | Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in natural language processing (NLP), achieving state-of-the-art results in domains such as language modeling and machine translation. Harnessing the transformer’s ability to process long time horizons of information could provide a similar performance boost in partially-observable reinforcement learning (RL) domains, but the large-scale transformers used in NLP have yet to be successfully applied to the RL setting. In this work we demonstrate that the standard transformer architecture is difficult to optimize, which was previously observed in the supervised learning setting but becomes especially pronounced with RL objectives. We propose architectural modifications that substantially improve the stability and learning speed of the original Transformer and XL variant. The proposed architecture, the Gated Transformer-XL (GTrXL), surpasses LSTMs on challenging memory environments and achieves state-of-the-art results on the multi-task DMLab-30 benchmark suite, exceeding the performance of an external memory architecture. We show that the GTrXL, trained using the same losses, has stability and performance that consistently matches or exceeds a competitive LSTM baseline, including on more reactive tasks where memory is less critical. GTrXL offers an easy-to-train, simple-to-implement but substantially more expressive architectural alternative to the standard multi-layer LSTM ubiquitously used for RL agents in partially-observable environments. |
Tasks | Language Modelling, Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyxKrySYPr |
https://openreview.net/pdf?id=SyxKrySYPr | |
PWC | https://paperswithcode.com/paper/stabilizing-transformers-for-reinforcement |
Repo | |
Framework | |
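The abstract attributes GTrXL's stability to architectural modifications of the transformer block; one of the reported changes is replacing residual connections with a gating layer. Below is a small sketch of a GRU-style gate of that kind, where the exact parameterization and the identity-favoring bias are assumptions rather than the paper's specification.

```python
import torch
import torch.nn as nn

class GRUGate(nn.Module):
    def __init__(self, d_model: int, gate_bias: float = 2.0):
        super().__init__()
        self.Wr, self.Ur = nn.Linear(d_model, d_model, bias=False), nn.Linear(d_model, d_model, bias=False)
        self.Wz, self.Uz = nn.Linear(d_model, d_model, bias=False), nn.Linear(d_model, d_model, bias=False)
        self.Wg, self.Ug = nn.Linear(d_model, d_model, bias=False), nn.Linear(d_model, d_model, bias=False)
        # A positive bias keeps the gate close to the identity early in training.
        self.bias = nn.Parameter(torch.full((d_model,), gate_bias))

    def forward(self, x, y):
        # x: stream entering the sublayer, y: sublayer (attention / feed-forward) output.
        r = torch.sigmoid(self.Wr(y) + self.Ur(x))
        z = torch.sigmoid(self.Wz(y) + self.Uz(x) - self.bias)
        h = torch.tanh(self.Wg(y) + self.Ug(r * x))
        return (1.0 - z) * x + z * h
```

With the gate biased toward passing `x` through unchanged, the block initially behaves like an ordinary recurrent-free stack and only gradually mixes in the sublayer output, which is one intuition for the improved training stability.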
Aggregating explanation methods for neural networks stabilizes explanations
Title | Aggregating explanation methods for neural networks stabilizes explanations |
Authors | Anonymous |
Abstract | Despite a growing literature on explaining neural networks, no consensus has been reached on how to explain a neural network decision or how to evaluate an explanation. Our contributions in this paper are twofold. First, we investigate schemes to combine explanation methods and reduce model uncertainty to obtain a single aggregated explanation. The aggregation is more robust and aligns better with the neural network than any single explanation method. Second, we propose a new approach to evaluating explanation methods that circumvents the need for manual evaluation and is not reliant on the alignment of neural network and human decision processes. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1xeZJHKPB |
https://openreview.net/pdf?id=B1xeZJHKPB | |
PWC | https://paperswithcode.com/paper/aggregating-explanation-methods-for-neural |
Repo | |
Framework | |
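As a concrete (and assumed) illustration of aggregation, the sketch below normalizes several attribution maps and averages them into a single explanation; the specific aggregation schemes evaluated in the paper may differ.

```python
import numpy as np

def aggregate_explanations(attributions):
    """attributions: list of arrays, one per explanation method, all of the same shape."""
    normalized = []
    for a in attributions:
        a = np.abs(a)
        denom = a.sum() + 1e-12          # avoid division by zero
        normalized.append(a / denom)     # each map becomes a distribution over inputs
    return np.mean(normalized, axis=0)   # simple mean aggregate

# Example with three hypothetical saliency maps for a 28x28 input:
maps = [np.random.rand(28, 28) for _ in range(3)]
aggregate = aggregate_explanations(maps)
```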
Accelerated Variance Reduced Stochastic Extragradient Method for Sparse Machine Learning Problems
Title | Accelerated Variance Reduced Stochastic Extragradient Method for Sparse Machine Learning Problems |
Authors | Anonymous |
Abstract | Recently, many stochastic gradient descent algorithms with variance reduction have been proposed. Moreover, their proximal variants, such as Prox-SVRG, can effectively solve non-smooth problems, which is why they are widely applied to many machine learning problems. However, the introduction of the proximal operator results in error in the optimal value. In order to address this issue, we introduce the idea of extragradient and propose a novel accelerated variance reduced stochastic extragradient descent (AVR-SExtraGD) algorithm, which inherits the advantages of Prox-SVRG and momentum acceleration techniques. Moreover, our theoretical analysis shows that AVR-SExtraGD enjoys the best-known convergence rates and oracle complexities of stochastic first-order algorithms such as Katyusha for both strongly convex and non-strongly convex problems. Finally, our experimental results show that, for ERM problems and robust face recognition via sparse representation, AVR-SExtraGD yields improved performance compared with Prox-SVRG and Katyusha. The asynchronous variant of AVR-SExtraGD outperforms KroMagnon and ASAGA, which are the asynchronous variants of SVRG and SAGA, respectively. |
Tasks | Face Recognition, Robust Face Recognition |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BklDO1HYPS |
https://openreview.net/pdf?id=BklDO1HYPS | |
PWC | https://paperswithcode.com/paper/accelerated-variance-reduced-stochastic |
Repo | |
Framework | |
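To make the ingredients concrete, the following rough sketch combines an SVRG-style variance-reduced gradient with a proximal extragradient step on a lasso objective. It is not the authors' AVR-SExtraGD; in particular, the momentum acceleration is omitted.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def svrg_extragradient_epoch(A, b, x, lam, eta, rng):
    """One epoch on  min_x (1/2n)||Ax - b||^2 + lam * ||x||_1."""
    n = A.shape[0]
    x_snap = x.copy()
    full_grad = A.T @ (A @ x_snap - b) / n               # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        a_i = A[i]
        def vr_grad(z):                                   # variance-reduced estimator at z
            return a_i * (a_i @ z - b[i]) - a_i * (a_i @ x_snap - b[i]) + full_grad
        x_half = soft_threshold(x - eta * vr_grad(x), eta * lam)       # extrapolation step
        x = soft_threshold(x - eta * vr_grad(x_half), eta * lam)       # extragradient update
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((200, 50)), rng.standard_normal(200)
x = np.zeros(50)
for _ in range(10):
    x = svrg_extragradient_epoch(A, b, x, lam=0.1, eta=0.01, rng=rng)
```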
Attributed Graph Learning with 2-D Graph Convolution
Title | Attributed Graph Learning with 2-D Graph Convolution |
Authors | Anonymous |
Abstract | Graph convolutional neural networks have demonstrated promising performance in attributed graph learning, thanks to the use of graph convolution that effectively combines graph structures and node features for learning node representations. However, one intrinsic limitation of the commonly adopted 1-D graph convolution is that it only exploits graph connectivity for feature smoothing, which may lead to inferior performance on sparse and noisy real-world attributed networks. To address this problem, we propose to explore relational information among node attributes to complement node relations for representation learning. In particular, we propose to use 2-D graph convolution to jointly model the two kinds of relations and develop a computationally efficient dimensionwise separable 2-D graph convolution (DSGC). Theoretically, we show that DSGC can reduce intra-class variance of node features on both the node dimension and the attribute dimension to facilitate learning. Empirically, we demonstrate that by incorporating attribute relations, DSGC achieves significant performance gain over state-of-the-art methods on node classification and clustering on several real-world attributed networks. |
Tasks | Node Classification, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1gNKxrYPB |
https://openreview.net/pdf?id=B1gNKxrYPB | |
PWC | https://paperswithcode.com/paper/attributed-graph-learning-with-2-d-graph |
Repo | |
Framework | |
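A minimal NumPy sketch of the separable 2-D idea, under the assumption that filtering amounts to multiplying the feature matrix by a normalized node-graph operator on the left and a normalized attribute-affinity operator on the right; the paper's DSGC may use different filters.

```python
import numpy as np

def normalized_adj(A):
    A_hat = A + np.eye(A.shape[0])                   # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def dsgc_layer(X, A_node, A_attr, k_node=2, k_attr=1):
    """X: (n_nodes, n_attrs); A_node: node adjacency; A_attr: attribute affinity."""
    S_node, S_attr = normalized_adj(A_node), normalized_adj(A_attr)
    H = X
    for _ in range(k_node):
        H = S_node @ H                               # smooth over the node dimension
    for _ in range(k_attr):
        H = H @ S_attr                               # smooth over the attribute dimension
    return H

# Toy usage with a random sparse node graph and a crude attribute-affinity graph:
n, f = 100, 30
A_node = (np.random.rand(n, n) < 0.05).astype(float)
A_node = np.maximum(A_node, A_node.T)
X = np.random.rand(n, f)
A_attr = (np.corrcoef(X.T) > 0.3).astype(float)
H = dsgc_layer(X, A_node, A_attr)
```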
Chordal-GCN: Exploiting sparsity in training large-scale graph convolutional networks
Title | Chordal-GCN: Exploiting sparsity in training large-scale graph convolutional networks |
Authors | Anonymous |
Abstract | Despite the impressive success of graph convolutional networks (GCNs) on numerous applications, training on large-scale sparse networks remains challenging. Current algorithms require large memory space for storing GCN outputs as well as all the intermediate embeddings. Moreover, most of these algorithms involve either random sampling or an approximation of the adjacency matrix, which may unfortunately lose important structural information. In this paper, we propose Chordal-GCN for semi-supervised node classification. The proposed model utilizes the exact graph structure (i.e., without sampling or approximation), while requiring limited memory resources compared with the original GCN. Moreover, it leverages the sparsity pattern as well as the clustering structure of the graph. The proposed model first decomposes a large-scale sparse network into several small dense subgraphs (called cliques) and constructs a clique tree. By traversing the tree, GCN training is performed clique by clique, and connections between cliques are exploited via the tree hierarchy. Furthermore, we implement Chordal-GCN on large-scale datasets and demonstrate superior performance. |
Tasks | Node Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJl05AVtwB |
https://openreview.net/pdf?id=rJl05AVtwB | |
PWC | https://paperswithcode.com/paper/chordal-gcn-exploiting-sparsity-in-training |
Repo | |
Framework | |
Quaternion Equivariant Capsule Networks for 3D Point Clouds
Title | Quaternion Equivariant Capsule Networks for 3D Point Clouds |
Authors | Anonymous |
Abstract | We present a 3D capsule architecture for processing of point clouds that is equivariant with respect to the SO(3) rotation group, translation and permutation of the unordered input sets. The network operates on a sparse set of local reference frames, computed from an input point cloud and establishes end-to-end equivariance through a novel 3D quaternion group capsule layer, including an equivariant dynamic routing procedure. The capsule layer enables us to disentangle geometry from pose, paving the way for more informative descriptions and a structured latent space. In the process, we theoretically connect the process of dynamic routing between capsules to the well-known Weiszfeld algorithm, a scheme for solving iterative re-weighted least squares (IRLS) problems with provable convergence properties, enabling robust pose estimation between capsule layers. Due to the sparse equivariant quaternion capsules, our architecture allows joint object classification and orientation estimation, which we validate empirically on common benchmark datasets. |
Tasks | Object Classification, Pose Estimation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1xtd1HtPS |
https://openreview.net/pdf?id=B1xtd1HtPS | |
PWC | https://paperswithcode.com/paper/quaternion-equivariant-capsule-networks-for |
Repo | |
Framework | |
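The Weiszfeld algorithm the abstract refers to is the classical IRLS scheme for the geometric median; the sketch below shows it in plain Euclidean space (the paper applies the idea to quaternion-valued capsule routing, which this example does not attempt to reproduce).

```python
import numpy as np

def weiszfeld(points, iters=100, eps=1e-8):
    """Iteratively re-weighted least squares for the geometric median."""
    y = points.mean(axis=0)                      # initialize at the centroid
    for _ in range(iters):
        d = np.linalg.norm(points - y, axis=1)
        w = 1.0 / np.maximum(d, eps)             # IRLS weights: inverse distances
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
print(weiszfeld(pts))   # robust to the outlier, unlike the mean
```

The connection drawn in the paper is that repeatedly re-weighting capsule votes by agreement plays the same role as these inverse-distance weights, which is what gives the routing its robustness.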
Lift-the-flap: what, where and when for context reasoning
Title | Lift-the-flap: what, where and when for context reasoning |
Authors | Anonymous |
Abstract | Context reasoning is critical in a wide variety of applications where current inputs need to be interpreted in the light of previous experience and knowledge. Both spatial and temporal contextual information play a critical role in the domain of visual recognition. Here we investigate spatial constraints (what image features provide contextual information and where they are located), and temporal constraints (when different contextual cues matter) for visual recognition. The task is to reason about the scene context and infer what a target object hidden behind a flap is in a natural image. To tackle this problem, we first describe an online human psychophysics experiment recording active sampling via mouse clicks in lift-the-flap games and identify clicking patterns and features that are diagnostic of high contextual reasoning accuracy. As a proof of the usefulness of these clicking patterns and visual features, we extend a state-of-the-art recurrent model capable of attending to salient context regions, dynamically integrating useful information, making inferences, and predicting the class label of the target object over multiple clicks. The proposed model achieves human-level contextual reasoning accuracy, shares human-like sampling behavior and learns interpretable features for contextual reasoning. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryeFzT4YPr |
https://openreview.net/pdf?id=ryeFzT4YPr | |
PWC | https://paperswithcode.com/paper/lift-the-flap-what-where-and-when-for-context |
Repo | |
Framework | |
Meta-Learning by Hallucinating Useful Examples
Title | Meta-Learning by Hallucinating Useful Examples |
Authors | Anonymous |
Abstract | Learning to hallucinate additional examples has recently been shown as a promising direction to address few-shot learning tasks, which aim to learn novel concepts from very few examples. The hallucination process, however, is still far from generating effective samples for learning. In this work, we investigate two important requirements for the hallucinator — (i) precision: the generated examples should lead to good classifier performance, and (ii) collaboration: both the hallucinator and the classification component need to be trained jointly. By integrating these requirements as novel loss functions into a general meta-learning with hallucination framework, our model-agnostic PrecisE Collaborative hAlluciNator (PECAN) facilitates data hallucination to improve the performance of new classification tasks. Extensive experiments demonstrate state-of-the-art performance on competitive miniImageNet and ImageNet based few-shot benchmarks in various scenarios. |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJx8I1rFwr |
https://openreview.net/pdf?id=rJx8I1rFwr | |
PWC | https://paperswithcode.com/paper/meta-learning-by-hallucinating-useful |
Repo | |
Framework | |
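For orientation, here is a very rough sketch of the general "meta-learning with hallucination" loop the abstract builds on: a hallucinator generates extra support features from seeds plus noise, and a prototype classifier built from real and hallucinated features is evaluated on queries, so the query loss trains the hallucinator. PECAN's precision and collaboration losses are not reproduced, and every name below is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, noise_dim = 64, 16
hallucinator = nn.Sequential(nn.Linear(feat_dim + noise_dim, 128), nn.ReLU(),
                             nn.Linear(128, feat_dim))
opt = torch.optim.Adam(hallucinator.parameters(), lr=1e-3)

def episode_step(support_x, support_y, query_x, query_y, n_way, n_aug=5):
    """support_x: (n_support, feat_dim) pre-extracted features; labels in [0, n_way)."""
    protos = []
    for c in range(n_way):
        real = support_x[support_y == c]                        # at least one example per class
        noise = torch.randn(n_aug, noise_dim)
        seeds = real[torch.randint(0, real.size(0), (n_aug,))]
        fake = hallucinator(torch.cat([seeds, noise], dim=1))   # hallucinated examples
        protos.append(torch.cat([real, fake], dim=0).mean(dim=0))
    protos = torch.stack(protos)                                # (n_way, feat_dim)
    logits = -torch.cdist(query_x, protos)                      # nearest-prototype classifier
    loss = F.cross_entropy(logits, query_y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy episode: 5 classes, one real support example each, 20 query examples.
n_way = 5
sx, sy = torch.randn(n_way, feat_dim), torch.arange(n_way)
qx, qy = torch.randn(20, feat_dim), torch.randint(0, n_way, (20,))
print(episode_step(sx, sy, qx, qy, n_way))
```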
Deep probabilistic subsampling for task-adaptive compressed sensing
Title | Deep probabilistic subsampling for task-adaptive compressed sensing |
Authors | Anonymous |
Abstract | The field of deep learning is commonly concerned with optimizing predictive models using large pre-acquired datasets of densely sampled datapoints or signals. In this work, we demonstrate that the deep learning paradigm can be extended to incorporate a subsampling scheme that is jointly optimized under a desired minimum sample rate. We present Deep Probabilistic Subsampling (DPS), a widely applicable framework for task-adaptive compressed sensing that enables end-to-end optimization of an optimal subset of signal samples together with a subsequent model that performs a required task. We demonstrate strong performance on reconstruction and classification tasks on a toy dataset, MNIST, and CIFAR10 under stringent subsampling rates in both the pixel and the spatial-frequency domain. Due to the task-agnostic nature of the framework, DPS is directly applicable to all real-world domains that benefit from sample rate reduction. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJeq9JBFvH |
https://openreview.net/pdf?id=SJeq9JBFvH | |
PWC | https://paperswithcode.com/paper/deep-probabilistic-subsampling-for-task |
Repo | |
Framework | |
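One common way to realize a trainable subsampling scheme of this kind is a relaxed categorical selection per retained sample; the sketch below uses a Gumbel-softmax straight-through estimator as an assumed stand-in for DPS's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedSubsampler(nn.Module):
    def __init__(self, n_inputs: int, n_samples: int):
        super().__init__()
        # One row of logits per retained sample; each row selects one input location.
        self.logits = nn.Parameter(torch.zeros(n_samples, n_inputs))

    def forward(self, x, tau=1.0):
        # x: (batch, n_inputs) flattened signal. Straight-through hard selection.
        masks = F.gumbel_softmax(self.logits, tau=tau, hard=True)   # (n_samples, n_inputs)
        return x @ masks.t()                                        # (batch, n_samples)

subsampler = LearnedSubsampler(n_inputs=28 * 28, n_samples=64)
classifier = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
x, y = torch.randn(32, 28 * 28), torch.randint(0, 10, (32,))
loss = F.cross_entropy(classifier(subsampler(x)), y)
loss.backward()   # gradients reach both the classifier and the sampling logits
```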
Recurrent Layer Attention Network
Title | Recurrent Layer Attention Network |
Authors | Anonymous |
Abstract | Capturing long-range feature relations has been a central issue for convolutional neural networks (CNNs). To tackle this, attempts to integrate end-to-end trainable attention modules into CNNs are widespread. The main goal of these works is to adjust feature maps considering the spatial-channel correlation inside a convolution layer. In this paper, we focus on modeling relationships among layers and propose a novel structure, the ‘Recurrent Layer Attention network,’ which stores the hierarchy of features in recurrent neural networks (RNNs) that propagate concurrently with the CNN and adaptively scale the feature volumes of all layers. We further introduce several structural derivatives to demonstrate compatibility with recent attention modules and the expandability of the proposed network. For a semantic understanding of the learned features, we also visualize intermediate layers and plot the curve of layer scaling coefficients (i.e., layer attention). The Recurrent Layer Attention network achieves significant performance gains with only a slight increase in parameters on image classification with the CIFAR and ImageNet-1K 2012 datasets and on object detection with the Microsoft COCO 2014 dataset. |
Tasks | Image Classification, Object Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bke5aJBKvH |
https://openreview.net/pdf?id=Bke5aJBKvH | |
PWC | https://paperswithcode.com/paper/recurrent-layer-attention-network |
Repo | |
Framework | |
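A hypothetical sketch of the mechanism the abstract describes: an RNN that runs alongside the CNN, summarizes each stage's feature map by global average pooling, and emits per-channel scaling coefficients (the "layer attention") for that stage. Details such as the pooling, the RNN cell and the scaling granularity are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class RecurrentLayerAttention(nn.Module):
    def __init__(self, stages, channels, hidden=64):
        super().__init__()
        self.stages = nn.ModuleList(stages)          # CNN stages, all producing `channels` maps
        self.rnn = nn.GRUCell(channels, hidden)
        self.to_scale = nn.Linear(hidden, channels)  # per-channel scaling of each stage

    def forward(self, x):
        h = x.new_zeros(x.size(0), self.rnn.hidden_size)
        for stage in self.stages:
            x = stage(x)
            summary = x.mean(dim=(2, 3))             # global average pooling
            h = self.rnn(summary, h)                 # layer-wise recurrent state
            scale = torch.sigmoid(self.to_scale(h))  # layer attention coefficients
            x = x * scale[:, :, None, None]
        return x

stages = [nn.Conv2d(16, 16, 3, padding=1) for _ in range(3)]
net = RecurrentLayerAttention(stages, channels=16)
out = net(torch.randn(2, 16, 32, 32))
```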
On Empirical Comparisons of Optimizers for Deep Learning
Title | On Empirical Comparisons of Optimizers for Deep Learning |
Authors | Anonymous |
Abstract | Selecting an optimizer is a central step in the contemporary deep learning pipeline. In this paper we demonstrate the sensitivity of optimizer comparisons to the metaparameter tuning protocol. Our findings suggest that the metaparameter search space may be the single most important factor explaining the rankings obtained by recent empirical comparisons in the literature. In fact, we show that these results can be contradicted when metaparameter search spaces are changed. As tuning effort grows without bound, more general update rules should never underperform the ones they can approximate (i.e., Adam should never perform worse than momentum), but the recent attempts to compare optimizers either assume these inclusion relationships are not relevant in practice or restrict the metaparameters they tune to break the inclusions. In our experiments, we find that the inclusion relationships between optimizers matter in practice and always predict optimizer comparisons. In particular, we find that the popular adaptive gradient methods never underperform momentum or gradient descent. We also report practical tips around tuning rarely-tuned metaparameters of adaptive gradient methods and raise concerns about fairly benchmarking optimizers for neural network training. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygrAR4tPS |
https://openreview.net/pdf?id=HygrAR4tPS | |
PWC | https://paperswithcode.com/paper/on-empirical-comparisons-of-optimizers-for-1 |
Repo | |
Framework | |
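The inclusion argument can be illustrated numerically: if Adam's epsilon is made very large and the learning rate is rescaled to compensate, the adaptive denominator becomes irrelevant and the update reduces to (bias-corrected) momentum. The toy script below, which is not the paper's experiment, runs both updates on a small quadratic.

```python
import numpy as np

curv = np.array([1.0, 5.0])                 # f(x) = 0.5 * sum(curv * x^2)

def grad(x):
    return curv * x

beta1, beta2, eps = 0.9, 0.999, 1e8         # huge eps switches off the adaptive scaling
lr_sgd = 0.02
lr_adam = lr_sgd * eps / (1 - beta1)        # compensate for eps and the EMA factor

x_adam, x_sgd = np.ones(2), np.ones(2)
m, v, buf = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(1, 301):
    g = grad(x_adam)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat, v_hat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)
    x_adam = x_adam - lr_adam * m_hat / (np.sqrt(v_hat) + eps)

    buf = beta1 * buf + grad(x_sgd)         # heavy-ball momentum buffer
    x_sgd = x_sgd - lr_sgd * buf

print(x_adam, x_sgd)                        # both end up at (essentially) the optimum x = 0
```

The two trajectories differ only through the bias-correction factor, which decays over the first steps, so this configuration of Adam behaves like momentum SGD; the paper's point is that fair comparisons must give the tuning protocol the chance to find such configurations.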
EgoMap: Projective mapping and structured egocentric memory for Deep RL
Title | EgoMap: Projective mapping and structured egocentric memory for Deep RL |
Authors | Anonymous |
Abstract | Tasks involving localization, memorization and planning in partially observable 3D environments are an ongoing challenge in Deep Reinforcement Learning. We present EgoMap, a spatially structured neural memory architecture. EgoMap augments a deep reinforcement learning agent’s performance in 3D environments on challenging tasks with multi-step objectives. The EgoMap architecture incorporates several inductive biases including a differentiable inverse projection of CNN feature vectors onto a top-down spatially structured map. The map is updated with ego-motion measurements through a differentiable affine transform. We show this architecture outperforms both standard recurrent agents and state-of-the-art agents with structured memory. We demonstrate that incorporating these inductive biases into an agent’s architecture allows for stable training with reward alone, circumventing the expense of acquiring and labelling expert trajectories. A detailed ablation study demonstrates the impact of key aspects of the architecture and through extensive qualitative analysis, we show how the agent exploits its structured internal memory to achieve higher performance. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1gfu3EtDr |
https://openreview.net/pdf?id=S1gfu3EtDr | |
PWC | https://paperswithcode.com/paper/egomap-projective-mapping-and-structured |
Repo | |
Framework | |
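The differentiable ego-motion update can be sketched with a standard affine warp; the code below is an assumption-laden illustration (not the authors' implementation) that rotates and translates a top-down memory tensor with torch's affine_grid/grid_sample and then naively adds newly projected features.

```python
import torch
import torch.nn.functional as F

def egomotion_update(memory, dtheta, dx, dy, new_obs=None):
    """memory: (N, C, H, W) top-down feature map; dtheta/dx/dy: per-sample ego-motion."""
    cos, sin = torch.cos(dtheta), torch.sin(dtheta)
    # 2x3 affine matrices expressing the ego-motion in normalized map coordinates.
    theta = torch.stack([
        torch.stack([cos, -sin, dx], dim=-1),
        torch.stack([sin,  cos, dy], dim=-1)], dim=1)            # (N, 2, 3)
    grid = F.affine_grid(theta, memory.shape, align_corners=False)
    warped = F.grid_sample(memory, grid, align_corners=False)    # differentiable warp
    if new_obs is not None:
        warped = warped + new_obs       # naive write; the paper's write rule may differ
    return warped

mem = torch.zeros(1, 16, 32, 32)
mem = egomotion_update(mem, dtheta=torch.tensor([0.1]),
                       dx=torch.tensor([0.05]), dy=torch.tensor([0.0]))
```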
NeurQuRI: Neural Question Requirement Inspector for Answerability Prediction in Machine Reading Comprehension
Title | NeurQuRI: Neural Question Requirement Inspector for Answerability Prediction in Machine Reading Comprehension |
Authors | Anonymous |
Abstract | Real-world question answering systems often retrieve potentially relevant documents to a given question through a keyword search, followed by a machine reading comprehension (MRC) step to find the exact answer from them. In this process, it is essential to properly determine whether an answer to the question exists in a given document. This task often becomes complicated when the question involves multiple different conditions or requirements which are to be met in the answer. For example, in a question “What was the projection of sea level increases in the fourth assessment report?", the answer should properly satisfy several conditions, such as “increases” (but not decreases) and “fourth” (but not third). To address this, we propose a neural question requirement inspection model called NeurQuRI that extracts a list of conditions from the question, each of which should be satisfied by the candidate answer generated by an MRC model. To check whether each condition is met, we propose a novel, attention-based loss function. We evaluate our approach on SQuAD 2.0 dataset by integrating the proposed module with various MRC models, demonstrating the consistent performance improvements across a wide range of state-of-the-art methods. |
Tasks | Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxgsCVYPr |
https://openreview.net/pdf?id=ryxgsCVYPr | |
PWC | https://paperswithcode.com/paper/neurquri-neural-question-requirement |
Repo | |
Framework | |
You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings
Title | You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings |
Authors | Anonymous |
Abstract | Knowledge graph embedding (KGE) models learn algebraic representations of the entities and relations in a knowledge graph. A vast number of KGE techniques for multi-relational link prediction have been proposed in the recent literature, often with state-of-the-art performance. These approaches differ along a number of dimensions, including different model architectures, different training strategies, and different approaches to hyperparameter optimization. In this paper, we take a step back and aim to summarize and quantify empirically the impact of each of these dimensions on model performance. We report on the results of an extensive experimental study with popular model architectures and training strategies across a wide range of hyperparameter settings. We found that when trained appropriately, the relative performance differences between various model architectures often shrink and sometimes even reverse when compared to prior results. For example, RESCAL, one of the first KGE models, showed strong performance when trained with state-of-the-art techniques; it was competitive with or outperformed more recent architectures. We also found that good model configurations (often superior to those reported in prior studies) can be found by exploring relatively few random samples from a large hyperparameter space. Our results suggest that many of the more advanced architectures and techniques proposed in the literature should be revisited to reassess their individual benefits. |
Tasks | Graph Embedding, Hyperparameter Optimization, Knowledge Graph Embedding, Knowledge Graph Embeddings, Link Prediction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkxSmlBFvr |
https://openreview.net/pdf?id=BkxSmlBFvr | |
PWC | https://paperswithcode.com/paper/you-can-teach-an-old-dog-new-tricks-on |
Repo | |
Framework | |
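As a pointer to what training a KGE model involves, the sketch below implements the classical RESCAL scoring function s(h, r, t) = h^T M_r t with random negative sampling and a binary cross-entropy loss; the paper's study sweeps far more training strategies and hyperparameters than this single, assumed configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_entities, n_relations, dim = 1000, 20, 50
ent = nn.Embedding(n_entities, dim)
rel = nn.Parameter(torch.randn(n_relations, dim, dim) * 0.01)   # one mixing matrix per relation
opt = torch.optim.Adagrad(list(ent.parameters()) + [rel], lr=0.1)

def score(h, r, t):
    # RESCAL: s(h, r, t) = h^T M_r t
    return torch.einsum('bd,bde,be->b', ent(h), rel[r], ent(t))

def train_step(h, r, t, n_neg=10):
    # Corrupt the tail of each positive triple with n_neg random entities.
    neg_t = torch.randint(0, n_entities, (h.numel() * n_neg,))
    pos = score(h, r, t)
    neg = score(h.repeat_interleave(n_neg), r.repeat_interleave(n_neg), neg_t)
    loss = F.binary_cross_entropy_with_logits(
        torch.cat([pos, neg]),
        torch.cat([torch.ones_like(pos), torch.zeros_like(neg)]))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

h = torch.randint(0, n_entities, (128,))
r = torch.randint(0, n_relations, (128,))
t = torch.randint(0, n_entities, (128,))
print(train_step(h, r, t))
```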