Paper Group NANR 121
Decoupling Hierarchical Recurrent Neural Networks With Locally Computable Losses
Title | Decoupling Hierarchical Recurrent Neural Networks With Locally Computable Losses |
Authors | Anonymous |
Abstract | Learning long-term dependencies is a key long-standing challenge of recurrent neural networks (RNNs). Hierarchical recurrent neural networks (HRNNs) have been considered a promising approach as long-term dependencies are resolved through shortcuts up and down the hierarchy. Yet, the memory requirements of Truncated Backpropagation Through Time (TBPTT) still prevent training them on very long sequences. In this paper, we empirically show that in (deep) HRNNs, propagating gradients back from higher to lower levels can be replaced by locally computable losses, without harming the learning capability of the network, over a wide range of tasks. This decoupling by local losses reduces the memory requirements of training by a factor exponential in the depth of the hierarchy in comparison to standard TBPTT. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1lNWertDr |
https://openreview.net/pdf?id=S1lNWertDr | |
PWC | https://paperswithcode.com/paper/decoupling-hierarchical-recurrent-neural |
Repo | |
Framework | |
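The decoupling idea above is concrete enough to sketch. Below is a minimal two-level HRNN in PyTorch where the gradient path from the upper to the lower level is cut with `.detach()`, and the lower level is instead trained with a locally computable loss (here, next-token prediction within each chunk). All sizes, the choice of local loss, and the 0.5 weighting are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: HRNN with local losses replacing top-down gradients.
import torch
import torch.nn as nn

vocab, emb, hid, chunk, n_chunks = 50, 32, 64, 10, 4
embed = nn.Embedding(vocab, emb)
low = nn.GRU(emb, hid, batch_first=True)        # lower level: within-chunk
high = nn.GRU(hid, hid, batch_first=True)       # upper level: across chunks
local_head = nn.Linear(hid, vocab)              # local next-token predictor
task_head = nn.Linear(hid, vocab)
ce = nn.CrossEntropyLoss()

x = torch.randint(0, vocab, (8, n_chunks * chunk))
chunks = embed(x).view(8, n_chunks, chunk, emb)

summaries, local_loss = [], 0.0
for t in range(n_chunks):
    out, h = low(chunks[:, t])                  # (8, chunk, hid)
    logits = local_head(out[:, :-1])            # predict the next token in-chunk
    tgt = x.view(8, n_chunks, chunk)[:, t, 1:]
    local_loss = local_loss + ce(logits.reshape(-1, vocab), tgt.reshape(-1))
    summaries.append(h[-1].detach())            # cut the gradient path downward

summ = torch.stack(summaries, dim=1)            # (8, n_chunks, hid)
task_logits = task_head(high(summ)[0][:, -1])   # upper-level task output
loss = ce(task_logits, torch.randint(0, vocab, (8,))) + 0.5 * local_loss
loss.backward()                                 # no memory held across chunks below
```

Because each chunk's lower-level activations are freed after its local loss, TBPTT memory no longer grows with the full sequence length at the lower level.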
Combining graph and sequence information to learn protein representations
Title | Combining graph and sequence information to learn protein representations |
Authors | Anonymous |
Abstract | Computational methods that infer the function of proteins are key to understanding life at the molecular level. In recent years, representation learning has emerged as a powerful paradigm to discover new patterns among entities as varied as images, words, speech, and molecules. In typical representation learning, there is only one source of data or one level of abstraction at which the learned representation occurs. However, proteins can be described by their primary, secondary, tertiary, and quaternary structure, or even as nodes in protein-protein interaction networks. Given that protein function is an emergent property of all these levels of interaction, in this work we learn joint representations from both amino acid sequences and multilayer networks representing tissue-specific protein-protein interactions. Using these representations, we train machine learning models that outperform existing methods on the task of tissue-specific protein function prediction on 10 out of 13 tissues. Furthermore, we outperform existing methods by 19% on average. |
Tasks | Protein Function Prediction, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Skx73lBFDS |
https://openreview.net/pdf?id=Skx73lBFDS | |
PWC | https://paperswithcode.com/paper/combining-graph-and-sequence-information-to |
Repo | |
Framework | |
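One simple way to realize the joint representation described above is to concatenate a per-protein sequence embedding with a per-protein network embedding and train a function classifier on the joint vector. The sketch below is our assumption of such a fusion, with random arrays standing in for the two embedding sources (e.g., a sequence encoder and a node2vec-style network embedding); it is not the paper's exact architecture.

```python
# Hedged sketch: fuse sequence and network embeddings, then classify.
import numpy as np

rng = np.random.default_rng(0)
n_proteins, d_seq, d_net = 100, 64, 32
seq_emb = rng.normal(size=(n_proteins, d_seq))   # stand-in: sequence encoder output
net_emb = rng.normal(size=(n_proteins, d_net))   # stand-in: tissue PPI embedding
joint = np.concatenate([seq_emb, net_emb], axis=1)

# Logistic-regression head for one tissue-specific function label.
y = rng.integers(0, 2, size=n_proteins)
w = np.zeros(joint.shape[1])
b = 0.0
for _ in range(200):                             # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(joint @ w + b)))
    g = p - y
    w -= 0.1 * joint.T @ g / n_proteins
    b -= 0.1 * g.mean()
```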
A Simple and Scalable Shape Representation for 3D Reconstruction
Title | A Simple and Scalable Shape Representation for 3D Reconstruction |
Authors | Anonymous |
Abstract | Deep learning applied to the reconstruction of 3D shapes has seen growing interest. A popular approach to 3D reconstruction and generation in recent years has been the CNN encoder-decoder model, often applied in voxel space. However, this often scales very poorly with resolution, limiting the effectiveness of these models. Several sophisticated alternatives for decoding to 3D shapes have been proposed, typically relying on alternative deep learning architectures. In this work, however, we show that standard benchmarks in 3D reconstruction can be tackled with a surprisingly simple approach: a linear decoder obtained by principal component analysis on the signed distance transform of the surface. This approach scales easily to larger resolutions. We show in multiple experiments that it is competitive with state-of-the-art methods, and that it also allows the decoder to be fine-tuned on the target task using a loss designed for SDF transforms, obtaining further gains. |
Tasks | 3D Reconstruction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJgjGxrFPS |
https://openreview.net/pdf?id=rJgjGxrFPS | |
PWC | https://paperswithcode.com/paper/a-simple-and-scalable-shape-representation |
Repo | |
Framework | |
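The linear decoder is simple enough to show end to end. The sketch below fits PCA (via SVD) to flattened signed-distance grids and reconstructs a shape from k coefficients; the random "dataset", grid resolution, and component count are placeholders, not the paper's setup.

```python
# Hedged sketch: PCA linear decoder over signed distance grids.
import numpy as np

rng = np.random.default_rng(0)
n_shapes, res = 200, 16
sdf = rng.normal(size=(n_shapes, res ** 3))      # flattened SDF volumes (mock data)

mean = sdf.mean(axis=0)
u, s, vt = np.linalg.svd(sdf - mean, full_matrices=False)
k = 32
basis = vt[:k]                                   # k principal components

code = (sdf[0] - mean) @ basis.T                 # encode: k coefficients per shape
recon = mean + code @ basis                      # decode: a single linear map
surface = recon.reshape(res, res, res) < 0       # inside = negative signed distance
```

The decode step is one matrix-vector product regardless of k, which is what makes the representation scale gracefully to higher resolutions.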
Self-labelling via simultaneous clustering and representation learning
Title | Self-labelling via simultaneous clustering and representation learning |
Authors | Anonymous |
Abstract | Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks. However, doing so naively leads to ill-posed learning problems with degenerate solutions. In this paper, we propose a novel and principled learning formulation that addresses these issues. The method is obtained by maximizing the information between labels and input data indices. We show that this criterion extends standard cross-entropy minimization to an optimal transport problem, which we solve efficiently for millions of input images and thousands of labels using a fast variant of the Sinkhorn-Knopp algorithm. The resulting method is able to self-label visual data so as to train highly competitive image representations without manual labels. Compared to the best previous method in this class, namely DeepCluster, our formulation minimizes a single objective function for both representation learning and clustering; it also significantly outperforms DeepCluster in standard benchmarks. |
Tasks | Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hyx-jyBFPr |
https://openreview.net/pdf?id=Hyx-jyBFPr | |
PWC | https://paperswithcode.com/paper/self-labelling-via-simultaneous-clustering |
Repo | |
Framework | |
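The balanced assignment step is concrete enough to sketch. Below is a hedged numpy rendering of Sinkhorn-Knopp self-labelling: given predictions for N images over K clusters, alternating row/column rescaling yields a soft assignment whose rows sum to 1/N and columns to 1/K, i.e. an approximately equipartitioned pseudo-labelling. The temperature `lam` and iteration count are illustrative choices, not the paper's settings.

```python
# Hedged sketch: Sinkhorn-Knopp balanced pseudo-label assignment.
import numpy as np

rng = np.random.default_rng(0)
N, K, lam = 1000, 10, 25.0
logits = rng.normal(size=(N, K))        # stand-in for network predictions

Q = np.exp(lam * logits)                # unnormalized transport plan
Q /= Q.sum()
for _ in range(50):                     # alternate row/column scaling
    Q /= Q.sum(axis=1, keepdims=True)
    Q /= N                              # rows sum to 1/N
    Q /= Q.sum(axis=0, keepdims=True)
    Q /= K                              # columns sum to 1/K
labels = Q.argmax(axis=1)               # hard pseudo-labels for the next epoch
```

The equal-column-mass constraint is what rules out the degenerate solution of assigning every image to one cluster.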
Multiplicative Interactions and Where to Find Them
Title | Multiplicative Interactions and Where to Find Them |
Authors | Anonymous |
Abstract | We explore the role of multiplicative interaction as a unifying framework to describe a range of classical and modern neural network architectural motifs, such as gating, attention layers, hypernetworks, and dynamic convolutions, amongst others. Multiplicative interaction layers as primitive operations have a long-established presence in the literature, though this is often not emphasized and they are thus under-appreciated. We begin by showing that such layers strictly enrich the representable function classes of neural networks. We conjecture that multiplicative interactions offer a particularly powerful inductive bias when fusing multiple streams of information or when conditional computation is required. We therefore argue that they should be considered in many situations where multiple compute or information paths need to be combined, in place of the simple and oft-used concatenation operation. Finally, we back up our claims and demonstrate the potential of multiplicative interactions by applying them in large-scale complex RL and sequence modelling tasks, where their use allows us to deliver state-of-the-art results, thereby providing new evidence in support of multiplicative interactions playing a more prominent role in the design of new neural network architectures. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rylnK6VtDH |
https://openreview.net/pdf?id=rylnK6VtDH | |
PWC | https://paperswithcode.com/paper/multiplicative-interactions-and-where-to-find |
Repo | |
Framework | |
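A general multiplicative-interaction layer can be written as f(x, z) = z^T W x + U z + V x + b, where W is a third-order tensor so that z generates the weight matrix acting on x. The sketch below is our rendering of that form (dimensions arbitrary); a diagonal W recovers elementwise gating, and the z-generated weight matrix is the hypernetwork special case.

```python
# Hedged sketch: a general multiplicative-interaction layer.
import torch

dx, dz, dy = 16, 8, 32
W = torch.randn(dz, dx, dy) * 0.1   # 3D tensor: z selects a weight matrix for x
U = torch.randn(dz, dy) * 0.1
V = torch.randn(dx, dy) * 0.1
b = torch.zeros(dy)

def mi_layer(x, z):
    # z-conditioned linear map on x, plus ordinary linear terms in x and z
    Wz = torch.einsum('z,zxy->xy', z, W)   # generated weight matrix (dx, dy)
    return x @ Wz + z @ U + x @ V + b

y = mi_layer(torch.randn(dx), torch.randn(dz))
```

Contrast this with concatenation followed by a linear layer, which can only produce terms additive in x and z, never the bilinear z-times-x terms above.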
Inductive and Unsupervised Representation Learning on Graph Structured Objects
Title | Inductive and Unsupervised Representation Learning on Graph Structured Objects |
Authors | Anonymous |
Abstract | Inductive and unsupervised graph learning is a critical technique for predictive or information retrieval tasks where label information is difficult to obtain. It is also challenging to make graph learning inductive and unsupervised at the same time, as learning processes guided by reconstruction-error-based loss functions inevitably demand graph similarity evaluation, which is usually computationally intractable. In this paper, we propose a general framework, SEED (Sampling, Encoding, and Embedding Distributions), for inductive and unsupervised representation learning on graph structured objects. Instead of directly dealing with the computational challenges raised by graph similarity evaluation, the SEED framework samples from an input graph a number of subgraphs whose reconstruction errors can be efficiently evaluated, encodes the subgraph samples into a collection of subgraph vectors, and employs the embedding of the subgraph vector distribution as the output vector representation for the input graph. By theoretical analysis, we demonstrate the close connection between SEED and graph isomorphism. Using public benchmark datasets, our empirical study suggests the proposed SEED framework achieves up to a 10% improvement compared with competitive baseline methods. |
Tasks | Graph Similarity, Information Retrieval, Representation Learning, Unsupervised Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkem91rtDB |
https://openreview.net/pdf?id=rkem91rtDB | |
PWC | https://paperswithcode.com/paper/inductive-and-unsupervised-representation |
Repo | |
Framework | |
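The sample-encode-embed pipeline can be sketched directly on an adjacency matrix: (1) sample random-walk subgraphs, (2) encode each walk into a vector, (3) embed the distribution of subgraph vectors, here by its mean. The degree-histogram encoder below is our stand-in for the paper's learned subgraph encoder, and all sizes are illustrative.

```python
# Hedged sketch of a SEED-style pipeline: sample, encode, embed distribution.
import numpy as np

rng = np.random.default_rng(0)

def graph_embedding(adj, n_samples=50, walk_len=8, bins=5):
    n = adj.shape[0]
    vecs = []
    for _ in range(n_samples):
        node, visited = rng.integers(n), []
        for _ in range(walk_len):                 # (1) sample a random walk
            visited.append(node)
            nbrs = np.flatnonzero(adj[node])
            if len(nbrs) == 0:
                break
            node = rng.choice(nbrs)
        degs = adj[visited].sum(axis=1)           # (2) encode: degree histogram
        hist, _ = np.histogram(degs, bins=bins, range=(0, bins))
        vecs.append(hist / max(len(visited), 1))
    return np.mean(vecs, axis=0)                  # (3) embed the distribution

adj = (rng.random((20, 20)) < 0.2).astype(int)
adj = np.triu(adj, 1)
adj = adj + adj.T                                 # undirected, no self-loops
print(graph_embedding(adj))
```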
Neural Approximation of an Auto-Regressive Process through Confidence Guided Sampling
Title | Neural Approximation of an Auto-Regressive Process through Confidence Guided Sampling |
Authors | Anonymous |
Abstract | We propose a generic confidence-based approximation that can be plugged into an auto-regressive generation process to simplify it, with proven convergence. We first assume that the priors of future samples can be generated in an independently and identically distributed (i.i.d.) manner using an efficient predictor. Given the past samples and future priors, the mother AR model can post-process the priors while the accompanying confidence predictor decides whether the current sample needs resampling. Thanks to the i.i.d. assumption, the post-processing can update each sample in parallel, which remarkably accelerates the mother model. Our experiments on different data domains, including sequences and images, show that the proposed method can successfully capture the complex structures of the data and generate meaningful future samples at lower computational cost while preserving the sequential relationship of the data. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkloDJSFPH |
https://openreview.net/pdf?id=SkloDJSFPH | |
PWC | https://paperswithcode.com/paper/neural-approximation-of-an-auto-regressive |
Repo | |
Framework | |
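Only the control flow of the abstract is sketched below: a cheap predictor proposes all future samples i.i.d., a (mock) AR model post-processes them in parallel, and a confidence score decides which positions fall back to slow sequential resampling. All three models are stand-in functions of our own choosing.

```python
# Hedged sketch of the confidence-guided accept/resample loop.
import numpy as np

rng = np.random.default_rng(0)
T = 20
past = rng.normal(size=5)

priors = rng.normal(size=T)                     # fast i.i.d. proposals
refined = 0.8 * priors + 0.2 * past.mean()      # parallel post-processing (mock)
confidence = rng.random(T)                      # mock confidence predictor

seq = list(past)
for t in range(T):
    if confidence[t] > 0.5:                     # confident: accept parallel sample
        seq.append(refined[t])
    else:                                       # not confident: resample serially
        seq.append(0.9 * seq[-1] + 0.1 * rng.normal())
```

The speedup comes from the first branch: every accepted position costs no sequential AR step.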
Ellipsoidal Trust Region Methods for Neural Network Training
Title | Ellipsoidal Trust Region Methods for Neural Network Training |
Authors | Anonymous |
Abstract | We investigate the use of ellipsoidal trust region constraints for second-order optimization of neural networks. This approach can be seen as a higher-order counterpart of adaptive gradient methods, which we here show to be interpretable as first-order trust region methods with ellipsoidal constraints. In particular, we show that the preconditioning matrix used in RMSProp and Adam satisfies the necessary conditions for provable convergence of second-order trust region methods with standard worst-case complexities. Furthermore, we run experiments across different neural architectures and datasets and find that the ellipsoidal constraints consistently outperform their spherical counterpart, both in terms of number of backpropagations and asymptotic loss value. Finally, we find performance comparable to state-of-the-art first-order methods in terms of backpropagations, but further advances in hardware are needed to render Newton methods competitive in terms of time. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJgEd6NYPH |
https://openreview.net/pdf?id=BJgEd6NYPH | |
PWC | https://paperswithcode.com/paper/ellipsoidal-trust-region-methods-for-neural |
Repo | |
Framework | |
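The claimed first-order correspondence can be made concrete. In our notation (not necessarily the paper's), minimizing the linear model of the loss inside an ellipsoid shaped by the second-moment estimate v_t recovers the adaptive-gradient direction:

```latex
% First-order model under an ellipsoidal trust-region constraint,
% with the ellipsoid built from the RMSProp/Adam second-moment estimate v_t:
\min_{s}\ g_t^\top s
\quad \text{s.t.} \quad s^\top D_t\, s \le r_t^2,
\qquad D_t = \operatorname{diag}\!\left(\sqrt{v_t}\right).

% Stationarity (g_t + 2\lambda D_t s = 0) with the constraint active gives
s^\star = -\,\frac{r_t}{\left\| D_t^{-1/2} g_t \right\|}\, D_t^{-1} g_t
\;\propto\; -\,\frac{g_t}{\sqrt{v_t}},

% i.e. the g_t / sqrt(v_t) scaling of RMSProp (and of Adam, once momentum
% is applied to g_t).
```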
Continuous Adaptation in Multi-agent Competitive Environments
Title | Continuous Adaptation in Multi-agent Competitive Environments |
Authors | Anonymous |
Abstract | In a multi-agent competitive environment, we would expect that an agent which can quickly adapt to environmental changes has a higher probability of surviving and beating other agents. In this paper, to investigate whether adaptation capability can help a learning agent improve its competitiveness in a multi-agent environment, we construct a simplified baseball game scenario to develop and evaluate the adaptation capability of learning agents. Our baseball game scenario is modeled as a two-player zero-sum stochastic game with only a final reward. We propose a modified Deep CFR algorithm to learn a strategy that approximates the Nash equilibrium strategy. We also form several teams, with different teams adopting different playing strategies, to analyze (1) whether an adaptation mechanism can help increase the winning percentage and (2) what kinds of initial strategies can help a team achieve a higher winning percentage. The experimental results show that the learned Nash-equilibrium strategy is very similar to real-life baseball game strategy. Moreover, with the proposed strategy adaptation mechanism, the winning percentage increases for the team with a Nash-equilibrium initial strategy. Nevertheless, under the same adaptation mechanism, teams with deterministic initial strategies actually become less competitive. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkllBpEKDH |
https://openreview.net/pdf?id=BkllBpEKDH | |
PWC | https://paperswithcode.com/paper/continuous-adaptation-in-multi-agent |
Repo | |
Framework | |
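Deep CFR builds on regret matching, which the toy sketch below shows for a single decision point: the current strategy puts probability on each action in proportion to its positive cumulative regret. The utilities here are mock values; the paper's modified Deep CFR replaces such tables with neural approximators, and nothing below is specific to its baseball setup.

```python
# Hedged sketch: regret matching, the core update behind CFR-style methods.
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3
regret = np.zeros(n_actions)

def strategy(regret):
    pos = np.maximum(regret, 0.0)
    if pos.sum() == 0:
        return np.full(n_actions, 1.0 / n_actions)   # uniform when no regret
    return pos / pos.sum()

for _ in range(1000):
    sigma = strategy(regret)
    utils = rng.normal(size=n_actions)        # mock counterfactual action utilities
    regret += utils - sigma @ utils           # accumulate counterfactual regret
print(strategy(regret))
```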
Discriminability Distillation in Group Representation Learning
Title | Discriminability Distillation in Group Representation Learning |
Authors | Anonymous |
Abstract | Learning group representations is a common concern in tasks where the basic unit is a group, set, or sequence. The computer vision community tries to tackle this by aggregating the elements in a group based on an indicator either defined by humans, such as the quality or saliency of an element, or generated by a black box, such as an attention score or the output of an RNN. This article provides a more essential and explicable view. We claim that the most significant indicator of whether the group representation can benefit from an element is not its quality, or an inexplicable score, but its discriminability. Our key insight is to explicitly design the discriminability using embedded class centroids on a proxy set, and to show that the discriminability distribution over the element space can be distilled by a light-weight auxiliary distillation network. We call this process discriminability distillation learning (DDL). We show that the proposed DDL can be flexibly plugged into many group-based recognition tasks without influencing the training procedure of the original tasks. Comprehensive experiments on set-to-set face recognition and action recognition validate the advantage of DDL in both accuracy and efficiency, and it pushes forward the state-of-the-art results on these tasks by an impressive margin. |
Tasks | Face Recognition, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJgE9CEYPS |
https://openreview.net/pdf?id=rJgE9CEYPS | |
PWC | https://paperswithcode.com/paper/discriminability-distillation-in-group |
Repo | |
Framework | |
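At inference time, the mechanism reduces to score-weighted pooling: a light-weight network scores each element of a group (e.g., frames of a face track), and the group representation is the softmax-weighted sum of element features. The tiny MLP scorer below is our placeholder; in the paper the scores are distilled from class-centroid geometry on a proxy set.

```python
# Hedged sketch: discriminability-weighted aggregation of a group.
import torch
import torch.nn as nn

d = 128
scorer = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))

feats = torch.randn(10, d)                          # 10 elements of one group
w = torch.softmax(scorer(feats).squeeze(-1), dim=0) # per-element weights
group_repr = (w.unsqueeze(-1) * feats).sum(dim=0)   # (d,) group representation
```

Because the scorer is an auxiliary network, it can be trained separately and plugged in without touching the base recognition model, which is the "no influence on the original training procedure" property the abstract claims.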
Out-of-Distribution Image Detection Using the Normalized Compression Distance
Title | Out-of-Distribution Image Detection Using the Normalized Compression Distance |
Authors | Anonymous |
Abstract | For the detection of out-of-distribution images, whose underlying distribution differs from that of the training dataset, we aim to apply out-of-distribution detection methods to convolutional neural networks that are already deployed. Most recent approaches have to use out-of-distribution samples for validation or retrain the model, which makes them less practical for real-world applications. We propose a novel out-of-distribution detection method, MALCOM, which neither uses any out-of-distribution samples nor retrains the model. Inspired by methods that apply global average pooling to the feature maps of convolutional neural networks, the goal of our method is to extract informative sequential patterns from the feature maps. To this end, we introduce a similarity metric that focuses on the shared patterns between two sequences. In short, MALCOM uses both the global average and the spatial pattern of the feature maps to accurately identify out-of-distribution samples. |
Tasks | Out-of-Distribution Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1livgrFvr |
https://openreview.net/pdf?id=H1livgrFvr | |
PWC | https://paperswithcode.com/paper/out-of-distribution-image-detection-using-the |
Repo | |
Framework | |
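The normalized compression distance the title refers to is NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is a compressed length. Below is a hedged illustration with zlib as the compressor and byte strings standing in for the quantized feature-map sequences MALCOM would compare; the inputs are ours, not the paper's pipeline.

```python
# Hedged sketch: normalized compression distance with zlib.
import os
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = bytes(i % 7 for i in range(400))        # regular "in-distribution" pattern
b = bytes((i + 3) % 7 for i in range(400))  # same pattern, shifted phase
c = os.urandom(400)                         # incompressible outlier
print(ncd(a, b), ncd(a, c))                 # the similar pair scores lower
```

The intuition: if two sequences share patterns, compressing their concatenation costs little more than compressing one alone, driving the distance toward zero.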
A Greedy Approach to Max-Sliced Wasserstein GANs
Title | A Greedy Approach to Max-Sliced Wasserstein GANs |
Authors | Anonymous |
Abstract | Generative Adversarial Networks have made data generation possible in various use cases, but in the case of complex, high-dimensional distributions they can be difficult to train because of convergence problems and the appearance of mode collapse. Sliced Wasserstein GANs, and especially the application of the Max-Sliced Wasserstein distance, have made it possible to approximate the Wasserstein distance during training in an efficient and stable way, and have helped ease the convergence problems of these architectures. This method transforms sample assignment and distance calculation into sorting the one-dimensional projections of the samples, which results in a sufficient approximation of the high-dimensional Wasserstein distance. In this paper we demonstrate that approximating the Wasserstein distance by sorting the samples is not always the optimal approach, and that greedy assignment of the real and fake samples can result in faster convergence and a better approximation of the original distribution. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJgRsyBtPB |
https://openreview.net/pdf?id=BJgRsyBtPB | |
PWC | https://paperswithcode.com/paper/a-greedy-approach-to-max-sliced-wasserstein |
Repo | |
Framework | |
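The two 1-D assignment rules contrasted above can be sketched side by side: the standard sliced estimate sorts both projected batches and pairs them by rank, while a greedy variant pairs each real sample with the nearest unused fake sample. Gaussian batches stand in for projected data, and the exact greedy ordering is our assumption.

```python
# Hedged sketch: sorted vs greedy 1-D assignment for sliced Wasserstein.
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 64)        # 1-D projections of real samples
fake = rng.normal(0.5, 1.0, 64)        # 1-D projections of fake samples

def sorted_cost(a, b):
    # optimal 1-D transport: sort both sides and pair by rank
    return float(np.abs(np.sort(a) - np.sort(b)).mean())

def greedy_cost(a, b):
    # greedy: pair each real sample with the nearest unused fake sample
    b = list(b)
    total = 0.0
    for v in a:
        j = int(np.argmin([abs(v - w) for w in b]))
        total += abs(v - b.pop(j))
    return total / len(a)

print(sorted_cost(real, fake), greedy_cost(real, fake))
```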
A Hierarchy of Graph Neural Networks Based on Learnable Local Features
Title | A Hierarchy of Graph Neural Networks Based on Learnable Local Features |
Authors | Anonymous |
Abstract | Graph neural networks (GNNs) are a powerful tool for learning representations on graphs by iteratively aggregating features from node neighbourhoods. Many variant models have been proposed, but there is limited understanding of both how to compare different architectures and how to construct GNNs systematically. Here, we propose a hierarchy of GNNs based on their aggregation regions. We derive theoretical results about the discriminative power and feature representation capabilities of each class. Then, we show how this framework can be utilized to systematically construct arbitrarily powerful GNNs. As an example, we construct a simple architecture that exceeds the expressiveness of the Weisfeiler-Lehman graph isomorphism test. We empirically validate our theory on both synthetic and real-world benchmarks, and demonstrate that our example's theoretical power translates to state-of-the-art results on node classification, graph classification, and graph regression tasks. |
Tasks | Graph Classification, Graph Regression, Node Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryeEr0EFvS |
https://openreview.net/pdf?id=ryeEr0EFvS | |
PWC | https://paperswithcode.com/paper/a-hierarchy-of-graph-neural-networks-based-on |
Repo | |
Framework | |
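The Weisfeiler-Lehman test that serves as the expressiveness benchmark above, in its 1-dimensional color-refinement form, iteratively relabels each node by its own color together with the multiset of its neighbours' colors. GNN aggregation layers mirror exactly this update, which is why aggregation regions bound their discriminative power. A minimal implementation:

```python
# WL (1-dim) color refinement on an adjacency matrix.
import numpy as np

def wl_colors(adj, iters=3):
    n = adj.shape[0]
    colors = [0] * n                               # uniform initial color
    for _ in range(iters):
        sigs = [(colors[i],
                 tuple(sorted(colors[j] for j in np.flatnonzero(adj[i]))))
                for i in range(n)]                 # own color + neighbour multiset
        relabel = {s: k for k, s in enumerate(sorted(set(sigs)))}
        colors = [relabel[s] for s in sigs]        # compress signatures to colors
    return colors

tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])  # triangle
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]) # 3-node path
print(wl_colors(tri), wl_colors(path))             # distinct color multisets
```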
Testing For Typicality with Respect to an Ensemble of Learned Distributions
Title | Testing For Typicality with Respect to an Ensemble of Learned Distributions |
Authors | Anonymous |
Abstract | Good methods of performing anomaly detection on high-dimensional data sets are needed, since algorithms which are trained on data are only expected to perform well on data that is similar to the training data. There are theoretical results on the ability to detect if a population of data is likely to come from a known base distribution, which is known as the goodness-of-fit problem, but those results require knowing a model of the base distribution. The ability to correctly reject anomalous data hinges on the accuracy of the model of the base distribution. For high dimensional data, learning an accurate-enough model of the base distribution such that anomaly detection works reliably is very challenging, as many researchers have noted in recent years. Existing methods for the goodness-of-fit problem do not account for the fact that a model of the base distribution is learned. To address that gap, we offer a theoretically motivated approach to account for the density learning procedure. In particular, we propose training an ensemble of density models, considering data to be anomalous if the data is anomalous with respect to any member of the ensemble. We provide a theoretical justification for this approach, proving first that a test on typicality is a valid approach to the goodness-of-fit problem, and then proving that for a correctly constructed ensemble of models, the intersection of typical sets of the models lies in the interior of the typical set of the base distribution. We present our method in the context of an example on synthetic data in which the effects we consider can easily be seen. |
Tasks | Anomaly Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJev6JBtvH |
https://openreview.net/pdf?id=SJev6JBtvH | |
PWC | https://paperswithcode.com/paper/testing-for-typicality-with-respect-to-an |
Repo | |
Framework | |
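The ensemble test can be sketched concretely: fit several density models, calibrate a typical range of per-sample negative log-likelihood for each model on training data, and flag a batch as anomalous if it falls outside that range for ANY ensemble member. Gaussians stand in for learned density models below, and `eps` is an illustrative tolerance, not a calibrated one.

```python
# Hedged sketch: ensemble typicality test for anomaly detection.
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 5000)

# "Ensemble": three Gaussians fit on disjoint thirds of the training data.
models = [(train[i::3].mean(), train[i::3].std()) for i in range(3)]

def nll(x, mu, sd):
    return 0.5 * ((x - mu) / sd) ** 2 + np.log(sd) + 0.5 * np.log(2 * np.pi)

def is_anomalous(batch, eps=0.15):
    for mu, sd in models:
        typical = nll(train, mu, sd).mean()        # entropy-rate estimate
        if abs(nll(batch, mu, sd).mean() - typical) > eps:
            return True                            # atypical for one member suffices
    return False

# In-distribution batch vs a shifted batch.
print(is_anomalous(rng.normal(0, 1, 200)), is_anomalous(rng.normal(2, 1, 200)))
```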
Anomaly Detection and Localization in Images using Guided Attention
Title | Anomaly Detection and Localization in Images using Guided Attention |
Authors | Anonymous |
Abstract | Anomaly detection and localization is a popular computer vision problem that involves detecting anomalous images and localizing anomalies within them. The task is challenging due to the small sample size and small pixel coverage of anomalies in real-world scenarios. Previous works have the drawback of using anomalous images during training to compute a threshold for detecting and localizing anomalies. To tackle these issues, we propose AVAGA - the first end-to-end trainable convolutional adversarial variational autoencoder (CAVAE) framework using guided attention, which localizes anomalies with the help of attention maps. AVAGA detects an image as anomalous from the large pixel-wise difference between the input and the reconstructed image. In an unsupervised setting, we propose a guided attention loss, in which we encourage AVAGA to focus on all non-anomalous regions in the image without using any anomalous images during training. Furthermore, we also propose a selective gradient backpropagation technique for guided attention, which enhances anomaly localization performance while using only 2% anomalous images in a weakly supervised setting. AVAGA outperforms state-of-the-art (SoTA) methods by 10% and 18% on localization and 8% and 15% on classification accuracy in unsupervised and weakly supervised settings respectively on the MVTec Anomaly Detection (MvAD) dataset, and by 11% and 22% on localization and 10% and 19% on classification accuracy in unsupervised and weakly supervised settings respectively on the modified ShanghaiTech Campus (STC) dataset. |
Tasks | Anomaly Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1gikpEtwH |
https://openreview.net/pdf?id=B1gikpEtwH | |
PWC | https://paperswithcode.com/paper/anomaly-detection-and-localization-in-images |
Repo | |
Framework | |
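The detection rule stated in the abstract reduces to reconstruction-difference scoring, sketched below with an untrained toy autoencoder and an assumed threshold; AVAGA adds adversarial training and the guided-attention loss on top of this basic mechanism.

```python
# Hedged sketch: pixel-wise reconstruction difference as an anomaly score.
import torch
import torch.nn as nn

ae = nn.Sequential(                       # toy convolutional autoencoder
    nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1), nn.Sigmoid(),
)

x = torch.rand(1, 1, 64, 64)              # stand-in input image
recon = ae(x)
heatmap = (x - recon).abs()               # per-pixel map used for localization
score = heatmap.mean().item()             # image-level anomaly score
anomalous = score > 0.1                   # threshold value is an assumption
```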