April 1, 2020

3014 words 15 mins read

Paper Group NANR 33

Pay Attention to Features, Transfer Learn faster CNNs. Identifying Weights and Architectures of Unknown ReLU Networks. GRAPHS, ENTITIES, AND STEP MIXTURE. An Empirical Study on Post-processing Methods for Word Embeddings. GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modelling. On the …

Pay Attention to Features, Transfer Learn faster CNNs

Title Pay Attention to Features, Transfer Learn faster CNNs
Authors Anonymous
Abstract Deep convolutional neural networks are now widely deployed in vision applications, but the size of the training data can bottleneck their performance. Transfer learning lets CNNs learn from limited data by transferring knowledge from weights pretrained on large datasets. On the other hand, blindly transferring all learned features from the source dataset brings unnecessary computation to CNNs on the target task. In this paper, we propose attentive feature distillation and selection (AFDS), which not only adjusts the strength of the regularization introduced by transfer learning but also dynamically determines which features are important to transfer. When deploying AFDS on ResNet-101, we achieve state-of-the-art computation reduction at the same accuracy budget, outperforming all existing transfer learning methods. Under a 10x MACs reduction budget, transferring from ImageNet to Stanford Dogs 120, AFDS achieves an accuracy 12.51% higher than its best competitor.
Tasks Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=ryxyCeHtPB
PDF https://openreview.net/pdf?id=ryxyCeHtPB
PWC https://paperswithcode.com/paper/pay-attention-to-features-transfer-learn
Repo
Framework
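
The abstract does not spell out the AFDS mechanics, but the core idea of attention-weighted feature distillation can be sketched. The loss form and the per-channel attention below are my assumptions for illustration, not the authors' implementation:

```python
# Hypothetical sketch of attention-weighted feature distillation (not the
# authors' AFDS code): per-channel attention weights scale an L2 penalty that
# pulls target-network features toward frozen source-network features.
import torch

def attentive_distillation_loss(f_target, f_source, channel_attention):
    """f_target, f_source: (N, C, H, W) feature maps; channel_attention: (C,)."""
    per_channel = ((f_target - f_source) ** 2).mean(dim=(0, 2, 3))  # (C,)
    return (channel_attention * per_channel).sum()

f_s = torch.randn(8, 64, 14, 14)          # features from the pretrained source model
f_t = f_s + 0.1 * torch.randn_like(f_s)   # features from the model being fine-tuned
attn = torch.softmax(torch.randn(64), 0)  # learned per-channel importance (illustrative)
print(attentive_distillation_loss(f_t, f_s, attn))
```

Channels with near-zero attention would then be candidates for pruning, which is where the claimed MACs reduction would come from.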

Identifying Weights and Architectures of Unknown ReLU Networks

Title Identifying Weights and Architectures of Unknown ReLU Networks
Authors Anonymous
Abstract The output of a neural network depends on its parameters in a highly nonlinear way, and it is widely assumed that a network’s parameters cannot be identified from its outputs. Here, we show that in many cases it is possible to reconstruct the architecture, weights, and biases of a deep ReLU network given the ability to query the network. ReLU networks are piecewise linear and the boundaries between pieces correspond to inputs for which one of the ReLUs switches between inactive and active states. Thus, first-layer ReLUs can be identified (up to sign and scaling) based on the orientation of their associated hyperplanes. Later-layer ReLU boundaries bend when they cross earlier-layer boundaries and the extent of bending reveals the weights between them. Our algorithm uses this to identify the units in the network and weights connecting them (up to isomorphism). The fact that considerable parts of deep networks can be identified from their outputs has implications for security, neuroscience, and our understanding of neural networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HklFUlBKPB
PDF https://openreview.net/pdf?id=HklFUlBKPB
PWC https://paperswithcode.com/paper/identifying-weights-and-architectures-of
Repo
Framework
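
The geometric observation is concrete enough to demonstrate on a toy network. Below is a minimal sketch, not the paper's algorithm: a ReLU network is piecewise linear, so a boundary point along a segment can be located by bisecting on changes in the directional derivative; at a first-layer boundary one preactivation is approximately zero.

```python
# A minimal sketch (my illustration, not the paper's algorithm) of locating a
# ReLU boundary along a line through input space: the network is piecewise
# linear, so its directional derivative changes exactly where a unit flips.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 2)), rng.standard_normal(4)
w2 = rng.standard_normal(4)
f = lambda x: w2 @ np.maximum(W1 @ x + b1, 0.0)  # toy 2-4-1 ReLU network

def local_slope(x, d, eps=1e-6):
    return (f(x + eps * d) - f(x)) / eps  # directional derivative at x

def find_boundary(a, b, tol=1e-8):
    """Bisect along segment a->b for a point where the linear region changes."""
    d = b - a
    while np.linalg.norm(b - a) > tol:
        m = (a + b) / 2
        # keep the half whose endpoints lie in different linear regions
        if abs(local_slope(a, d) - local_slope(m, d)) > 1e-8:
            b = m
        else:
            a = m
    return (a + b) / 2

x = find_boundary(np.array([-3.0, -3.0]), np.array([3.0, 3.0]))
print("boundary point:", x)
print("preactivations:", W1 @ x + b1)  # one entry should be near 0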

GRAPHS, ENTITIES, AND STEP MIXTURE

Title GRAPHS, ENTITIES, AND STEP MIXTURE
Authors Anonymous
Abstract Graph neural networks have shown promising results in representing and analyzing diverse graph-structured data such as social, citation, and protein interaction networks. Existing approaches commonly suffer from the oversmoothing issue, regardless of whether the neighborhood-aggregation policy is edge-based or node-based. Most methods also focus on transductive scenarios with fixed graphs, leading to poor generalization to unseen graphs. To address these issues, we propose a new graph neural network model that considers both edge-based neighborhood relationships and node-based entity features: Graph Entities with Step Mixture via random walk (GESM). GESM employs a mixture of random-walk steps of varying lengths to alleviate the oversmoothing problem, and attention to use node information explicitly. These two mechanisms allow for a weighted neighborhood aggregation that considers the properties of entities and relations. Through extensive experiments, we show that the proposed GESM achieves state-of-the-art or comparable performance on four benchmark graph datasets comprising transductive and inductive learning tasks. Furthermore, we empirically demonstrate the significance of considering global information. The source code will be publicly available in the near future.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=S1eWbkSFPS
PDF https://openreview.net/pdf?id=S1eWbkSFPS
PWC https://paperswithcode.com/paper/graphs-entities-and-step-mixture
Repo
Framework
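
The step-mixture idea can be illustrated independently of the full model. A rough sketch in my own notation, with the attention mechanism omitted: propagate features with powers of a row-normalised transition matrix and mix the steps, so short and long walks both contribute.

```python
# A rough sketch of "step mixture" propagation (my notation, attention
# omitted): features aggregated over several random-walk steps are mixed,
# so near and far neighbourhoods both contribute, helping against oversmoothing.
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)    # path graph on 4 nodes
A_hat = A + np.eye(4)                         # add self-loops
T = A_hat / A_hat.sum(axis=1, keepdims=True)  # row-normalised transition matrix
X = np.eye(4)                                 # one-hot node features

K = 3
mix = np.ones(K + 1) / (K + 1)                # uniform mixture weights (illustrative)
out, step = mix[0] * X, X
for k in range(1, K + 1):
    step = T @ step                           # k-step random-walk aggregation
    out += mix[k] * step
print(out.round(3))
```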

An Empirical Study on Post-processing Methods for Word Embeddings

Title An Empirical Study on Post-processing Methods for Word Embeddings
Authors Anonymous
Abstract Word embeddings learnt from large corpora have been adopted in various natural language processing applications and serve as general input representations to learning systems. Recently, a series of post-processing methods have been proposed to boost the performance of word embeddings on similarity comparison and analogy retrieval tasks, and some have been adapted to compose sentence representations. The general hypothesis behind these methods is that by making the embedding space more isotropic, the similarity between words can be better expressed. We view these methods as approaches that shrink the covariance/gram matrix, estimated from the learnt word vectors, towards a scaled identity matrix. By optimising an objective in the semi-Riemannian manifold with Centralised Kernel Alignment (CKA), we are able to search for the optimal shrinkage parameter, and we provide a post-processing method that smooths the spectrum of the learnt word vectors and yields improved performance on downstream tasks.
Tasks Word Embeddings
Published 2020-01-01
URL https://openreview.net/forum?id=Byla224KPr
PDF https://openreview.net/pdf?id=Byla224KPr
PWC https://paperswithcode.com/paper/an-empirical-study-on-post-processing-methods-1
Repo
Framework
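
Shrinking a covariance toward a scaled identity has a simple closed form. Below is a minimal sketch of that operation with a hand-picked shrinkage parameter alpha, whereas the paper selects it by optimising a CKA objective:

```python
# A minimal sketch (assumed form, not the paper's exact method) of shrinking
# the covariance of word vectors toward a scaled identity, i.e. smoothing the
# spectrum so the embedding space becomes more isotropic.
import numpy as np

rng = np.random.default_rng(0)
E = rng.standard_normal((1000, 50)) @ np.diag(np.linspace(3, 0.1, 50))  # anisotropic
E -= E.mean(axis=0)

alpha = 0.3                                  # shrinkage parameter (found via CKA in the paper)
C = E.T @ E / len(E)
C_shrunk = (1 - alpha) * C + alpha * (np.trace(C) / C.shape[0]) * np.eye(C.shape[0])

# apply the corresponding linear map to the embeddings via matrix square roots
vals, vecs = np.linalg.eigh(C)
vals_s, vecs_s = np.linalg.eigh(C_shrunk)
transform = vecs @ np.diag(vals ** -0.5) @ vecs.T @ vecs_s @ np.diag(vals_s ** 0.5) @ vecs_s.T
E_post = E @ transform
print(np.linalg.eigvalsh(E_post.T @ E_post / len(E)).round(2))  # flatter spectrum
```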

GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modelling

Title GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modelling
Authors Anonymous
Abstract Unsupervised image-to-image translation aims to learn a mapping between several visual domains from unpaired training data. Recent studies have shown remarkable success in image-to-image translation for multiple domains, but they suffer from two main limitations: they are either built from several two-domain mappings that must be learned independently, and/or they generate low-diversity results, a phenomenon known as mode collapse. To overcome these limitations, we propose a method named GMM-UNIT based on a content-attribute disentangled representation, where the attribute space is fitted with a GMM. Each GMM component represents a domain, and this simple assumption has two prominent advantages. First, the dimension of the attribute space does not grow linearly with the number of domains, as is the case in the literature. Second, the continuous domain encoding allows for interpolation between domains and for extrapolation to unseen domains. Additionally, we show how GMM-UNIT can be constrained down to different methods in the literature, meaning that GMM-UNIT is a unifying framework for unsupervised image-to-image translation.
Tasks Image-to-Image Translation, Unsupervised Image-To-Image Translation
Published 2020-01-01
URL https://openreview.net/forum?id=HkeFQgrFDr
PDF https://openreview.net/pdf?id=HkeFQgrFDr
PWC https://paperswithcode.com/paper/gmm-unit-unsupervised-multi-domain-and-multi
Repo
Framework
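
The one-component-per-domain assumption is easy to picture in isolation. A toy sketch, my illustration with made-up domains and dimensions: sampling from a component gives multi-modal attribute codes for a domain, and interpolating between component means gives continuous in-between domains.

```python
# A toy sketch (my illustration) of the attribute-space assumption: each domain
# is one Gaussian component, so sampling yields multi-modal outputs per domain
# and interpolating between component means moves continuously between domains.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                        # attribute dimension (illustrative)
means = {"day": rng.standard_normal(d), "night": rng.standard_normal(d)}
sigma = 0.3

def sample_attribute(domain):
    """Draw a stochastic attribute code from that domain's GMM component."""
    return means[domain] + sigma * rng.standard_normal(d)

def interpolate(dom_a, dom_b, t):
    """Continuous code between two domains: enables unseen, in-between styles."""
    return (1 - t) * means[dom_a] + t * means[dom_b]

z_day = sample_attribute("day")              # multi-modal: differs on every call
z_half = interpolate("day", "night", 0.5)    # halfway between the two domains
print(z_day.round(2), z_half.round(2), sep="\n")
```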

On the Linguistic Capacity of Real-time Counter Automata

Title On the Linguistic Capacity of Real-time Counter Automata
Authors William Merrill
Abstract While counter machines have received little attention in theoretical computer science since the 1960s, they have recently achieved a newfound relevance to the field of natural language processing (NLP). Recent work has suggested that some strong-performing recurrent neural networks utilize their memory as counters. Thus, one potential way to understand the success of these networks is to revisit the theory of counter computation. Therefore, we choose to study the abilities of real-time counter machines as formal grammars. We first show that several variants of the counter machine converge to express the same class of formal languages. We also prove that counter languages are closed under complement, union, intersection, and many other common set operations. Next, we show that counter machines cannot evaluate Boolean expressions, even though they can weakly validate their syntax. This has implications for the interpretability and evaluation of neural network systems: successfully matching syntactic patterns does not guarantee that a counter-like model accurately represents underlying semantic structures. Finally, we consider the question of whether counter languages are semilinear. This work makes general contributions to the theory of formal languages that are of particular interest for the interpretability of recurrent neural networks.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rylMgCNYvS
PDF https://openreview.net/pdf?id=rylMgCNYvS
PWC https://paperswithcode.com/paper/on-the-linguistic-capacity-of-real-time
Repo
Framework
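
A real-time counter machine is simple to exhibit. The sketch below is the standard textbook construction, not taken from the paper: a single counter suffices to accept a^n b^n in one left-to-right pass.

```python
# A minimal real-time one-counter machine (standard construction, not from the
# paper) accepting { a^n b^n : n >= 0 }: one left-to-right pass, one counter.
def accepts(word: str) -> bool:
    counter, seen_b = 0, False
    for ch in word:
        if ch == "a":
            if seen_b:            # an 'a' after a 'b' can never be fixed
                return False
            counter += 1          # increment on each 'a'
        elif ch == "b":
            seen_b = True
            counter -= 1          # decrement on each 'b'
            if counter < 0:       # more b's than a's so far
                return False
        else:
            return False
    return counter == 0           # accept iff the counter is back at zero

for w in ["", "ab", "aabb", "aab", "ba"]:
    print(repr(w), accepts(w))
```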

Optimal Unsupervised Domain Translation

Title Optimal Unsupervised Domain Translation
Authors Anonymous
Abstract Unsupervised Domain Translation (UDT) consists in finding meaningful correspondences between two domains, without access to explicit pairings between them. Following the seminal work of CycleGAN, many variants and extensions of this model have been applied successfully to a wide range of applications. However, these methods remain poorly understood and lack convincing theoretical guarantees. In this work, we define UDT in a rigorous, non-ambiguous manner, explore the implicit biases present in the approach, and demonstrate the limits of these approaches. Specifically, we show that the mappings produced by these methods are biased towards low-energy transformations, leading us to cast UDT into an Optimal Transport (OT) framework by making this implicit bias explicit. This not only allows us to provide theoretical guarantees for existing methods, but also to solve UDT problems where previous methods fail. Finally, making the link between the dynamic formulation of OT and CycleGAN, we propose a simple approach to solve UDT and illustrate its properties in two distinct settings.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1eRYxHYPB
PDF https://openreview.net/pdf?id=H1eRYxHYPB
PWC https://paperswithcode.com/paper/optimal-unsupervised-domain-translation-1
Repo
Framework
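
The "low energy" bias can be made explicit in a discrete toy setting. The sketch below is my simplification, not the paper's method: among all bijections between two small unpaired samples, pick the one minimising total squared transport cost via the assignment problem.

```python
# A sketch of the OT viewpoint in discrete form (my simplification): pick,
# among all bijections between two unpaired samples, the one with minimal
# transport energy sum ||x - T(x)||^2, via the assignment problem.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))                    # samples from domain A
Y = X[::-1] + 0.05 * rng.standard_normal((5, 2))   # unpaired samples from domain B

cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared-distance energy
rows, cols = linear_sum_assignment(cost)           # minimal-energy correspondence
print(list(zip(rows, cols)), "energy:", cost[rows, cols].sum().round(3))
```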

A Simple Recurrent Unit with Reduced Tensor Product Representations

Title A Simple Recurrent Unit with Reduced Tensor Product Representations
Authors Anonymous
Abstract Widely used recurrent units, including Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), perform well on natural language tasks, but their ability to learn structured representations is still questionable. Exploiting reduced Tensor Product Representations (TPRs) — distributed representations of symbolic structure in which vector-embedded symbols are bound to vector-embedded structural positions — we propose the TPRU, a simple recurrent unit that, at each time step, explicitly executes structural-role binding and unbinding operations to incorporate structural information into learning. A gradient analysis of the proposed TPRU supports our model design, and its performance on multiple datasets demonstrates its effectiveness. Furthermore, observations from a linguistically grounded study demonstrate the interpretability of the TPRU.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rkgFXR4KPr
PDF https://openreview.net/pdf?id=rkgFXR4KPr
PWC https://paperswithcode.com/paper/a-simple-recurrent-unit-with-reduced-tensor
Repo
Framework
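
The TPR binding and unbinding operations the unit builds on are easy to show directly. The sketch below covers only these classic operations, not the TPRU itself:

```python
# A minimal sketch of tensor-product binding/unbinding (the classic TPR
# operations the unit builds on, not the TPRU itself): a filler vector is
# bound to a role vector by an outer product and recovered by unbinding.
import numpy as np

rng = np.random.default_rng(0)
filler = rng.standard_normal(4)                        # vector-embedded symbol
roles = np.linalg.qr(rng.standard_normal((3, 3)))[0]   # orthonormal role vectors

T = np.outer(filler, roles[0])    # bind: filler (x) role_0
recovered = T @ roles[0]          # unbind with the same role
print(np.allclose(recovered, filler))  # True: orthonormal roles invert exactly
```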

Stochastic Conditional Generative Networks with Basis Decomposition

Title Stochastic Conditional Generative Networks with Basis Decomposition
Authors Anonymous
Abstract While generative adversarial networks (GANs) have revolutionized machine learning, a number of open questions remain to fully understand them and exploit their power. One of these questions is how to efficiently achieve proper diversity and sampling of the multi-mode data space. To address this, we introduce BasisGAN, a stochastic conditional multi-mode image generator. By exploiting the observation that a convolutional filter can be well approximated as a linear combination of a small set of basis elements, we learn a plug-and-play basis generator to stochastically generate basis elements, with just a few hundred parameters, to fully embed stochasticity into convolutional filters. By sampling basis elements instead of filters, we dramatically reduce the cost of modeling the parameter space with no sacrifice on either image diversity or fidelity. To illustrate this proposed plug-and-play framework, we construct variants of BasisGAN based on state-of-the-art conditional image generation networks, and train the networks by simply plugging in a basis generator, without additional auxiliary components, hyperparameters, or training objectives. The experimental success is complemented with theoretical results indicating how the perturbations introduced by the proposed sampling of basis elements can propagate to the appearance of generated images.
Tasks Conditional Image Generation, Image Generation
Published 2020-01-01
URL https://openreview.net/forum?id=S1lSapVtwS
PDF https://openreview.net/pdf?id=S1lSapVtwS
PWC https://paperswithcode.com/paper/stochastic-conditional-generative-networks-1
Repo
Framework
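
The parameter-count argument can be sketched concretely. The shapes and the tiny coefficient generator below are my own assumptions, not the paper's architecture:

```python
# A sketch, under assumed shapes, of the core idea: keep a small fixed basis
# of filter "atoms" and make only the low-dimensional combination coefficients
# stochastic, so sampled coefficients yield sampled convolutional filters.
import torch

n_basis, c_out, c_in, k = 6, 16, 3, 3
basis = torch.randn(n_basis, c_in, k, k)          # shared filter atoms (few params)
coeff_gen = torch.nn.Linear(8, c_out * n_basis)   # tiny stochastic "basis generator"

def sample_filters():
    coeffs = coeff_gen(torch.randn(8)).view(c_out, n_basis)
    # each output filter is a random linear combination of the basis atoms
    return torch.einsum("ob,bihw->oihw", coeffs, basis)

x = torch.randn(1, c_in, 32, 32)
y = torch.nn.functional.conv2d(x, sample_filters(), padding=1)
print(y.shape)  # torch.Size([1, 16, 32, 32]), different filters on every call
```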

InfoCNF: Efficient Conditional Continuous Normalizing Flow Using Adaptive Solvers

Title InfoCNF: Efficient Conditional Continuous Normalizing Flow Using Adaptive Solvers
Authors Anonymous
Abstract Continuous Normalizing Flows (CNFs) have emerged as promising deep generative models for a wide range of tasks thanks to their invertibility and exact likelihood estimation. However, conditioning CNFs on signals of interest for conditional image generation and downstream predictive tasks is inefficient due to the high-dimensional latent code generated by the model, which needs to be of the same size as the input data. In this paper, we propose InfoCNF, an efficient conditional CNF that partitions the latent space into a class-specific supervised code and an unsupervised code that is shared among all classes for efficient use of labeled information. Since the partitioning strategy (slightly) increases the number of function evaluations (NFEs), InfoCNF also employs gating networks to learn the error tolerances of its ordinary differential equation (ODE) solvers for better speed and performance. We show empirically that InfoCNF improves the test accuracy over the baseline while yielding comparable likelihood scores and reducing the NFEs on CIFAR10. Furthermore, applying the same partitioning strategy in InfoCNF on time-series data helps improve extrapolation performance.
Tasks Conditional Image Generation, Image Generation, Time Series
Published 2020-01-01
URL https://openreview.net/forum?id=SJgvl6EFwH
PDF https://openreview.net/pdf?id=SJgvl6EFwH
PWC https://paperswithcode.com/paper/infocnf-efficient-conditional-continuous
Repo
Framework
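
The latent partitioning itself is a one-liner. Here is a minimal sketch with made-up dimensions, splitting a flow's latent code into a small supervised part and a shared unsupervised part; the gated ODE solvers are out of scope here:

```python
# A minimal sketch (my reading of the setup, not the authors' code): the flow's
# latent code is split into a small class-specific part and a shared part, so
# label information only needs to live in a few dimensions.
import torch

latent_dim, sup_dim, n_classes = 64, 10, 10
z = torch.randn(32, latent_dim)                    # latent codes from the flow

z_sup, z_unsup = z[:, :sup_dim], z[:, sup_dim:]    # supervised / shared partition
classifier = torch.nn.Linear(sup_dim, n_classes)   # predicts labels from z_sup only
logits = classifier(z_sup)
print(z_sup.shape, z_unsup.shape, logits.shape)
```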

Distributed Training Across the World

Title Distributed Training Across the World
Authors Anonymous
Abstract Traditional synchronous distributed training is performed inside a cluster, since it requires a high-bandwidth, low-latency network (e.g. 25Gb Ethernet or InfiniBand). However, in many application scenarios, training data are distributed across many geographic locations, where physical distances are long and latency is high. Traditional synchronous distributed training cannot scale well under such limited network conditions. In this work, we aim to scale distributed learning under high-latency networks. To achieve this, we propose delayed and temporally sparse (DTS) updates that enable synchronous training to tolerate extreme network conditions without compromising accuracy. We benchmark our algorithms on servers deployed across three continents: London (Europe), Tokyo (Asia), Oregon (North America), and Ohio (North America). Under such challenging settings, DTS achieves a 90× speedup over traditional methods without loss of accuracy on ImageNet.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJeuueSYDH
PDF https://openreview.net/pdf?id=SJeuueSYDH
PWC https://paperswithcode.com/paper/distributed-training-across-the-world
Repo
Framework
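
Without the paper's details, the delayed-and-sparse flavour can still be mimicked in a toy simulation. Everything below, including the one-round delay and the sparsity schedule, is my own simplification:

```python
# A toy sketch of delayed and temporally sparse synchronisation (my own
# simplification of DTS): remote gradients are applied one round late and
# are only transmitted every `sparsity` steps, hiding long network latency.
import numpy as np

def dts_worker(grads, sparsity=4):
    """grads: list of local gradients per step; returns the applied updates."""
    applied, in_flight = [], None
    for t, g in enumerate(grads):
        update = g.copy()              # always apply the local gradient
        if in_flight is not None:
            update += in_flight        # a remote gradient arrives one round late
            in_flight = None
        if t % sparsity == 0:          # temporally sparse: transmit only occasionally
            in_flight = g.copy()       # stand-in for the remote peer's gradient
        applied.append(update)
    return applied

steps = [np.ones(2) * i for i in range(6)]
print([u.tolist() for u in dts_worker(steps)])
```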

RL-LIM: Reinforcement Learning-based Locally Interpretable Modeling

Title RL-LIM: Reinforcement Learning-based Locally Interpretable Modeling
Authors Anonymous
Abstract Understanding black-box machine learning models is important for their widespread adoption. However, developing globally interpretable models that explain the behavior of the entire model is challenging. An alternative approach is to explain black-box models by explaining individual predictions with a locally interpretable model. In this paper, we propose a novel method for locally interpretable modeling: Reinforcement Learning-based Locally Interpretable Modeling (RL-LIM). RL-LIM employs reinforcement learning to select a small number of samples and distill the black-box model's predictions into a low-capacity locally interpretable model. Training is guided by a reward obtained directly by measuring the agreement of the locally interpretable model's predictions with those of the black-box model. RL-LIM nearly matches the overall prediction performance of black-box models while yielding human-like interpretability, and it significantly outperforms state-of-the-art locally interpretable models in terms of overall prediction performance and fidelity.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BJx8Fh4KPB
PDF https://openreview.net/pdf?id=BJx8Fh4KPB
PWC https://paperswithcode.com/paper/rl-lim-reinforcement-learning-based-locally-1
Repo
Framework
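
The reward construction can be sketched without the RL machinery. In the toy below, which is my reading rather than the released code, a ridge model fit on a selected subset is rewarded for agreeing with the black box at the query point:

```python
# A sketch of the fidelity reward (my reading, not the released code): fit a
# simple local model on the RL-selected samples and reward agreement with the
# black-box model's prediction at the query point.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
black_box = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2   # stand-in black-box model

def fidelity_reward(selected_idx, x_query):
    local = Ridge().fit(X[selected_idx], black_box(X[selected_idx]))
    # higher reward when the interpretable model matches the black box locally
    return -abs(local.predict(x_query[None])[0] - black_box(x_query[None])[0])

idx = rng.choice(len(X), size=20, replace=False)       # the RL agent's selection
print(fidelity_reward(idx, X[0]))
```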

Surrogate-Based Constrained Langevin Sampling With Applications to Optimal Material Configuration Design

Title Surrogate-Based Constrained Langevin Sampling With Applications to Optimal Material Configuration Design
Authors Anonymous
Abstract We consider the problem of generating configurations that satisfy physical constraints for optimal material nano-pattern design, where multiple (and often conflicting) properties need to be simultaneously satisfied. Consider, for example, the trade-off between thermal resistance, electrical conductivity, and mechanical stability needed to design a nano-porous template with optimal thermoelectric efficiency. To that end, we leverage the posterior regularization framework and show that this constraint satisfaction problem can be formulated as sampling from a Gibbs distribution. The main challenges come from the black-box nature of those physical constraints, since they are obtained via solving highly non-linear PDEs. To overcome those difficulties, we introduce Surrogate-based Constrained Langevin dynamics for black-box sampling. We explore two surrogate approaches. The first approach exploits zero-order approximation of gradients in the Langevin sampling, and we refer to it as Zero-Order Langevin. In practice, this approach can be prohibitive since we still need to query the expensive PDE solvers often. The second approach approximates the gradients in the Langevin dynamics with deep neural networks, allowing us an efficient sampling strategy using the surrogate model. We prove the convergence of those two approaches when the target distribution is log-concave and smooth. We show the effectiveness of both approaches in designing optimal nano-porous material configurations, where the goal is to produce nano-pattern templates with low thermal conductivity and reasonable mechanical stability.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1l_gA4KvH
PDF https://openreview.net/pdf?id=H1l_gA4KvH
PWC https://paperswithcode.com/paper/surrogate-based-constrained-langevin-sampling
Repo
Framework
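
The Zero-Order Langevin variant is straightforward to sketch. In the toy below the expensive black-box PDE evaluation is replaced with a cheap Gaussian log-density; the two-point finite-difference gradient estimate and the Langevin update follow the description:

```python
# A minimal sketch of Zero-Order Langevin as described: estimate the gradient
# of log pi by finite differences (each query being one expensive PDE solve in
# the paper; here a cheap toy log-density), then take a Langevin step.
import numpy as np

rng = np.random.default_rng(0)
log_pi = lambda x: -0.5 * np.sum(x ** 2)   # toy log-density (stands in for -energy)

def zo_grad(x, mu=1e-4):
    """Two-point zero-order gradient estimate of log_pi at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = mu
        g[i] = (log_pi(x + e) - log_pi(x - e)) / (2 * mu)
    return g

x, eta = np.ones(3), 0.05
for _ in range(100):                        # Langevin: drift up the score + noise
    x = x + eta * zo_grad(x) + np.sqrt(2 * eta) * rng.standard_normal(3)
print(x.round(2))                           # hovers near the mode at the origin
```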

Reconstructing continuous distributions of 3D protein structure from cryo-EM images

Title Reconstructing continuous distributions of 3D protein structure from cryo-EM images
Authors Anonymous
Abstract Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structure of proteins and other macromolecular complexes at near-atomic resolution. In single particle cryo-EM, the central problem is to reconstruct the 3D structure of a macromolecule from $10^{4-7}$ noisy and randomly oriented 2D projection images. However, the imaged protein complexes may exhibit structural variability, which complicates reconstruction and is typically addressed using discrete clustering approaches that fail to capture the full range of protein dynamics. Here, we introduce a novel method for cryo-EM reconstruction that extends naturally to modeling continuous generative factors of structural heterogeneity. This method encodes structures in Fourier space using coordinate-based deep neural networks, and trains these networks from unlabeled 2D cryo-EM images by combining exact inference over image orientation with variational inference for structural heterogeneity. We demonstrate that the proposed method, termed cryoDRGN, can perform ab-initio reconstruction of 3D protein complexes from simulated and real 2D cryo-EM image data. To our knowledge, cryoDRGN is the first neural network-based approach for cryo-EM reconstruction and the first end-to-end method for directly reconstructing continuous ensembles of protein structures from cryo-EM images.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJxUjlBtwB
PDF https://openreview.net/pdf?id=SJxUjlBtwB
PWC https://paperswithcode.com/paper/reconstructing-continuous-distributions-of-3d
Repo
Framework
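
The coordinate-network ingredient can be shown generically. The sketch below is the general technique, an MLP queried pointwise at Fourier coordinates and conditioned on a heterogeneity latent; the layer sizes and the absence of positional encoding are my simplifications, not cryoDRGN's architecture:

```python
# A minimal sketch of a coordinate-based volume representation (the general
# technique cryoDRGN builds on; details here are my own): an MLP maps a 3D
# Fourier coordinate, plus a latent code for heterogeneity, to a value.
import torch

class CoordVolume(torch.nn.Module):
    def __init__(self, z_dim=8, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(3 + z_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 2),      # real and imaginary parts
        )

    def forward(self, coords, z):
        """coords: (N, 3) Fourier coordinates; z: (z_dim,) structure latent."""
        zs = z.expand(coords.shape[0], -1)
        return self.net(torch.cat([coords, zs], dim=-1))

vol = CoordVolume()
slice_coords = torch.rand(128, 3) - 0.5      # a central slice queried pointwise
print(vol(slice_coords, torch.zeros(8)).shape)  # torch.Size([128, 2])
```

Varying z then traces out a continuous family of volumes rather than a fixed set of discrete classes, which is the heterogeneity argument in the abstract.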

MODELLING BIOLOGICAL ASSAYS WITH ADAPTIVE DEEP KERNEL LEARNING

Title MODELLING BIOLOGICAL ASSAYS WITH ADAPTIVE DEEP KERNEL LEARNING
Authors Anonymous
Abstract Due to the significant costs of data generation, many prediction tasks within drug discovery are by nature few-shot regression (FSR) problems, including accurate modelling of biological assays. Although a number of few-shot classification and reinforcement learning methods exist for similar applications, we find relatively few FSR methods meeting the performance standards required for such tasks under real-world constraints. Inspired by deep kernel learning, we develop a novel FSR algorithm that is better suited to these settings. Our algorithm consists of learning a deep network in combination with a kernel function and a differentiable kernel algorithm. As the choice of the kernel is critical, our algorithm learns to find the appropriate one for each task during inference. It thus performs more effectively with complex task distributions, outperforming current state-of-the-art algorithms on both toy and novel, real-world benchmarks that we introduce herein. By introducing novel benchmarks derived from biological assays, we hope that the community will progress towards the development of FSR algorithms suitable for use in noisy and uncertain environments such as drug discovery.
Tasks Drug Discovery, few-shot regression
Published 2020-01-01
URL https://openreview.net/forum?id=Syeu8CNYvS
PDF https://openreview.net/pdf?id=Syeu8CNYvS
PWC https://paperswithcode.com/paper/modelling-biological-assays-with-adaptive
Repo
Framework
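
Generic deep kernel learning, which the method builds on, fits in a few lines. The sketch below uses kernel ridge regression as the differentiable kernel algorithm and omits the paper's adaptive kernel selection:

```python
# A minimal deep-kernel sketch (generic deep kernel learning, not the paper's
# adaptive variant): an RBF kernel is applied to learned network features, with
# kernel ridge regression as the differentiable kernel algorithm.
import torch

feat = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.ReLU(),
                           torch.nn.Linear(32, 8))   # learned feature extractor

def rbf(a, b, lengthscale=1.0):
    return torch.exp(-torch.cdist(a, b) ** 2 / (2 * lengthscale ** 2))

def predict(x_support, y_support, x_query, lam=1e-3):
    """Kernel ridge regression in deep feature space (differentiable end to end)."""
    fs, fq = feat(x_support), feat(x_query)
    K = rbf(fs, fs) + lam * torch.eye(len(fs))
    alpha = torch.linalg.solve(K, y_support)
    return rbf(fq, fs) @ alpha

xs, ys = torch.randn(10, 4), torch.randn(10, 1)      # a few-shot support set
print(predict(xs, ys, torch.randn(3, 4)).shape)      # torch.Size([3, 1])
```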