April 1, 2020

Paper Group NANR 18

Learning to Guide Random Search. Bio-Inspired Hashing for Unsupervised Similarity Search. Detecting malicious PDF using CNN. DeepV2D: Video to Depth with Differentiable Structure from Motion. Unsupervised Data Augmentation for Consistency Training. A Learning-based Iterative Method for Solving Vehicle Routing Problems. Improved Training Techniques …

Learning to Guide Random Search


Title	Learning to Guide Random Search
Authors	Anonymous
Abstract	We are interested in the optimization of a high-dimensional function when only function evaluations are possible. Although this derivative-free setting arises in many applications, existing methods suffer from high sample complexity since their sample complexity depends on problem dimensionality, in contrast to the dimensionality-independent rates of first-order methods. The recent success of deep learning methods suggests that many data modalities lie on low-dimensional manifolds that can be represented by deep nonlinear models. Based on this observation, we consider derivative-free optimization of functions defined on low-dimensional manifolds. We develop an online learning approach that learns this manifold while performing the optimization. In other words, we jointly learn the manifold and optimize the function. Our analysis suggests that the proposed method significantly reduces sample complexity. We empirically evaluate the presented method on continuous optimization benchmarks and high-dimensional continuous control problems. Our method achieves significantly lower sample complexity than Augmented Random Search and other derivative-free optimization algorithms.
Tasks	Continuous Control
Published	2020-01-01
URL	https://openreview.net/forum?id=B1gHokBKwS
PDF	https://openreview.net/pdf?id=B1gHokBKwS
PWC	https://paperswithcode.com/paper/learning-to-guide-random-search
Repo
Framework
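
As a point of reference for the derivative-free setting described in this abstract, the following is a minimal sketch of plain random search with antithetic (two-sided) perturbations. It is a generic baseline in the spirit of methods such as Augmented Random Search, not the manifold-learning method the paper proposes. The function names, step sizes, and the `sphere` test objective are illustrative assumptions.

```python
import random


def random_search(f, x0, step=0.1, sigma=0.1, n_dirs=8, iters=200, seed=0):
    """Minimize f using only function evaluations (no gradients).

    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = list(x0)
    d = len(x)
    for _ in range(iters):
        grad_est = [0.0] * d
        for _ in range(n_dirs):
            # Sample a random Gaussian direction and probe f on both sides of x.
            u = [rng.gauss(0.0, 1.0) for _ in range(d)]
            f_plus = f([xi + sigma * ui for xi, ui in zip(x, u)])
            f_minus = f([xi - sigma * ui for xi, ui in zip(x, u)])
            # A finite difference along u gives a noisy directional derivative.
            scale = (f_plus - f_minus) / (2.0 * sigma * n_dirs)
            for i in range(d):
                grad_est[i] += scale * u[i]
        # Move against the averaged estimate, as in gradient descent.
        x = [xi - step * gi for xi, gi in zip(x, grad_est)]
    return x


def sphere(x):
    """Toy objective (assumed for illustration): squared Euclidean norm."""
    return sum(xi * xi for xi in x)


x_final = random_search(sphere, [1.0] * 5)
```

Each iteration costs `2 * n_dirs` function evaluations, and the number of directions needed for a useful estimate grows with the dimension of `x`. That is the dimensionality dependence the abstract refers to.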

Bio-Inspired Hashing for Unsupervised Similarity Search


Title	Bio-Inspired Hashing for Unsupervised Similarity Search
Authors	Anonymous
Abstract	The fruit fly Drosophila’s olfactory circuit has inspired a new locality sensitive hashing (LSH) algorithm, FlyHash. In contrast with classical LSH algorithms that produce low dimensional hash codes, FlyHash produces sparse high-dimensional hash codes and has also been shown to have superior empirical performance compared to classical LSH algorithms in similarity search. However, FlyHash uses random projections and cannot learn from data. Building on inspiration from FlyHash and the ubiquity of sparse expansive representations in neurobiology, our work proposes a novel hashing algorithm BioHash that produces sparse high dimensional hash codes in a data-driven manner. We show that BioHash outperforms previously published benchmarks for various hashing methods. Since our learning algorithm is based on a local and biologically plausible synaptic plasticity rule, our work provides evidence for the proposal that LSH might be a computational reason for the abundance of sparse expansive motifs in a variety of biological systems. We also propose a convolutional variant BioConvHash that further improves performance. From the perspective of computer science, BioHash and BioConvHash are fast, scalable and yield compressed binary representations that are useful for similarity search.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=Bylkd0EFwr
PDF	https://openreview.net/pdf?id=Bylkd0EFwr
PWC	https://paperswithcode.com/paper/bio-inspired-hashing-for-unsupervised
Repo
Framework
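
The FlyHash scheme that this abstract builds on can be sketched roughly as follows: a sparse random projection expands the input to a higher dimension, and a winner-take-all step keeps only the top `k` activations as ones. This is a simplified illustration of the random-projection baseline, not BioHash's learned projection. The helper names and the `fan_in` parameter are assumptions.

```python
import random


def make_projection(in_dim, out_dim, fan_in, seed=0):
    """Each output unit sums a small random subset of input coordinates."""
    rng = random.Random(seed)
    return [rng.sample(range(in_dim), fan_in) for _ in range(out_dim)]


def fly_hash(x, projection, k):
    """Return a sparse binary code with exactly k active bits."""
    activations = [sum(x[i] for i in idx) for idx in projection]
    # Winner-take-all: indices of the k largest activations.
    order = sorted(range(len(activations)),
                   key=lambda j: activations[j], reverse=True)
    code = [0] * len(activations)
    for j in order[:k]:
        code[j] = 1
    return code


projection = make_projection(in_dim=10, out_dim=40, fan_in=3)
code = fly_hash([0.1 * i for i in range(10)], projection, k=4)
```

The code is higher-dimensional than the input (40 versus 10 here) but sparse, in contrast with the dense low-dimensional codes of classical LSH.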

Detecting malicious PDF using CNN


Title	Detecting malicious PDF using CNN
Authors	Raphael Fettaya, Yishay Mansour
Abstract	Malicious PDF files represent one of the biggest threats to computer security. To detect them, significant research has been done using handwritten signatures or machine learning based on manual feature extraction. Both approaches are time-consuming and require significant prior knowledge, and the list of features has to be updated with each newly discovered vulnerability. In this work, we propose a novel algorithm that uses a Convolutional Neural Network (CNN) on the byte level of the file, without any handcrafted features. We show, using a data set of 130,000 files, that our approach maintains a high detection rate (96%) of PDF malware and even detects new malicious files, still undetected by most antiviruses. Using automatically generated features from our CNN, and applying a clustering algorithm, we also obtain high similarity between the antiviruses’ labels and the resulting clusters.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=SJeW-A4tDS
PDF	https://openreview.net/pdf?id=SJeW-A4tDS
PWC	https://paperswithcode.com/paper/detecting-malicious-pdf-using-cnn
Repo
Framework

DeepV2D: Video to Depth with Differentiable Structure from Motion


Title	DeepV2D: Video to Depth with Differentiable Structure from Motion
Authors	Anonymous
Abstract	We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation are alternated and converge to accurate depth.
Tasks	Depth Estimation, Motion Estimation
Published	2020-01-01
URL	https://openreview.net/forum?id=HJeO7RNKPr
PDF	https://openreview.net/pdf?id=HJeO7RNKPr
PWC	https://paperswithcode.com/paper/deepv2d-video-to-depth-with-differentiable-1
Repo
Framework

Unsupervised Data Augmentation for Consistency Training


Title	Unsupervised Data Augmentation for Consistency Training
Authors	Anonymous
Abstract	Semi-supervised learning lately has shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically that produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning. By substituting simple noising operations with advanced data augmentation methods, our method brings substantial improvements across six language and three vision tasks under the same consistency training framework. On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On a standard semi-supervised learning benchmark, CIFAR-10, our method outperforms all previous approaches and achieves an error rate of 2.7% with only 4,000 examples, nearly matching the performance of models trained on 50,000 labeled examples. Our method also combines well with transfer learning, e.g., when finetuning from BERT, and yields improvements in the high-data regime, such as ImageNet, both when only 10% of the data is labeled and when a full labeled set with 1.3M extra unlabeled examples is used.
Tasks	Data Augmentation, Text Classification, Transfer Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=ByeL1R4FvS
PDF	https://openreview.net/pdf?id=ByeL1R4FvS
PWC	https://paperswithcode.com/paper/unsupervised-data-augmentation-for
Repo
Framework
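
The consistency-training idea in this abstract, constraining predictions to stay the same when the input is noised or augmented, can be sketched as a divergence between two predicted distributions. The abstract does not name the divergence, so the use of KL divergence here is an assumption, as are the function names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


def consistency_loss(logits_clean, logits_augmented):
    """Penalize disagreement between predictions on an unlabeled example
    and on its augmented version (KL is an assumed choice of divergence)."""
    p = softmax(logits_clean)      # prediction on the original input
    q = softmax(logits_augmented)  # prediction on the augmented input
    return kl_divergence(p, q)


loss_same = consistency_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
loss_diff = consistency_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

In training, a term like this would be computed on unlabeled examples and added to the usual supervised loss on the labeled ones.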

A Learning-based Iterative Method for Solving Vehicle Routing Problems


Title	A Learning-based Iterative Method for Solving Vehicle Routing Problems
Authors	Anonymous
Abstract	This paper is concerned with solving combinatorial optimization problems, in particular, the capacitated vehicle routing problems (CVRP). Classical Operations Research (OR) algorithms such as LKH3 (Helsgaun, 2017) are extremely inefficient (e.g., 13 hours on CVRP of only size 100) and difficult to scale to larger-size problems. Machine learning based approaches have recently been shown to be promising, partly because of their efficiency (once trained, they can perform solving within minutes or even seconds). However, there is still a considerable gap between the quality of a machine learned solution and what OR methods can offer (e.g., on CVRP-100, the best result of learned solutions is between 16.10 and 16.80, significantly worse than LKH3’s 15.65). In this paper, we present the first learning based approach for CVRP that is efficient in solving speed and at the same time outperforms OR methods. Starting with a random initial solution, our algorithm learns to iteratively refine the solution with an improvement operator, selected by a reinforcement learning based controller. The improvement operator is selected from a pool of powerful operators that are customized for routing problems. By combining the strengths of the two worlds, our approach achieves the new state-of-the-art results on CVRP, e.g., an average cost of 15.57 on CVRP-100.
Tasks	Combinatorial Optimization
Published	2020-01-01
URL	https://openreview.net/forum?id=BJe1334YDH
PDF	https://openreview.net/pdf?id=BJe1334YDH
PWC	https://paperswithcode.com/paper/a-learning-based-iterative-method-for-solving
Repo
Framework

Improved Training Techniques for Online Neural Machine Translation


Title	Improved Training Techniques for Online Neural Machine Translation
Authors	Anonymous
Abstract	Neural sequence-to-sequence models are at the basis of state-of-the-art solutions for sequential prediction problems such as machine translation and speech recognition. The models typically assume that the entire input is available when starting target generation. In some applications, however, it is desirable to start the decoding process before the entire input is available, e.g. to reduce the latency in automatic speech recognition. We consider state-of-the-art wait-k decoders, which first read k tokens from the source and then alternate between reading tokens from the input and writing to the output. We investigate the sensitivity of such models to the value of k that is used during training and when deploying the model, and the effect of updating the hidden states in transformer models as new source tokens are read. We experiment with German-English translation on the IWSLT14 dataset and the larger WMT15 dataset. Our results significantly improve over earlier state-of-the-art results for German-English translation on the WMT15 dataset across different latency levels.
Tasks	Machine Translation, Speech Recognition
Published	2020-01-01
URL	https://openreview.net/forum?id=rke3OxSKwr
PDF	https://openreview.net/pdf?id=rke3OxSKwr
PWC	https://paperswithcode.com/paper/improved-training-techniques-for-online
Repo
Framework
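
The wait-k policy described in this abstract (read k source tokens, then alternate between reading and writing) can be illustrated as a schedule of read and write actions. This sketch covers only the schedule, not the translation model. The function name and the `"R"`/`"W"` encoding are assumptions.

```python
def wait_k_schedule(src_len, tgt_len, k):
    """Return the sequence of actions for a wait-k decoder.

    "R" reads one source token, "W" writes one target token.
    """
    actions = []
    read = 0
    written = 0
    while written < tgt_len:
        # Before writing target token t (0-based), the decoder must have
        # read k + t source tokens, or the whole source if it is shorter.
        needed = min(k + written, src_len)
        while read < needed:
            actions.append("R")
            read += 1
        actions.append("W")
        written += 1
    return actions


schedule = "".join(wait_k_schedule(src_len=5, tgt_len=5, k=2))
# schedule == "RRWRWRWRWW"
```

A larger `k` delays the first write, giving the decoder more source context at the cost of higher latency. Setting `k` to at least `src_len` recovers the usual offline setting where the whole input is read first.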

Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps


Title	Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
Authors	Anonymous
Abstract	Modern neural network architectures use structured linear transformations, such as low-rank matrices, sparse matrices, permutations, and the Fourier transform, to improve inference speed and reduce memory usage compared to general linear maps. However, choosing which of the myriad structured transformations to use (and its associated parameterization) is a laborious task that requires trading off speed, space, and accuracy. We consider a different approach: we introduce a family of matrices called kaleidoscope matrices (K-matrices) that provably capture any structured matrix with near-optimal space (parameter) and time (arithmetic operation) complexity. We empirically validate that K-matrices can be automatically learned within end-to-end pipelines to replace hand-crafted procedures, in order to improve model quality. For example, replacing channel shuffles in ShuffleNet improves classification accuracy on ImageNet by up to 5%. Learnable K-matrices can also simplify hand-engineered pipelines—we replace filter bank feature computation in speech data preprocessing with a kaleidoscope layer, resulting in only 0.4% loss in accuracy on the TIMIT speech recognition task. K-matrices can also capture latent structure in models: for a challenging permuted image classification task, adding a K-matrix to a standard convolutional architecture can enable learning the latent permutation and improve accuracy by over 8 points. We provide a practically efficient implementation of our approach, and use K-matrices in a Transformer network to attain 36% faster end-to-end inference speed on a language translation task.
Tasks	Image Classification, Speech Recognition
Published	2020-01-01
URL	https://openreview.net/forum?id=BkgrBgSYDS
PDF	https://openreview.net/pdf?id=BkgrBgSYDS
PWC	https://paperswithcode.com/paper/kaleidoscope-an-efficient-learnable
Repo
Framework

Top-down training for neural networks


Title	Top-down training for neural networks
Authors	Anonymous
Abstract	Vanishing gradients pose a challenge when training deep neural networks, resulting in the top layers (closer to the output) in the network learning faster when compared with lower layers closer to the input. Interpreting the top layers as a classifier and the lower layers as a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor. This can lead to the feature extractor being under-trained, possibly failing to learn much about the patterns in the input data. To address this we propose a good classifier hypothesis: given a fixed classifier that partitions the space well, the feature extractor can be further trained to fit that classifier and learn the data patterns well. This alleviates the problem of under-training the feature extractor and enables the network to learn patterns in the data with small partial derivatives. We verify this hypothesis empirically and propose a novel top-down training method. We train all layers jointly, obtaining a good classifier from the top layers, which are then frozen. Following re-initialization, we retrain the bottom layers with respect to the frozen classifier. Applying this approach to a set of speech recognition experiments using the Wall Street Journal and noisy CHiME-4 datasets, we observe substantial accuracy gains. When combined with dropout, our method enables connectionist temporal classification (CTC) models to outperform joint CTC-attention models, which have more capacity and flexibility.
Tasks	Speech Recognition
Published	2020-01-01
URL	https://openreview.net/forum?id=rJg8NertPr
PDF	https://openreview.net/pdf?id=rJg8NertPr
PWC	https://paperswithcode.com/paper/top-down-training-for-neural-networks
Repo
Framework

Unsupervised Learning of Efficient and Robust Speech Representations


Title	Unsupervised Learning of Efficient and Robust Speech Representations
Authors	Anonymous
Abstract	We present an unsupervised method for learning speech representations based on a bidirectional contrastive predictive coding that implicitly discovers phonetic structure from large-scale corpora of unlabelled raw audio signals. The representations, which we learn from up to 8000 hours of publicly accessible speech data, are evaluated by looking at their impact on the behaviour of supervised speech recognition systems. First, across a variety of datasets, we find that the features learned from the largest and most diverse pretraining dataset result in significant improvements over standard audio features as well as over features learned from smaller amounts of pretraining data. Second, they significantly improve sample efficiency in low-data scenarios. Finally, the features confer significant robustness advantages to the resulting recognition systems: we see significant improvements in out-of-domain transfer relative to baseline feature sets, and the features likewise provide improvements in four different low-resource African language datasets.
Tasks	Speech Recognition
Published	2020-01-01
URL	https://openreview.net/forum?id=HJe-blSYvH
PDF	https://openreview.net/pdf?id=HJe-blSYvH
PWC	https://paperswithcode.com/paper/unsupervised-learning-of-efficient-and-robust
Repo
Framework

Quantum Algorithms for Deep Convolutional Neural Networks


Title	Quantum Algorithms for Deep Convolutional Neural Networks
Authors	Iordanis Kerenidis, Jonas Landman, Anupam Prakash
Abstract	Quantum computing is a powerful computational paradigm with applications in several fields, including machine learning. In the last decade, deep learning, and in particular Convolutional Neural Networks (CNN), have become essential for applications in signal processing and image recognition. Quantum deep learning, however, remains a challenging problem, as it is difficult to implement nonlinearities with quantum unitaries. In this paper, we propose a quantum algorithm for evaluating and training deep convolutional neural networks with potential speedups over classical CNNs for both the forward and backward passes. The quantum CNN (QCNN) completely reproduces the outputs of the classical CNN and allows for nonlinearities and pooling operations. The QCNN is particularly interesting for deep networks and could allow new frontiers in the image recognition domain, by allowing for many more convolution kernels, larger kernels, high dimensional inputs and high depth input channels. We also present numerical simulations for the classification of the MNIST dataset to provide practical evidence for the efficiency of the QCNN.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=Hygab1rKDS
PDF	https://openreview.net/pdf?id=Hygab1rKDS
PWC	https://paperswithcode.com/paper/quantum-algorithms-for-deep-convolutional
Repo
Framework

DYNAMIC SELF-TRAINING FRAMEWORK FOR GRAPH CONVOLUTIONAL NETWORKS


Title	DYNAMIC SELF-TRAINING FRAMEWORK FOR GRAPH CONVOLUTIONAL NETWORKS
Authors	Anonymous
Abstract	Graph neural networks (GNN) such as GCN, GAT, MoNet have achieved state-of-the-art results on semi-supervised learning on graphs. However, when the number of labeled nodes is very small, the performance of GNNs degrades dramatically. Self-training has proved to be effective for resolving this issue; however, the performance of self-trained GCN is still inferior to that of G2G and DGI for many settings. Moreover, additional model complexity makes it more difficult to tune the hyper-parameters and do model selection. We argue that the power of self-training is still not fully explored for the node classification task. In this paper, we propose a unified end-to-end self-training framework called Dynamic Self-training, which generalizes and simplifies prior work. A simple instantiation of the framework based on GCN is provided and empirical results show that our framework outperforms all previous methods including GNNs, embedding-based methods and self-trained GCNs by a noticeable margin. Moreover, compared with standard self-training, hyper-parameter tuning for our framework is easier.
Tasks	Model Selection, Node Classification
Published	2020-01-01
URL	https://openreview.net/forum?id=SJgCEpVtvr
PDF	https://openreview.net/pdf?id=SJgCEpVtvr
PWC	https://paperswithcode.com/paper/dynamic-self-training-framework-for-graph-1
Repo
Framework

Improved Training of Certifiably Robust Models


Title	Improved Training of Certifiably Robust Models
Authors	Anonymous
Abstract	Convex relaxations are effective for training and certifying neural networks against norm-bounded adversarial attacks, but they leave a large gap between certifiable and empirical (PGD) robustness. In principle, relaxation can provide tight bounds if the convex relaxation solution is feasible for the original non-relaxed problem. Therefore, we propose two regularizers that can be used to train neural networks that yield convex relaxations with tighter bounds. In all of our experiments, the proposed regularizations result in tighter certification bounds than non-regularized baselines.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=HygqFlBtPS
PDF	https://openreview.net/pdf?id=HygqFlBtPS
PWC	https://paperswithcode.com/paper/improved-training-of-certifiably-robust
Repo
Framework

Uncertainty-Aware Prediction for Graph Neural Networks


Title	Uncertainty-Aware Prediction for Graph Neural Networks
Authors	Anonymous
Abstract	Thanks to graph neural networks (GNNs), semi-supervised node classification has achieved state-of-the-art performance on graph data. However, GNNs do not consider any types of uncertainties associated with the class probabilities to minimize risk due to misclassification under uncertainty in real life. In this work, we propose a Bayesian deep learning framework reflecting various types of uncertainties for classification predictions by leveraging the powerful modeling and learning capabilities of GNNs. We considered multiple uncertainty types in both deep learning (DL) and belief/evidence theory domains. We treat the predictions of a Bayesian GNN (BGNN) as nodes’ multinomial subjective opinions in a graph based on Dirichlet distributions where each belief mass is a belief probability of each class. By collecting evidence from the given labels of training nodes, the BGNN model is designed for accurately predicting probabilities of each class and performing out-of-distribution detection. We validated that the proposed BGNN outperforms state-of-the-art counterparts in terms of node classification accuracy and out-of-distribution detection on six real network datasets.
Tasks	Node Classification, Out-of-Distribution Detection
Published	2020-01-01
URL	https://openreview.net/forum?id=SyxdC6NKwH
PDF	https://openreview.net/pdf?id=SyxdC6NKwH
PWC	https://paperswithcode.com/paper/uncertainty-aware-prediction-for-graph-neural
Repo
Framework
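
The abstract treats predictions as multinomial subjective opinions based on Dirichlet distributions but does not spell out the mapping. The sketch below uses the mapping commonly used in subjective logic and evidential deep learning: Dirichlet parameters are evidence plus one, belief masses are evidence divided by the Dirichlet strength, and the remaining mass is uncertainty. Whether the paper uses exactly this form is an assumption, and the function name is hypothetical.

```python
def dirichlet_opinion(evidence):
    """Map non-negative per-class evidence to (belief masses, uncertainty).

    Assumes the common subjective-logic mapping alpha_k = e_k + 1.
    """
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]          # Dirichlet parameters
    strength = sum(alpha)                        # Dirichlet strength S
    belief = [e / strength for e in evidence]    # b_k = e_k / S
    uncertainty = num_classes / strength         # u = K / S
    return belief, uncertainty


belief, uncertainty = dirichlet_opinion([8.0, 1.0, 1.0])
```

Belief masses and uncertainty always sum to one under this mapping. With no evidence at all, every belief mass is zero and the uncertainty is one, which is the kind of signal an out-of-distribution detector could threshold on.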

Deep Lifetime Clustering


Title	Deep Lifetime Clustering
Authors	Anonymous
Abstract	The goal of lifetime clustering is to develop an inductive model that maps subjects into $K$ clusters according to their underlying (unobserved) lifetime distribution. We introduce a neural-network based lifetime clustering model that can find cluster assignments by directly maximizing the divergence between the empirical lifetime distributions of the clusters. Accordingly, we define a novel clustering loss function over the lifetime distributions (of entire clusters) based on a tight upper bound of the two-sample Kuiper test p-value. The resultant model is robust to the modeling issues associated with the unobservability of termination signals, and does not assume proportional hazards. Our results in real and synthetic datasets show significantly better lifetime clusters (as evaluated by C-index, Brier Score, Logrank score and adjusted Rand index) as compared to competing approaches.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=SkeYUkStPr
PDF	https://openreview.net/pdf?id=SkeYUkStPr
PWC	https://paperswithcode.com/paper/deep-lifetime-clustering
Repo
Framework