April 1, 2020

2945 words 14 mins read

Paper Group NANR 18

Learning to Guide Random Search. Bio-Inspired Hashing for Unsupervised Similarity Search. Detecting malicious PDF using CNN. DeepV2D: Video to Depth with Differentiable Structure from Motion. Unsupervised Data Augmentation for Consistency Training. A Learning-based Iterative Method for Solving Vehicle Routing Problems. Improved Training Techniques …

Learning to Guide Random Search

Title Learning to Guide Random Search
Authors Anonymous
Abstract We are interested in optimizing a high-dimensional function when only function evaluations are possible. Although this derivative-free setting arises in many applications, existing methods suffer from high sample complexity, which depends on the problem dimensionality, in contrast to the dimensionality-independent rates of first-order methods. The recent success of deep learning methods suggests that many data modalities lie on low-dimensional manifolds that can be represented by deep nonlinear models. Based on this observation, we consider derivative-free optimization of functions defined on low-dimensional manifolds. We develop an online learning approach that learns this manifold while performing the optimization. In other words, we jointly learn the manifold and optimize the function. Our analysis suggests that the proposed method significantly reduces sample complexity. We empirically evaluate the presented method on continuous optimization benchmarks and high-dimensional continuous control problems. Our method achieves significantly lower sample complexity than Augmented Random Search and other derivative-free optimization algorithms.
Tasks Continuous Control
Published 2020-01-01
URL https://openreview.net/forum?id=B1gHokBKwS
PDF https://openreview.net/pdf?id=B1gHokBKwS
PWC https://paperswithcode.com/paper/learning-to-guide-random-search
Repo
Framework
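
To make the idea concrete, here is a toy Python sketch of random search guided by a low-dimensional map: perturbation directions are sampled in a small latent space and lifted to parameter space through a decoder, which stands in for the learned manifold. The fixed random decoder and all names here are illustrative assumptions; the paper learns the decoder online, which is not shown.

```python
# Toy sketch of manifold-guided random search (not the authors' exact method):
# search directions are sampled in a low-dimensional latent space and mapped
# to parameter space through a decoder standing in for the learned manifold.
import numpy as np

def guided_random_search(f, x0, decoder, latent_dim, steps=100, sigma=0.1, lr=0.02):
    """Minimize f by antithetic finite differences along decoded latent directions."""
    x = x0.copy()
    for _ in range(steps):
        z = np.random.randn(latent_dim)          # low-dimensional perturbation
        d = decoder(z)                           # lift to parameter space
        d /= np.linalg.norm(d) + 1e-8
        g = (f(x + sigma * d) - f(x - sigma * d)) / (2 * sigma)  # directional derivative estimate
        x -= lr * g * d
    return x

# Usage: a quadratic in 1000 dims explored through a fixed 5-dim "decoder".
dim, latent_dim = 1000, 5
A = np.random.randn(dim, latent_dim)             # placeholder for a learned decoder
x = guided_random_search(lambda v: np.sum(v**2), np.ones(dim), lambda z: A @ z, latent_dim)
```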

Bio-Inspired Hashing for Unsupervised Similarity Search

Title Bio-Inspired Hashing for Unsupervised Similarity Search
Authors Anonymous
Abstract The fruit fly Drosophila’s olfactory circuit has inspired a new locality sensitive hashing (LSH) algorithm, FlyHash. In contrast with classical LSH algorithms that produce low-dimensional hash codes, FlyHash produces sparse high-dimensional hash codes and has also been shown to have superior empirical performance in similarity search compared to classical LSH algorithms. However, FlyHash uses random projections and cannot learn from data. Building on inspiration from FlyHash and the ubiquity of sparse expansive representations in neurobiology, our work proposes a novel hashing algorithm, BioHash, that produces sparse high-dimensional hash codes in a data-driven manner. We show that BioHash outperforms previously published benchmarks for various hashing methods. Since our learning algorithm is based on a local and biologically plausible synaptic plasticity rule, our work provides evidence for the proposal that LSH might be a computational reason for the abundance of sparse expansive motifs in a variety of biological systems. We also propose a convolutional variant, BioConvHash, that further improves performance. From the perspective of computer science, BioHash and BioConvHash are fast, scalable, and yield compressed binary representations that are useful for similarity search.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Bylkd0EFwr
PDF https://openreview.net/pdf?id=Bylkd0EFwr
PWC https://paperswithcode.com/paper/bio-inspired-hashing-for-unsupervised
Repo
Framework
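
For intuition, below is a minimal NumPy sketch of FlyHash, the random-projection baseline that BioHash builds on: an expansive sparse projection followed by winner-take-all sparsification. BioHash’s learned synaptic-plasticity rule is not reproduced here, and all parameter values are illustrative.

```python
# Minimal FlyHash-style sketch: sparse random projection + winner-take-all.
# BioHash replaces the random projection with learned weights (not shown).
import numpy as np

def fly_hash(X, m=2000, k=32, density=0.1, seed=0):
    """Map rows of X (n, d) to sparse binary codes of length m with k active bits."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = (rng.random((d, m)) < density).astype(float)  # sparse binary projection
    A = X @ W                                         # expansive projection (m >> d)
    codes = np.zeros_like(A)
    top = np.argpartition(A, -k, axis=1)[:, -k:]      # winner-take-all: top-k units fire
    np.put_along_axis(codes, top, 1.0, axis=1)
    return codes

codes = fly_hash(np.random.randn(100, 64))
print(codes.shape, codes.sum(axis=1)[:3])  # (100, 2000), each row has 32 ones
```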

Detecting malicious PDF using CNN

Title Detecting malicious PDF using CNN
Authors Raphael Fettaya, Yishay Mansour
Abstract Malicious PDF files represent one of the biggest threats to computer security. To detect them, significant research has been done using handwritten signatures or machine learning based on manual feature extraction. These approaches are time-consuming, require significant prior knowledge, and their feature lists have to be updated with each newly discovered vulnerability. In this work, we propose a novel algorithm that uses a Convolutional Neural Network (CNN) on the byte level of the file, without any handcrafted features. We show, using a dataset of 130,000 files, that our approach maintains a high detection rate (96%) on PDF malware and even detects new malicious files that are still undetected by most antiviruses. Using features automatically generated by our CNN and applying a clustering algorithm, we also obtain high similarity between the antiviruses’ labels and the resulting clusters.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJeW-A4tDS
PDF https://openreview.net/pdf?id=SJeW-A4tDS
PWC https://paperswithcode.com/paper/detecting-malicious-pdf-using-cnn
Repo
Framework
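
The abstract describes a byte-level CNN with no handcrafted features. Below is a hypothetical PyTorch sketch of such a classifier; the layer sizes and architecture are illustrative assumptions, not the authors’ exact network.

```python
# Hypothetical byte-level CNN: raw file bytes are embedded and classified
# with 1D convolutions. Sizes are illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class ByteCNN(nn.Module):
    def __init__(self, emb_dim=16, channels=128):
        super().__init__()
        self.embed = nn.Embedding(256, emb_dim)          # one embedding per byte value
        self.conv = nn.Sequential(
            nn.Conv1d(emb_dim, channels, kernel_size=7, stride=3), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=7, stride=3), nn.ReLU(),
        )
        self.head = nn.Linear(channels, 1)               # malicious / benign logit

    def forward(self, byte_ids):                         # byte_ids: (batch, length)
        x = self.embed(byte_ids).transpose(1, 2)         # -> (batch, emb_dim, length)
        x = self.conv(x).max(dim=2).values               # global max pool over positions
        return self.head(x).squeeze(1)

logits = ByteCNN()(torch.randint(0, 256, (4, 4096)))     # 4 files of 4096 bytes each
```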

DeepV2D: Video to Depth with Differentiable Structure from Motion

Title DeepV2D: Video to Depth with Differentiable Structure from Motion
Authors Anonymous
Abstract We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representational power of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages, motion estimation and depth estimation: during inference, the two are alternated and converge to an accurate depth estimate.
Tasks Depth Estimation, Motion Estimation
Published 2020-01-01
URL https://openreview.net/forum?id=HJeO7RNKPr
PDF https://openreview.net/pdf?id=HJeO7RNKPr
PWC https://paperswithcode.com/paper/deepv2d-video-to-depth-with-differentiable-1
Repo
Framework
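
The alternation described in the abstract can be summarized in a few lines. In this schematic sketch, `motion_net`, `depth_net`, and `depth_init` are placeholders for the paper’s trainable modules.

```python
# Schematic of DeepV2D's alternating inference loop, as described in the
# abstract: motion and depth estimates are refined in turn until they settle.
def deepv2d_inference(frames, motion_net, depth_net, depth_init, iters=5):
    depth = depth_init(frames)                 # e.g., a single-image initialization
    for _ in range(iters):
        poses = motion_net(frames, depth)      # motion estimation given current depth
        depth = depth_net(frames, poses)       # depth estimation given current motion
    return depth, poses
```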

Unsupervised Data Augmentation for Consistency Training

Title Unsupervised Data Augmentation for Consistency Training
Authors Anonymous
Abstract Semi-supervised learning has lately shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically that produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning. By substituting simple noising operations with advanced data augmentation methods, our method brings substantial improvements across six language and three vision tasks under the same consistency training framework. On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On a standard semi-supervised learning benchmark, CIFAR-10, our method outperforms all previous approaches and achieves an error rate of 2.7% with only 4,000 examples, nearly matching the performance of models trained on 50,000 labeled examples. Our method also combines well with transfer learning, e.g., when fine-tuning from BERT, and yields improvements in high-data regimes such as ImageNet, both when only 10% of the data is labeled and when the full labeled set is used together with 1.3M extra unlabeled examples.
Tasks Data Augmentation, Text Classification, Transfer Learning
Published 2020-01-01
URL https://openreview.net/forum?id=ByeL1R4FvS
PDF https://openreview.net/pdf?id=ByeL1R4FvS
PWC https://paperswithcode.com/paper/unsupervised-data-augmentation-for
Repo
Framework
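
The consistency-training objective at the heart of this approach is easy to state in code. Below is a hedged sketch: supervised cross-entropy plus a KL term tying predictions on an augmented unlabeled example to those on the original. The augmentation itself (e.g., back-translation or RandAugment) is left abstract, and refinements such as sharpening or confidence thresholding are omitted.

```python
# Sketch of a UDA-style consistency loss: supervised cross-entropy plus a KL
# term that pushes predictions on an augmented unlabeled example toward the
# (fixed) predictions on the original example.
import torch
import torch.nn.functional as F

def uda_loss(model, x_lab, y_lab, x_unlab, x_unlab_aug, lam=1.0):
    sup = F.cross_entropy(model(x_lab), y_lab)
    with torch.no_grad():                                  # fixed target distribution
        p = F.softmax(model(x_unlab), dim=1)
    log_q = F.log_softmax(model(x_unlab_aug), dim=1)
    consistency = F.kl_div(log_q, p, reduction="batchmean")  # KL(p || q)
    return sup + lam * consistency
```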

A Learning-based Iterative Method for Solving Vehicle Routing Problems

Title A Learning-based Iterative Method for Solving Vehicle Routing Problems
Authors Anonymous
Abstract This paper is concerned with solving combinatorial optimization problems, in particular the capacitated vehicle routing problem (CVRP). Classical Operations Research (OR) algorithms such as LKH3 (Helsgaun, 2017) are extremely inefficient (e.g., 13 hours on CVRP instances of only size 100) and difficult to scale to larger problems. Machine-learning-based approaches have recently been shown to be promising, partly because of their efficiency (once trained, they can solve instances within minutes or even seconds). However, there is still a considerable gap between the quality of a machine-learned solution and what OR methods can offer (e.g., on CVRP-100, the best learned solutions score between 16.10 and 16.80, significantly worse than LKH3’s 15.65). In this paper, we present the first learning-based approach for CVRP that is efficient in terms of solving speed and at the same time outperforms OR methods. Starting with a random initial solution, our algorithm learns to iteratively refine the solution with an improvement operator, selected by a reinforcement-learning-based controller. The improvement operator is selected from a pool of powerful operators that are customized for routing problems. By combining the strengths of the two worlds, our approach achieves new state-of-the-art results on CVRP, e.g., an average cost of 15.57 on CVRP-100.
Tasks Combinatorial Optimization
Published 2020-01-01
URL https://openreview.net/forum?id=BJe1334YDH
PDF https://openreview.net/pdf?id=BJe1334YDH
PWC https://paperswithcode.com/paper/a-learning-based-iterative-method-for-solving
Repo
Framework
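
A toy version of the learn-to-improve loop might look as follows. Here an epsilon-greedy controller over running operator gains stands in for the paper’s trained reinforcement-learning controller; the operators and cost function are supplied by the caller.

```python
# Toy learn-to-improve loop: start from a feasible solution and repeatedly
# apply an improvement operator chosen by a simple bandit-style controller
# (the paper uses a trained RL policy instead).
import random

def iterative_improve(solution, cost, operators, steps=1000, eps=0.1):
    best, best_cost = solution, cost(solution)
    gains = {op: 0.0 for op in operators}          # running average gain per operator
    for _ in range(steps):
        op = (random.choice(operators) if random.random() < eps
              else max(operators, key=lambda o: gains[o]))
        candidate = op(best)
        delta = best_cost - cost(candidate)
        gains[op] = 0.9 * gains[op] + 0.1 * delta  # update the operator's credit
        if delta > 0:                              # keep only improving moves
            best, best_cost = candidate, best_cost - delta
    return best, best_cost
```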

Improved Training Techniques for Online Neural Machine Translation

Title Improved Training Techniques for Online Neural Machine Translation
Authors Anonymous
Abstract Neural sequence-to-sequence models are the basis of state-of-the-art solutions for sequential prediction problems such as machine translation and speech recognition. These models typically assume that the entire input is available when target generation starts. In some applications, however, it is desirable to start the decoding process before the entire input is available, e.g., to reduce latency in automatic speech recognition. We consider state-of-the-art wait-k decoders, which first read k tokens from the source and then alternate between reading tokens from the input and writing to the output. We investigate the sensitivity of such models to the value of k used during training and at deployment, and the effect of updating the hidden states in transformer models as new source tokens are read. We experiment with German-English translation on the IWSLT14 dataset and the larger WMT15 dataset. Our results significantly improve over earlier state-of-the-art results for German-English translation on the WMT15 dataset across different latency levels.
Tasks Machine Translation, Speech Recognition
Published 2020-01-01
URL https://openreview.net/forum?id=rke3OxSKwr
PDF https://openreview.net/pdf?id=rke3OxSKwr
PWC https://paperswithcode.com/paper/improved-training-techniques-for-online
Repo
Framework
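
The wait-k schedule itself is simple to write down. The sketch below assumes a placeholder `decode_step` function and a cap on target length; it illustrates the read/write policy only, not the authors’ training setup.

```python
# Illustration of the wait-k schedule: read k source tokens, then alternate
# one read with one write; once the source ends, the decoder writes freely.
def wait_k_decode(source_stream, k, decode_step, eos="</s>", max_len=200):
    """decode_step(read_so_far, written_so_far) -> next target token (placeholder)."""
    read, written = [], []
    for token in source_stream:
        read.append(token)
        if len(read) >= k and len(written) < max_len:
            written.append(decode_step(read, written))  # one write per read after the wait
            if written[-1] == eos:
                return written
    while len(written) < max_len and (not written or written[-1] != eos):
        written.append(decode_step(read, written))      # source exhausted: finish the target
    return written
```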

Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps

Title Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
Authors Anonymous
Abstract Modern neural network architectures use structured linear transformations, such as low-rank matrices, sparse matrices, permutations, and the Fourier transform, to improve inference speed and reduce memory usage compared to general linear maps. However, choosing which of the myriad structured transformations to use (and its associated parameterization) is a laborious task that requires trading off speed, space, and accuracy. We consider a different approach: we introduce a family of matrices called kaleidoscope matrices (K-matrices) that provably capture any structured matrix with near-optimal space (parameter) and time (arithmetic operation) complexity. We empirically validate that K-matrices can be automatically learned within end-to-end pipelines to replace hand-crafted procedures, in order to improve model quality. For example, replacing channel shuffles in ShuffleNet improves classification accuracy on ImageNet by up to 5%. Learnable K-matrices can also simplify hand-engineered pipelines—we replace filter bank feature computation in speech data preprocessing with a kaleidoscope layer, resulting in only 0.4% loss in accuracy on the TIMIT speech recognition task. K-matrices can also capture latent structure in models: for a challenging permuted image classification task, adding a K-matrix to a standard convolutional architecture can enable learning the latent permutation and improve accuracy by over 8 points. We provide a practically efficient implementation of our approach, and use K-matrices in a Transformer network to attain 36% faster end-to-end inference speed on a language translation task.
Tasks Image Classification, Speech Recognition
Published 2020-01-01
URL https://openreview.net/forum?id=BkgrBgSYDS
PDF https://openreview.net/pdf?id=BkgrBgSYDS
PWC https://paperswithcode.com/paper/kaleidoscope-an-efficient-learnable
Repo
Framework
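
K-matrices are built from butterfly factors, each of which mixes coordinate pairs at a fixed stride; a product of O(log n) such factors gives an n-by-n map in O(n log n) operations, the same access pattern as the FFT. Below is a NumPy sketch of one factor with random placeholder weights.

```python
# Sketch of one butterfly factor, the building block of kaleidoscope
# (K-)matrices: each factor applies a learned 2x2 mix to coordinate pairs
# (i, i + stride). Weights here are random placeholders.
import numpy as np

def butterfly_factor(x, stride, w):
    """Apply 2x2 mixes to pairs (i, i+stride); w has shape (n//2, 2, 2)."""
    n = x.shape[0]
    y = x.copy()
    pair = 0
    for block in range(0, n, 2 * stride):
        for i in range(block, block + stride):
            a, b = x[i], x[i + stride]
            y[i]          = w[pair, 0, 0] * a + w[pair, 0, 1] * b
            y[i + stride] = w[pair, 1, 0] * a + w[pair, 1, 1] * b
            pair += 1
    return y

n = 8
x = np.random.randn(n)
for s in (4, 2, 1):                       # log2(n) factors make one butterfly matrix
    x = butterfly_factor(x, s, np.random.randn(n // 2, 2, 2))
```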

Top-down training for neural networks

Title Top-down training for neural networks
Authors Anonymous
Abstract Vanishing gradients pose a challenge when training deep neural networks, causing the top layers (closer to the output) to learn faster than the lower layers (closer to the input). Interpreting the top layers as a classifier and the lower layers as a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor. This can leave the feature extractor under-trained, possibly failing to learn much about the patterns in the input data. To address this we propose a good-classifier hypothesis: given a fixed classifier that partitions the space well, the feature extractor can be further trained to fit that classifier and learn the data patterns well. This alleviates under-training of the feature extractor and enables the network to learn patterns in the data with small partial derivatives. We verify this hypothesis empirically and propose a novel top-down training method. We train all layers jointly, obtain a good classifier from the top layers, and then freeze those layers. Following re-initialization, we retrain the bottom layers with respect to the frozen classifier. Applying this approach to a set of speech recognition experiments on the Wall Street Journal and noisy CHiME-4 datasets, we observe substantial accuracy gains. When combined with dropout, our method enables connectionist temporal classification (CTC) models to outperform joint CTC-attention models, which have more capacity and flexibility.
Tasks Speech Recognition
Published 2020-01-01
URL https://openreview.net/forum?id=rJg8NertPr
PDF https://openreview.net/pdf?id=rJg8NertPr
PWC https://paperswithcode.com/paper/top-down-training-for-neural-networks
Repo
Framework
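
The two-stage recipe can be sketched directly in PyTorch. This minimal version assumes a `train_fn` that only updates parameters with `requires_grad=True`; the module names are illustrative.

```python
# Minimal sketch of the top-down recipe: joint training, freeze the top
# (classifier) layers, re-initialize the bottom (feature-extractor) layers,
# then retrain them against the frozen classifier.
import torch.nn as nn

def top_down_retrain(feature_extractor, classifier, train_fn):
    """train_fn is assumed to update only parameters with requires_grad=True."""
    model = nn.Sequential(feature_extractor, classifier)
    train_fn(model)                                   # stage 1: joint training of all layers
    for p in classifier.parameters():                 # freeze the (good) classifier
        p.requires_grad = False
    for m in feature_extractor.modules():             # re-initialize the bottom layers
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()
    train_fn(model)                                   # stage 2: retrain the extractor only
    return model
```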

Unsupervised Learning of Efficient and Robust Speech Representations

Title Unsupervised Learning of Efficient and Robust Speech Representations
Authors Anonymous
Abstract We present an unsupervised method for learning speech representations based on bidirectional contrastive predictive coding, which implicitly discovers phonetic structure from large-scale corpora of unlabelled raw audio signals. The representations, which we learn from up to 8,000 hours of publicly accessible speech data, are evaluated by their impact on the behaviour of supervised speech recognition systems. First, across a variety of datasets, we find that the features learned from the largest and most diverse pretraining dataset yield significant improvements over standard audio features as well as over features learned from smaller amounts of pretraining data. Second, they significantly improve sample efficiency in low-data scenarios. Finally, the features confer significant robustness advantages on the resulting recognition systems: we see significant improvements in out-of-domain transfer relative to baseline feature sets, and the features likewise provide improvements on four different low-resource African-language datasets.
Tasks Speech Recognition
Published 2020-01-01
URL https://openreview.net/forum?id=HJe-blSYvH
PDF https://openreview.net/pdf?id=HJe-blSYvH
PWC https://paperswithcode.com/paper/unsupervised-learning-of-efficient-and-robust
Repo
Framework
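
Contrastive predictive coding rests on an InfoNCE-style loss: each context vector must identify its true future latent among negatives drawn from the batch. A minimal one-directional sketch follows; the paper’s bidirectional variant runs this in both directions.

```python
# Sketch of an InfoNCE loss: the i-th context must pick out the i-th future
# latent among all futures in the batch (positives on the diagonal).
import torch
import torch.nn.functional as F

def info_nce(context, future):
    """context, future: (batch, dim) paired representations."""
    logits = context @ future.t()                # similarity of every context/future pair
    targets = torch.arange(context.shape[0])     # i-th context matches i-th future
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(32, 256), torch.randn(32, 256))
```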

Quantum Algorithms for Deep Convolutional Neural Networks

Title Quantum Algorithms for Deep Convolutional Neural Networks
Authors Iordanis Kerenidis, Jonas Landman, Anupam Prakash
Abstract Quantum computing is a powerful computational paradigm with applications in several fields, including machine learning. In the last decade, deep learning, and in particular Convolutional Neural Networks (CNNs), have become essential for applications in signal processing and image recognition. Quantum deep learning, however, remains a challenging problem, as it is difficult to implement nonlinearities with quantum unitaries. In this paper we propose a quantum algorithm for evaluating and training deep convolutional neural networks, with potential speedups over classical CNNs for both the forward and backward passes. The quantum CNN (QCNN) completely reproduces the outputs of the classical CNN and allows for nonlinearities and pooling operations. The QCNN is particularly interesting for deep networks and could open new frontiers in image recognition by allowing many more convolution kernels, larger kernels, high-dimensional inputs, and inputs with many channels. We also present numerical simulations for classification on the MNIST dataset to provide practical evidence for the efficiency of the QCNN.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Hygab1rKDS
PDF https://openreview.net/pdf?id=Hygab1rKDS
PWC https://paperswithcode.com/paper/quantum-algorithms-for-deep-convolutional
Repo
Framework

DYNAMIC SELF-TRAINING FRAMEWORK FOR GRAPH CONVOLUTIONAL NETWORKS

Title DYNAMIC SELF-TRAINING FRAMEWORK FOR GRAPH CONVOLUTIONAL NETWORKS
Authors Anonymous
Abstract Graph neural networks (GNNs) such as GCN, GAT, and MoNet have achieved state-of-the-art results on semi-supervised learning on graphs. However, when the number of labeled nodes is very small, the performance of GNNs degrades dramatically. Self-training has proved effective for resolving this issue; however, the performance of self-trained GCNs is still inferior to that of G2G and DGI in many settings. Moreover, the additional model complexity makes it more difficult to tune hyper-parameters and perform model selection. We argue that the power of self-training is still not fully explored for the node classification task. In this paper, we propose a unified end-to-end self-training framework called Dynamic Self-training, which generalizes and simplifies prior work. A simple instantiation of the framework based on GCN is provided, and empirical results show that our framework outperforms all previous methods, including GNNs, embedding-based methods, and self-trained GCNs, by a noticeable margin. Moreover, compared with standard self-training, hyper-parameter tuning for our framework is easier.
Tasks Model Selection, Node Classification
Published 2020-01-01
URL https://openreview.net/forum?id=SJgCEpVtvr
PDF https://openreview.net/pdf?id=SJgCEpVtvr
PWC https://paperswithcode.com/paper/dynamic-self-training-framework-for-graph-1
Repo
Framework
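
For reference, a generic self-training loop for a GCN, which the proposed framework generalizes, looks roughly like this. The paper’s dynamic weighting of pseudo-labels is not reproduced; `fit` and `gcn` are placeholders.

```python
# Generic self-training loop for a GCN: after each round, confidently
# predicted unlabeled nodes are added to the training set with their
# predicted (pseudo-)labels.
import torch

def self_train(gcn, fit, graph, labels, labeled_mask, rounds=5, threshold=0.9):
    mask = labeled_mask.clone()                      # boolean mask of training nodes
    y = labels.clone()
    for _ in range(rounds):
        fit(gcn, graph, y, mask)                     # train on current (pseudo-)labels
        probs = torch.softmax(gcn(graph), dim=1)
        conf, pred = probs.max(dim=1)
        new = (conf > threshold) & ~mask             # confident, still-unlabeled nodes
        y[new] = pred[new]
        mask |= new
    return gcn
```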

Improved Training of Certifiably Robust Models

Title Improved Training of Certifiably Robust Models
Authors Anonymous
Abstract Convex relaxations are effective for training and certifying neural networks against norm-bounded adversarial attacks, but they leave a large gap between certifiable and empirical (PGD) robustness. In principle, a relaxation can provide tight bounds if its solution is feasible for the original non-relaxed problem. Therefore, we propose two regularizers that can be used to train neural networks that yield convex relaxations with tighter bounds. In all of our experiments, the proposed regularizers result in tighter certification bounds than non-regularized baselines.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HygqFlBtPS
PDF https://openreview.net/pdf?id=HygqFlBtPS
PWC https://paperswithcode.com/paper/improved-training-of-certifiably-robust
Repo
Framework

Uncertainty-Aware Prediction for Graph Neural Networks

Title Uncertainty-Aware Prediction for Graph Neural Networks
Authors Anonymous
Abstract Thanks to graph neural networks (GNNs), semi-supervised node classification has achieved state-of-the-art performance on graph data. However, GNNs do not account for the various types of uncertainty associated with class probabilities, which is needed to minimize the risk of misclassification in real-life settings. In this work, we propose a Bayesian deep learning framework that reflects various types of uncertainty in classification predictions by leveraging the powerful modeling and learning capabilities of GNNs. We consider multiple uncertainty types from both the deep learning (DL) and belief/evidence theory domains. We treat the predictions of a Bayesian GNN (BGNN) as nodes’ multinomial subjective opinions in a graph, based on Dirichlet distributions where each belief mass is the belief probability of a class. By collecting evidence from the given labels of training nodes, the BGNN model is designed to accurately predict class probabilities and detect out-of-distribution samples. We validate that the proposed BGNN outperforms state-of-the-art counterparts in terms of node classification accuracy and out-of-distribution detection on six real network datasets.
Tasks Node Classification, Out-of-Distribution Detection
Published 2020-01-01
URL https://openreview.net/forum?id=SyxdC6NKwH
PDF https://openreview.net/pdf?id=SyxdC6NKwH
PWC https://paperswithcode.com/paper/uncertainty-aware-prediction-for-graph-neural
Repo
Framework
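
The evidence-based view the paper adopts can be illustrated with the standard subjective-logic mapping from Dirichlet parameters to an opinion: per-class evidence yields belief masses, and the leftover mass is vacuity (uncertainty from lack of evidence). A small sketch, assuming a GNN head that outputs non-negative evidence:

```python
# Standard subjective-logic reading of a Dirichlet distribution: with K
# classes and evidence e, alpha = e + 1, strength S = sum(alpha), belief
# b_k = e_k / S, and vacuity u = K / S.
import torch

def dirichlet_opinion(evidence):
    """evidence: (nodes, classes) non-negative evidence, e.g. from a GNN head."""
    alpha = evidence + 1.0                       # Dirichlet parameters
    S = alpha.sum(dim=1, keepdim=True)           # Dirichlet strength
    belief = evidence / S                        # per-class belief mass
    vacuity = evidence.shape[1] / S.squeeze(1)   # uncertainty mass (high when S is small)
    prob = alpha / S                             # expected class probabilities
    return belief, vacuity, prob
```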

Deep Lifetime Clustering

Title Deep Lifetime Clustering
Authors Anonymous
Abstract The goal of lifetime clustering is to develop an inductive model that maps subjects into $K$ clusters according to their underlying (unobserved) lifetime distribution. We introduce a neural-network-based lifetime clustering model that can find cluster assignments by directly maximizing the divergence between the empirical lifetime distributions of the clusters. Accordingly, we define a novel clustering loss function over the lifetime distributions (of entire clusters) based on a tight upper bound of the two-sample Kuiper test p-value. The resulting model is robust to the modeling issues associated with the unobservability of termination signals, and does not assume proportional hazards. Our results on real and synthetic datasets show significantly better lifetime clusters (as evaluated by C-index, Brier score, log-rank score, and adjusted Rand index) compared to competing approaches.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SkeYUkStPr
PDF https://openreview.net/pdf?id=SkeYUkStPr
PWC https://paperswithcode.com/paper/deep-lifetime-clustering
Repo
Framework
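
The two-sample Kuiper statistic that the clustering loss bounds is shown below; unlike the Kolmogorov-Smirnov statistic, it sums the largest deviations in both directions between the empirical CDFs. The paper optimizes an upper bound on the test’s p-value, which is not reproduced here.

```python
# Two-sample Kuiper statistic V = D+ + D-, the largest deviations of one
# empirical CDF above and below the other.
import numpy as np

def kuiper_statistic(a, b):
    """a, b: 1-D arrays of observed lifetimes from two clusters."""
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)  # empirical CDF of a
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)  # empirical CDF of b
    return np.max(Fa - Fb) + np.max(Fb - Fa)

v = kuiper_statistic(np.random.exponential(1.0, 200), np.random.exponential(1.5, 200))
```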