Paper Group NANR 18
Learning to Guide Random Search. Bio-Inspired Hashing for Unsupervised Similarity Search. Detecting malicious PDF using CNN. DeepV2D: Video to Depth with Differentiable Structure from Motion. Unsupervised Data Augmentation for Consistency Training. A Learning-based Iterative Method for Solving Vehicle Routing Problems. Improved Training Techniques …
Learning to Guide Random Search
Title | Learning to Guide Random Search |
Authors | Anonymous |
Abstract | We are interested in the optimization of a high-dimensional function when only function evaluations are possible. Although this derivative-free setting arises in many applications, existing methods suffer from high sample complexity, since their sample complexity depends on problem dimensionality, in contrast to the dimensionality-independent rates of first-order methods. The recent success of deep learning methods suggests that many data modalities lie on low-dimensional manifolds that can be represented by deep nonlinear models. Based on this observation, we consider derivative-free optimization of functions defined on low-dimensional manifolds. We develop an online learning approach that learns this manifold while performing the optimization. In other words, we jointly learn the manifold and optimize the function. Our analysis suggests that the proposed method significantly reduces sample complexity. We empirically evaluate the presented method on continuous optimization benchmarks and high-dimensional continuous control problems. Our method achieves significantly lower sample complexity than Augmented Random Search and other derivative-free optimization algorithms. |
Tasks | Continuous Control |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1gHokBKwS |
PDF | https://openreview.net/pdf?id=B1gHokBKwS |
PWC | https://paperswithcode.com/paper/learning-to-guide-random-search |
Repo | |
Framework | |
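The abstract positions the method against Augmented Random Search. As a point of reference, here is a minimal sketch of the basic two-point random-search update that such methods build on; it is not the paper's manifold-learning method, and the function name `random_search` and all step sizes are illustrative assumptions. Each iteration samples Gaussian directions, estimates a directional derivative from two function evaluations, and steps against the averaged estimate.

```python
import random

def random_search(f, x0, step=0.05, nu=0.02, num_dirs=8, iters=300, seed=0):
    """Minimize f by basic two-point random search (a baseline, not the paper's method)."""
    rng = random.Random(seed)
    x = list(x0)
    d = len(x)
    for _ in range(iters):
        grad = [0.0] * d
        for _ in range(num_dirs):
            # Random Gaussian direction in the full d-dimensional space.
            delta = [rng.gauss(0.0, 1.0) for _ in range(d)]
            plus = [xi + nu * di for xi, di in zip(x, delta)]
            minus = [xi - nu * di for xi, di in zip(x, delta)]
            # Central-difference estimate of the directional derivative along delta.
            slope = (f(plus) - f(minus)) / (2.0 * nu)
            for i in range(d):
                grad[i] += slope * delta[i] / num_dirs
        # Descend along the averaged gradient estimate.
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x

sphere = lambda v: sum(t * t for t in v)
x_final = random_search(sphere, [1.0] * 10)
```

Because every sampled direction lives in the full ambient space, the variance of this estimate grows with the dimension; that dimension dependence is what the abstract proposes to reduce by searching along a learned low-dimensional manifold.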
Bio-Inspired Hashing for Unsupervised Similarity Search
Title | Bio-Inspired Hashing for Unsupervised Similarity Search |
Authors | Anonymous |
Abstract | The olfactory circuit of the fruit fly Drosophila has inspired a new locality-sensitive hashing (LSH) algorithm, FlyHash. In contrast with classical LSH algorithms that produce low-dimensional hash codes, FlyHash produces sparse high-dimensional hash codes and has been shown to outperform classical LSH algorithms empirically in similarity search. However, FlyHash uses random projections and cannot learn from data. Drawing inspiration from FlyHash and from the ubiquity of sparse expansive representations in neurobiology, we propose a novel hashing algorithm, BioHash, that produces sparse high-dimensional hash codes in a data-driven manner. We show that BioHash outperforms previously published benchmarks for various hashing methods. Since our learning algorithm is based on a local and biologically plausible synaptic plasticity rule, our work provides evidence for the proposal that LSH might be a computational reason for the abundance of sparse expansive motifs in a variety of biological systems. We also propose a convolutional variant, BioConvHash, that further improves performance. From the perspective of computer science, BioHash and BioConvHash are fast and scalable, and they yield compressed binary representations that are useful for similarity search. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bylkd0EFwr |
PDF | https://openreview.net/pdf?id=Bylkd0EFwr |
PWC | https://paperswithcode.com/paper/bio-inspired-hashing-for-unsupervised |
Repo | |
Framework | |
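FlyHash, the baseline that BioHash builds on, can be sketched in a few lines: a sparse random binary projection expands the input to many more dimensions, and a winner-take-all step keeps only the k most active units. This is a minimal sketch under stated assumptions: the sampling fraction, output size, and function names are hypothetical, and preprocessing such as mean-centering is omitted. It is not BioHash, which replaces the random projection with weights learned by a local plasticity rule.

```python
import random

def make_projection(d, m, sample_frac=0.1, seed=0):
    """Sparse binary random projection: each of m output units sums a random subset of inputs."""
    rng = random.Random(seed)
    n_inputs = max(1, int(sample_frac * d))
    return [rng.sample(range(d), n_inputs) for _ in range(m)]

def fly_hash(x, projection, k):
    """Expand x to len(projection) units, then keep the top-k (winner-take-all) as 1s."""
    activations = [sum(x[i] for i in idx) for idx in projection]
    winners = sorted(range(len(activations)), key=lambda j: activations[j], reverse=True)[:k]
    code = [0] * len(activations)
    for j in winners:
        code[j] = 1
    return code

def shared_bits(code_a, code_b):
    """Similarity between two hashes: number of active bits they have in common."""
    return sum(a & b for a, b in zip(code_a, code_b))
```

With m much larger than d and k much smaller than m, the code is both high-dimensional and sparse, which is the property the abstract contrasts with classical low-dimensional LSH codes.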
Detecting malicious PDF using CNN
Title | Detecting malicious PDF using CNN |
Authors | Raphael Fettaya, Yishay Mansour |
Abstract | Malicious PDF files represent one of the biggest threats to computer security. To detect them, significant research has been done using handwritten signatures or machine learning based on manual feature extraction. These approaches are time-consuming and require significant prior knowledge, and the list of features has to be updated with each newly discovered vulnerability. In this work, we propose a novel algorithm that applies a Convolutional Neural Network (CNN) to the byte level of the file, without any handcrafted features. Using a dataset of 130,000 files, we show that our approach maintains a high detection rate (96%) of PDF malware and even detects new malicious files that are still undetected by most antivirus products. By applying a clustering algorithm to features generated automatically by our CNN, we also obtain high similarity between the antivirus labels and the resulting clusters. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJeW-A4tDS |
PDF | https://openreview.net/pdf?id=SJeW-A4tDS |
PWC | https://paperswithcode.com/paper/detecting-malicious-pdf-using-cnn |
Repo | |
Framework | |
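The byte-level input described above needs no handcrafted features: the raw file bytes themselves become the token sequence that a CNN embeds and convolves over. A minimal sketch of that preparation step follows, assuming a fixed input length with truncation and a dedicated padding token; `max_len`, `pad_token`, and the function name are hypothetical, and the paper's actual input settings are not given here.

```python
from typing import List

def bytes_to_tokens(data: bytes, max_len: int = 4096, pad_token: int = 256) -> List[int]:
    """Map raw file bytes to integer tokens (0-255), truncating or padding to max_len."""
    tokens = list(data[:max_len])  # iterating over bytes yields ints in 0..255
    tokens += [pad_token] * (max_len - len(tokens))  # pad_token lies outside the byte range
    return tokens
```

Reserving 256 for padding keeps it distinct from all 256 real byte values, so an embedding table with 257 rows covers every token.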
DeepV2D: Video to Depth with Differentiable Structure from Motion
Title | DeepV2D: Video to Depth with Differentiable Structure from Motion |
Authors | Anonymous |
Abstract | We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation are alternated and converge to accurate depth. |
Tasks | Depth Estimation, Motion Estimation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJeO7RNKPr |
PDF | https://openreview.net/pdf?id=HJeO7RNKPr |
PWC | https://paperswithcode.com/paper/deepv2d-video-to-depth-with-differentiable-1 |
Repo | |
Framework | |
Unsupervised Data Augmentation for Consistency Training
Title | Unsupervised Data Augmentation for Consistency Training |
Authors | Anonymous |
Abstract | Semi-supervised learning has recently shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically noise produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning. By replacing simple noising operations with advanced data augmentation methods, our method brings substantial improvements across six language and three vision tasks under the same consistency training framework. On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20%, outperforming the state-of-the-art model trained on 25,000 labeled examples. On a standard semi-supervised learning benchmark, CIFAR-10, our method outperforms all previous approaches and achieves an error rate of 2.7% with only 4,000 labeled examples, nearly matching the performance of models trained on 50,000 labeled examples. Our method also combines well with transfer learning, e.g., when fine-tuning from BERT, and yields improvements in the high-data regime, such as ImageNet, both when only 10% of the data is labeled and when a full labeled set with 1.3M extra unlabeled examples is used. |
Tasks | Data Augmentation, Text Classification, Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByeL1R4FvS |
PDF | https://openreview.net/pdf?id=ByeL1R4FvS |
PWC | https://paperswithcode.com/paper/unsupervised-data-augmentation-for |
Repo | |
Framework | |
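The consistency-training objective in the abstract combines a supervised loss on labeled data with a term that penalizes disagreement between predictions on an unlabeled example and on its augmented version. A minimal pure-Python sketch follows, assuming a KL-divergence consistency term and a weighting factor `lam` (both names illustrative), with the clean prediction treated as a fixed target; the augmentation itself and any additional training tricks are omitted.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def uda_loss(labeled_logits, labels, clean_logits, augmented_logits, lam=1.0):
    """Supervised cross-entropy plus lam * consistency on unlabeled examples."""
    supervised = -sum(
        math.log(softmax(z)[y]) for z, y in zip(labeled_logits, labels)
    ) / len(labels)
    # The clean prediction acts as the target; in an autodiff framework,
    # gradients would be stopped through it.
    consistency = sum(
        kl_divergence(softmax(c), softmax(a))
        for c, a in zip(clean_logits, augmented_logits)
    ) / len(clean_logits)
    return supervised + lam * consistency
```

When the model predicts the same distribution for an example and its augmentation, the consistency term is zero and only the supervised loss remains.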
A Learning-based Iterative Method for Solving Vehicle Routing Problems
Title | A Learning-based Iterative Method for Solving Vehicle Routing Problems |
Authors | Anonymous |
Abstract | This paper is concerned with solving combinatorial optimization problems, in particular the capacitated vehicle routing problem (CVRP). Classical Operations Research (OR) algorithms such as LKH3 (Helsgaun, 2017) are extremely inefficient (e.g., 13 hours on CVRP instances of only size 100) and difficult to scale to larger problems. Machine-learning-based approaches have recently been shown to be promising, partly because of their efficiency (once trained, they can solve instances within minutes or even seconds). However, there is still a considerable gap between the quality of machine-learned solutions and what OR methods can offer (e.g., on CVRP-100, the best result of learned solutions is between 16.10 and 16.80, significantly worse than LKH3’s 15.65). In this paper, we present the first learning-based approach for CVRP that is efficient in solving speed and at the same time outperforms OR methods. Starting with a random initial solution, our algorithm learns to iteratively refine the solution with an improvement operator, selected by a reinforcement-learning-based controller. The improvement operator is selected from a pool of powerful operators that are customized for routing problems. By combining the strengths of the two worlds, our approach achieves new state-of-the-art results on CVRP, e.g., an average cost of 15.57 on CVRP-100. |
Tasks | Combinatorial Optimization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJe1334YDH |
PDF | https://openreview.net/pdf?id=BJe1334YDH |
PWC | https://paperswithcode.com/paper/a-learning-based-iterative-method-for-solving |
Repo | |
Framework | |
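The refine-the-current-solution loop described above relies on local-search operators over routes. As an illustration of one classic intra-route operator, 2-opt (which reverses a segment of a route), here is a minimal sketch with Euclidean distances and a single route that starts and ends at the depot. The function names and data layout are assumptions, and the sketch does not include the paper's reinforcement-learning controller or its specific operator pool.

```python
import math

def route_cost(route, coords, depot=0):
    """Total Euclidean length of depot -> customers in order -> depot."""
    path = [depot] + list(route) + [depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))

def two_opt(route, coords, depot=0):
    """Repeatedly reverse segments while doing so shortens the route."""
    best = list(route)
    best_cost = route_cost(best, coords, depot)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reverse the customers between positions i and j (inclusive).
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cost = route_cost(candidate, coords, depot)
                if cost < best_cost - 1e-12:
                    best, best_cost = candidate, cost
                    improved = True
    return best, best_cost
```

Because 2-opt only reorders customers within one route, it never changes that route's load, so capacity feasibility is preserved; inter-route operators are needed to move demand between vehicles.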
Improved Training Techniques for Online Neural Machine Translation
Title | Improved Training Techniques for Online Neural Machine Translation |
Authors | Anonymous |
Abstract | Neural sequence-to-sequence models form the basis of state-of-the-art solutions for sequential prediction problems such as machine translation and speech recognition. These models typically assume that the entire input is available when target generation starts. In some applications, however, it is desirable to start decoding before the entire input is available, e.g., to reduce latency in automatic speech recognition. We consider state-of-the-art wait-k decoders, which first read k tokens from the source and then alternate between reading tokens from the input and writing to the output. We investigate the sensitivity of such models to the value of k used during training and at deployment, and the effect of updating the hidden states in Transformer models as new source tokens are read. We experiment with German-English translation on the IWSLT14 dataset and the larger WMT15 dataset. Our results significantly improve over earlier state-of-the-art results for German-English translation on the WMT15 dataset across different latency levels. |
Tasks | Machine Translation, Speech Recognition |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rke3OxSKwr |
PDF | https://openreview.net/pdf?id=rke3OxSKwr |
PWC | https://paperswithcode.com/paper/improved-training-techniques-for-online |
Repo | |
Framework | |
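The wait-k schedule can be written down directly: before emitting target token t (counting from 1), the decoder has read min(k + t - 1, |x|) source tokens, where |x| is the source length. A minimal sketch that turns this rule into an explicit READ/WRITE action sequence follows; the function name and string labels are illustrative.

```python
def wait_k_actions(src_len, tgt_len, k):
    """Action sequence of a wait-k decoder for given source/target lengths."""
    actions = []
    read = 0
    for t in range(1, tgt_len + 1):
        # Source prefix required before writing target token t.
        needed = min(k + t - 1, src_len)
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions
```

Smaller k lowers latency but gives the decoder less source context per output token; once the source is exhausted, the remaining target tokens are written back-to-back.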
Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
Title | Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps |
Authors | Anonymous |
Abstract | Modern neural network architectures use structured linear transformations, such as low-rank matrices, sparse matrices, permutations, and the Fourier transform, to improve inference speed and reduce memory usage compared to general linear maps. However, choosing which of the myriad structured transformations to use (and its associated parameterization) is a laborious task that requires trading off speed, space, and accuracy. We consider a different approach: we introduce a family of matrices called kaleidoscope matrices (K-matrices) that provably capture any structured matrix with near-optimal space (parameter) and time (arithmetic operation) complexity. We empirically validate that K-matrices can be automatically learned within end-to-end pipelines to replace hand-crafted procedures and improve model quality. For example, replacing channel shuffles in ShuffleNet improves classification accuracy on ImageNet by up to 5%. Learnable K-matrices can also simplify hand-engineered pipelines: we replace filter bank feature computation in speech data preprocessing with a kaleidoscope layer, resulting in only a 0.4% loss in accuracy on the TIMIT speech recognition task. K-matrices can also capture latent structure in models: for a challenging permuted image classification task, adding a K-matrix to a standard convolutional architecture can enable learning the latent permutation and improve accuracy by over 8 points. We provide a practically efficient implementation of our approach, and use K-matrices in a Transformer network to attain 36% faster end-to-end inference speed on a language translation task. |
Tasks | Image Classification, Speech Recognition |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkgrBgSYDS |
PDF | https://openreview.net/pdf?id=BkgrBgSYDS |
PWC | https://paperswithcode.com/paper/kaleidoscope-an-efficient-learnable |
Repo | |
Framework | |
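K-matrices are built from butterfly matrices, each of which factors into log2(n) sparse layers of 2x2 mixing blocks, so a matrix-vector product costs O(n log n) rather than O(n^2). The sketch below shows only a single butterfly matrix applied to a vector; the random twiddle initialization and the function names are assumptions, and a full K-matrix, which composes several butterfly matrices, is not reproduced here.

```python
import random

def random_butterfly(n, seed=0):
    """Random 2x2 twiddles for each of the log2(n) factors (block sizes 2, 4, ..., n)."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of two"
    rng = random.Random(seed)
    factors = []
    m = 2
    while m <= n:
        # One (a, b, c, d) block per index pair (i, i + m/2): n/2 pairs per factor.
        factors.append((m, [tuple(rng.gauss(0.0, 1.0) for _ in range(4)) for _ in range(n // 2)]))
        m *= 2
    return factors

def butterfly_multiply(factors, x):
    """Apply the butterfly factors to x in O(n log n) operations."""
    n = len(x)
    y = list(x)
    for m, twiddles in factors:
        half = m // 2
        out = [0.0] * n
        p = 0
        for start in range(0, n, m):
            for i in range(half):
                a, b, c, d = twiddles[p]
                p += 1
                top, bot = y[start + i], y[start + i + half]
                # 2x2 block [[a, b], [c, d]] mixes the pair (top, bot).
                out[start + i] = a * top + b * bot
                out[start + i + half] = c * top + d * bot
        y = out
    return y
```

Setting every 2x2 block to [[1, 1], [1, -1]] recovers the unnormalized Walsh-Hadamard transform, which illustrates how fixed structured transforms appear as special settings of learnable butterfly parameters.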
Top-down training for neural networks
Title | Top-down training for neural networks |
Authors | Anonymous |
Abstract | Vanishing gradients pose a challenge when training deep neural networks, causing the top layers of the network (closer to the output) to learn faster than the lower layers (closer to the input). Interpreting the top layers as a classifier and the lower layers as a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor. This can leave the feature extractor under-trained, possibly failing to learn much about the patterns in the input data. To address this, we propose a good classifier hypothesis: given a fixed classifier that partitions the space well, the feature extractor can be further trained to fit that classifier and learn the data patterns well. This alleviates the problem of under-training the feature extractor and enables the network to learn patterns in the data with small partial derivatives. We verify this hypothesis empirically and propose a novel top-down training method. We train all layers jointly, obtaining a good classifier from the top layers, which are then frozen. Following re-initialization, we retrain the bottom layers with respect to the frozen classifier. Applying this approach to a set of speech recognition experiments using the Wall Street Journal and noisy CHiME-4 datasets, we observe substantial accuracy gains. When combined with dropout, our method enables connectionist temporal classification (CTC) models to outperform joint CTC-attention models, which have more capacity and flexibility. |
Tasks | Speech Recognition |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJg8NertPr |
PDF | https://openreview.net/pdf?id=rJg8NertPr |
PWC | https://paperswithcode.com/paper/top-down-training-for-neural-networks |
Repo | |
Framework | |
Unsupervised Learning of Efficient and Robust Speech Representations
Title | Unsupervised Learning of Efficient and Robust Speech Representations |
Authors | Anonymous |
Abstract | We present an unsupervised method for learning speech representations based on a bidirectional contrastive predictive coding that implicitly discovers phonetic structure from large-scale corpora of unlabelled raw audio signals. The representations, which we learn from up to 8000 hours of publicly accessible speech data, are evaluated by looking at their impact on the behaviour of supervised speech recognition systems. First, across a variety of datasets, we find that the features learned from the largest and most diverse pretraining dataset result in significant improvements over standard audio features as well as over features learned from smaller amounts of pretraining data. Second, they significantly improve sample efficiency in low-data scenarios. Finally, the features confer significant robustness advantages to the resulting recognition systems: we see significant improvements in out-of-domain transfer relative to baseline feature sets, and the features likewise provide improvements in four different low-resource African language datasets. |
Tasks | Speech Recognition |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJe-blSYvH |
PDF | https://openreview.net/pdf?id=HJe-blSYvH |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-efficient-and-robust |
Repo | |
Framework | |
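Contrastive predictive coding is typically trained with the InfoNCE loss: a context vector must score its true future frame higher than a set of negative candidates. A minimal sketch with dot-product scoring follows; the scoring function and names are assumptions, and a bidirectional variant would presumably add a second term computed from a backward-running context, which is not shown.

```python
import math

def info_nce(context, candidates, positive_index):
    """InfoNCE loss: -log softmax score of the true future among all candidates."""
    # Dot-product score between the context and each candidate encoding.
    scores = [sum(c * z for c, z in zip(context, cand)) for cand in candidates]
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[positive_index]
```

With N candidates that the model cannot distinguish, the loss equals log N; it falls below that only when the true future scores higher than the negatives.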
Quantum Algorithms for Deep Convolutional Neural Networks
Title | Quantum Algorithms for Deep Convolutional Neural Networks |
Authors | Iordanis Kerenidis, Jonas Landman, Anupam Prakash |
Abstract | Quantum computing is a powerful computational paradigm with applications in several fields, including machine learning. In the last decade, deep learning, and in particular Convolutional Neural Networks (CNNs), have become essential for applications in signal processing and image recognition. Quantum deep learning, however, remains a challenging problem, as it is difficult to implement non-linearities with quantum unitaries. In this paper, we propose a quantum algorithm for evaluating and training deep convolutional neural networks with potential speedups over classical CNNs for both the forward and backward passes. The quantum CNN (QCNN) fully reproduces the outputs of the classical CNN and allows for non-linearities and pooling operations. The QCNN is particularly interesting for deep networks and could open new frontiers in image recognition by allowing for many more convolution kernels, larger kernels, high-dimensional inputs, and high-depth input channels. We also present numerical simulations for the classification of the MNIST dataset to provide practical evidence for the efficiency of the QCNN. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hygab1rKDS |
PDF | https://openreview.net/pdf?id=Hygab1rKDS |
PWC | https://paperswithcode.com/paper/quantum-algorithms-for-deep-convolutional |
Repo | |
Framework | |
DYNAMIC SELF-TRAINING FRAMEWORK FOR GRAPH CONVOLUTIONAL NETWORKS
Title | DYNAMIC SELF-TRAINING FRAMEWORK FOR GRAPH CONVOLUTIONAL NETWORKS |
Authors | Anonymous |
Abstract | Graph neural networks (GNNs) such as GCN, GAT, and MoNet have achieved state-of-the-art results on semi-supervised learning on graphs. However, when the number of labeled nodes is very small, the performance of GNNs degrades dramatically. Self-training has proved effective for resolving this issue; however, the performance of self-trained GCN is still inferior to that of G2G and DGI in many settings. Moreover, additional model complexity makes it more difficult to tune hyper-parameters and perform model selection. We argue that the power of self-training has not yet been fully explored for the node classification task. In this paper, we propose a unified end-to-end self-training framework called Dynamic Self-training, which generalizes and simplifies prior work. A simple instantiation of the framework based on GCN is provided, and empirical results show that our framework outperforms all previous methods, including GNNs, embedding-based methods, and self-trained GCNs, by a noticeable margin. Moreover, compared with standard self-training, hyper-parameter tuning for our framework is easier. |
Tasks | Model Selection, Node Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJgCEpVtvr |
PDF | https://openreview.net/pdf?id=SJgCEpVtvr |
PWC | https://paperswithcode.com/paper/dynamic-self-training-framework-for-graph-1 |
Repo | |
Framework | |
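Self-training for node classification generally turns confident predictions on unlabeled nodes into extra pseudo-labeled training data. A minimal, generic sketch follows; the threshold, the use of confidence as a loss weight, and the function name are assumptions for illustration, not the paper's Dynamic Self-training rule. A "dynamic" scheme would presumably recompute these pseudo-labels as training progresses rather than fixing them once.

```python
def pseudo_labels(probs, threshold=0.9):
    """Map each confident unlabeled node to (predicted class, confidence weight)."""
    selected = {}
    for node, p in probs.items():
        confidence = max(p)
        if confidence >= threshold:
            # The argmax class becomes the pseudo-label; its probability can weight the loss.
            selected[node] = (p.index(confidence), confidence)
    return selected
```

Nodes below the threshold are simply left out for this round and can be reconsidered after the next model update.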
Improved Training of Certifiably Robust Models
Title | Improved Training of Certifiably Robust Models |
Authors | Anonymous |
Abstract | Convex relaxations are effective for training and certifying neural networks against norm-bounded adversarial attacks, but they leave a large gap between certifiable and empirical (PGD) robustness. In principle, relaxation can provide tight bounds if the convex relaxation solution is feasible for the original non-relaxed problem. Therefore, we propose two regularizers that can be used to train neural networks that yield convex relaxations with tighter bounds. In all of our experiments, the proposed regularizations result in tighter certification bounds than non-regularized baselines. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygqFlBtPS |
PDF | https://openreview.net/pdf?id=HygqFlBtPS |
PWC | https://paperswithcode.com/paper/improved-training-of-certifiably-robust |
Repo | |
Framework | |
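The gap between certified and empirical robustness comes from how loosely a bounding method over-approximates each layer. The sketch below uses interval bound propagation, a simpler and generally looser method than the convex relaxations the abstract discusses (swapped in here only for illustration), to bound an affine layer followed by a ReLU over an input box. Neurons whose pre-activation interval straddles zero are the unstable ones, where a ReLU relaxation is not exact. All function names are hypothetical.

```python
def affine_bounds(W, b, lower, upper):
    """Elementwise bounds on W @ x + b for x in the box [lower, upper]."""
    new_lower, new_upper = [], []
    for row, bias in zip(W, b):
        lo = hi = bias
        for w, l, u in zip(row, lower, upper):
            # A positive weight attains its min at l; a negative weight at u.
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        new_lower.append(lo)
        new_upper.append(hi)
    return new_lower, new_upper

def relu_bounds(lower, upper):
    """ReLU is monotone, so it maps interval endpoints directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

def unstable_neurons(lower, upper):
    """Indices whose pre-activation interval crosses zero."""
    return [i for i, (l, u) in enumerate(zip(lower, upper)) if l < 0 < u]
```

Fewer unstable neurons generally means tighter certified bounds, which is one intuition for why regularizing toward exact (feasible) relaxations can narrow the certification gap.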
Uncertainty-Aware Prediction for Graph Neural Networks
Title | Uncertainty-Aware Prediction for Graph Neural Networks |
Authors | Anonymous |
Abstract | Thanks to graph neural networks (GNNs), semi-supervised node classification has achieved state-of-the-art performance on graph data. However, GNNs do not consider any type of uncertainty associated with the class probabilities, which is needed to minimize the risk of misclassification under uncertainty in real life. In this work, we propose a Bayesian deep learning framework that reflects various types of uncertainty in classification predictions by leveraging the powerful modeling and learning capabilities of GNNs. We consider multiple uncertainty types from both the deep learning (DL) and belief/evidence theory domains. We treat the predictions of a Bayesian GNN (BGNN) as nodes’ multinomial subjective opinions in a graph based on Dirichlet distributions, where each belief mass is the belief probability of a class. By collecting evidence from the given labels of training nodes, the BGNN model is designed to accurately predict the probability of each class and to detect out-of-distribution nodes. On six real network datasets, we show that the proposed BGNN outperforms state-of-the-art counterparts in both node classification accuracy and out-of-distribution detection. |
Tasks | Node Classification, Out-of-Distribution Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyxdC6NKwH |
PDF | https://openreview.net/pdf?id=SyxdC6NKwH |
PWC | https://paperswithcode.com/paper/uncertainty-aware-prediction-for-graph-neural |
Repo | |
Framework | |
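The mapping from collected evidence to a subjective opinion on a Dirichlet distribution can be made concrete with the standard subjective-logic formulas: with non-negative evidence e_k for K classes, alpha_k = e_k + 1 and S = sum_k alpha_k, the belief masses are b_k = e_k / S, the uncertainty (vacuity) is u = K / S, and the expected class probabilities are alpha_k / S. This sketch assumes the paper follows that standard mapping; the function name is illustrative.

```python
def subjective_opinion(evidence):
    """Return (belief masses, uncertainty, expected probabilities) from class evidence."""
    K = len(evidence)
    alpha = [e + 1.0 for e in evidence]  # Dirichlet parameters
    S = sum(alpha)                       # Dirichlet strength
    belief = [e / S for e in evidence]
    uncertainty = K / S                  # vacuity: high when total evidence is low
    expected = [a / S for a in alpha]
    return belief, uncertainty, expected
```

Belief masses and uncertainty always sum to 1, so a node with no evidence at all gets u = 1 and uniform expected probabilities, a natural signal for out-of-distribution detection.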
Deep Lifetime Clustering
Title | Deep Lifetime Clustering |
Authors | Anonymous |
Abstract | The goal of lifetime clustering is to develop an inductive model that maps subjects into $K$ clusters according to their underlying (unobserved) lifetime distribution. We introduce a neural-network-based lifetime clustering model that can find cluster assignments by directly maximizing the divergence between the empirical lifetime distributions of the clusters. Accordingly, we define a novel clustering loss function over the lifetime distributions (of entire clusters) based on a tight upper bound of the two-sample Kuiper test p-value. The resulting model is robust to the modeling issues associated with the unobservability of termination signals, and it does not assume proportional hazards. Our results on real and synthetic datasets show significantly better lifetime clusters (as evaluated by C-index, Brier score, log-rank score, and adjusted Rand index) compared to competing approaches. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkeYUkStPr |
PDF | https://openreview.net/pdf?id=SkeYUkStPr |
PWC | https://paperswithcode.com/paper/deep-lifetime-clustering |
Repo | |
Framework | |
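The two-sample Kuiper statistic behind the clustering loss is V = D+ + D-, the largest positive gap plus the largest negative gap between two empirical CDFs, which makes it sensitive to differences anywhere in the distribution. A minimal sketch for fully observed samples follows; it ignores censoring (unobserved termination) and computes the statistic itself, not the paper's upper bound on its p-value. Function names are illustrative.

```python
import bisect

def ecdf(sorted_sample, t):
    """Fraction of the sample that is <= t."""
    return bisect.bisect_right(sorted_sample, t) / len(sorted_sample)

def kuiper_statistic(a, b):
    """Two-sample Kuiper statistic V = max(F_a - F_b) + max(F_b - F_a)."""
    sa, sb = sorted(a), sorted(b)
    d_plus = d_minus = 0.0
    # Both ECDFs are step functions, so checking the pooled sample points suffices.
    for t in sa + sb:
        diff = ecdf(sa, t) - ecdf(sb, t)
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return d_plus + d_minus
```

For the samples [1, 4] and [2, 3], the Kolmogorov-Smirnov statistic is only 0.5, while the Kuiper statistic is 1.0, because the two CDFs cross and both gaps count.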