April 1, 2020

Paper Group ANR 417

Learning Gaussian Graphical Models via Multiplicative Weights


Title	Learning Gaussian Graphical Models via Multiplicative Weights
Authors	Anamay Chaturvedi, Jonathan Scarlett
Abstract	Graphical model selection in Markov random fields is a fundamental problem in statistics and machine learning. Two particularly prominent models, the Ising model and Gaussian model, have largely developed in parallel using different (though often related) techniques, and several practical algorithms with rigorous sample complexity bounds have been established for each. In this paper, we adapt a recently proposed algorithm of Klivans and Meka (FOCS, 2017), based on the method of multiplicative weight updates, from the Ising model to the Gaussian model, via non-trivial modifications to both the algorithm and its analysis. The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature, has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
Tasks	Model Selection
Published	2020-02-20
URL	https://arxiv.org/abs/2002.08663v2
PDF	https://arxiv.org/pdf/2002.08663v2.pdf
PWC	https://paperswithcode.com/paper/learning-gaussian-graphical-models-via
Repo
Framework
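Background for the entry above, as a minimal pure-Python sketch: in a Gaussian graphical model the edges are the nonzero off-diagonal entries of the precision (inverse covariance) matrix Theta, and the conditional mean of node i given all other nodes is a linear regression with coefficients -Theta_ij / Theta_ii. This is a standard property of Gaussian models, not the multiplicative-weights algorithm of the paper; the function names and the 4-node chain are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def edges_from_precision(theta: Matrix, tol: float = 1e-12) -> List[Tuple[int, int]]:
    """Undirected edges (i, j), i < j, whose precision entry is nonzero."""
    p = len(theta)
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(theta[i][j]) > tol]

def neighborhood_coefficients(theta: Matrix, i: int) -> List[float]:
    """Coefficients beta_j of E[X_i | X_-i] = sum_j beta_j X_j for a zero-mean Gaussian."""
    # beta_j = -Theta_ij / Theta_ii; writing 0.0 - x avoids producing -0.0 for zeros.
    return [0.0 if j == i else (0.0 - theta[i][j]) / theta[i][i] for j in range(len(theta))]

# Hypothetical 4-node chain 0-1-2-3 (symmetric and diagonally dominant, hence positive definite).
theta = [
    [1.0, -0.4, 0.0, 0.0],
    [-0.4, 1.0, -0.4, 0.0],
    [0.0, -0.4, 1.0, -0.4],
    [0.0, 0.0, -0.4, 1.0],
]
print(edges_from_precision(theta))          # [(0, 1), (1, 2), (2, 3)]
print(neighborhood_coefficients(theta, 1))  # [0.4, 0.0, 0.4, 0.0]
```

Node-wise approaches learn exactly these neighbourhood regressions from samples; here they are read off a known Theta only to show what is being estimated.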

Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds


Title	Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds
Authors	Ehsan Emamjomeh-Zadeh, Chen-Yu Wei, Haipeng Luo, David Kempe
Abstract	We revisit the problem of online learning with sleeping experts/bandits: in each time step, only a subset of the actions are available for the algorithm to choose from (and learn about). The work of Kleinberg et al. [2010] showed that there exist no-regret algorithms which perform no worse than the best ranking of actions asymptotically. Unfortunately, achieving this regret bound appears computationally hard: Kanade and Steinke [2014] showed that achieving this no-regret performance is at least as hard as PAC-learning DNFs, a notoriously difficult problem. In the present work, we relax the original problem and study computationally efficient no-approximate-regret algorithms: such algorithms may exceed the optimal cost by a multiplicative constant in addition to the additive regret. We give an algorithm that provides a no-approximate-regret guarantee for the general sleeping expert/bandit problems. For several canonical special cases of the problem, we give algorithms with significantly better approximation ratios; these algorithms also illustrate different techniques for achieving no-approximate-regret guarantees.
Tasks
Published	2020-03-07
URL	https://arxiv.org/abs/2003.03490v1
PDF	https://arxiv.org/pdf/2003.03490v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-online-learning-with-changing
Repo
Framework
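To make the sleeping-experts setting concrete, here is plain exponential weights (Hedge) renormalised over the actions that are available in each round. This is a common baseline written purely as an illustration, NOT the no-approximate-regret algorithm of the paper, and it carries no claimed guarantee against the best ranking of actions; the class name and the learning rate eta = 0.5 are assumptions.

```python
import math
from typing import Dict, Set

class SleepingHedge:
    """Exponential weights restricted to the awake actions (illustrative baseline only)."""

    def __init__(self, n_actions: int, eta: float = 0.5):
        self.eta = eta                      # learning rate (assumed value)
        self.weights = [1.0] * n_actions

    def distribution(self, awake: Set[int]) -> Dict[int, float]:
        """Renormalise the weights over the awake actions only."""
        total = sum(self.weights[a] for a in awake)
        return {a: self.weights[a] / total for a in awake}

    def update(self, awake: Set[int], losses: Dict[int, float]) -> None:
        """Multiplicative update for awake actions; sleeping actions are untouched."""
        for a in awake:
            self.weights[a] *= math.exp(-self.eta * losses[a])

learner = SleepingHedge(n_actions=3)
for t in range(20):
    awake = {0, 1, 2} if t % 2 == 0 else {1, 2}     # action 0 sleeps every other round
    losses = {a: (0.0 if a == 0 else 1.0) for a in awake}
    probs = learner.distribution(awake)             # the learner would sample from this
    learner.update(awake, losses)

final = learner.distribution({0, 1, 2})
print(round(final[0], 4))  # 0.9999
```

Sleeping actions keep their weights unchanged, so an action that is often unavailable is not penalised for rounds in which it could not be played.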

Bayesian Deep Learning and a Probabilistic Perspective of Generalization


Title	Bayesian Deep Learning and a Probabilistic Perspective of Generalization
Authors	Andrew Gordon Wilson, Pavel Izmailov
Abstract	The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead. We also investigate the prior over functions implied by a vague distribution over neural network weights, explaining the generalization properties of such models from a probabilistic perspective. From this perspective, we explain results that have been presented as mysterious and distinct to neural network generalization, such as the ability to fit images with random labels, and show that these results can be reproduced with Gaussian processes. Finally, we provide a Bayesian perspective on tempering for calibrating predictive distributions.
Tasks	Calibration, Gaussian Processes
Published	2020-02-20
URL	https://arxiv.org/abs/2002.08791v2
PDF	https://arxiv.org/pdf/2002.08791v2.pdf
PWC	https://paperswithcode.com/paper/bayesian-deep-learning-and-a-probabilistic
Repo
Framework
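The claim that deep ensembles approximate Bayesian marginalization can be illustrated by how an ensemble forms its predictive distribution: p(y | x, D) is approximated by the equally weighted average of the members' softmax outputs, treating each member as one posterior sample. A minimal sketch; the logits are hypothetical, and the paper's approach of marginalizing within basins of attraction is not shown.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_predictive(member_logits: List[List[float]]) -> List[float]:
    """Approximate Bayesian model average: mean of the members' predictive distributions."""
    probs = [softmax(l) for l in member_logits]
    k = len(probs)
    return [sum(p[c] for p in probs) / k for c in range(len(probs[0]))]

# Three hypothetical members that disagree on a 3-class input.
members = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [4.0, 0.0, 0.0]]
bma = ensemble_predictive(members)
print([round(p, 3) for p in bma])  # [0.649, 0.333, 0.018]
```

Each confident member puts about 0.96 on its favourite class, but the averaged prediction is far less confident because the members disagree, which is the calibration benefit the abstract attributes to marginalization.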

SpotTheFake: An Initial Report on a New CNN-Enhanced Platform for Counterfeit Goods Detection


Title	SpotTheFake: An Initial Report on a New CNN-Enhanced Platform for Counterfeit Goods Detection
Authors	Alexandru Şerban, George Ilaş, George-Cosmin Poruşniuc
Abstract	The counterfeit goods trade now represents more than 3.3% of all world trade, so it is a problem that needs more attention than ever, along with a reliable solution that would reduce its negative impact on modern society. This paper presents the design and early-stage development of a novel counterfeit goods detection platform that makes use of the outstanding learning capabilities of the classical VGG16 convolutional model, trained through transfer learning, and a multi-stage fake detection procedure that proved to be not only reliable but also very robust in the experiments we have conducted so far on an image dataset of various goods that we gathered ourselves.
Tasks	Transfer Learning
Published	2020-02-17
URL	https://arxiv.org/abs/2002.06735v2
PDF	https://arxiv.org/pdf/2002.06735v2.pdf
PWC	https://paperswithcode.com/paper/spotthefake-an-initial-report-on-a-new-cnn
Repo
Framework

Pairwise Neural Networks (PairNets) with Low Memory for Fast On-Device Applications


Title	Pairwise Neural Networks (PairNets) with Low Memory for Fast On-Device Applications
Authors	Luna M. Zhang
Abstract	A traditional artificial neural network (ANN) is normally trained slowly by a gradient descent algorithm, such as the backpropagation algorithm, since a large number of hyperparameters of the ANN need to be fine-tuned with many training epochs. Since a large number of hyperparameters of a deep neural network, such as a convolutional neural network, occupy much memory, a memory-inefficient deep learning model is not ideal for real-time Internet of Things (IoT) applications on various devices, such as mobile phones. Thus, it is necessary to develop fast and memory-efficient Artificial Intelligence of Things (AIoT) systems for real-time on-device applications. We created a novel wide and shallow 4-layer ANN called “Pairwise Neural Network” (“PairNet”) with high-speed non-gradient-descent hyperparameter optimization. The PairNet is trained quickly with only one epoch since its hyperparameters are directly optimized one-time via simply solving a system of linear equations by using the multivariate least squares fitting method. In addition, an n-input space is partitioned into many n-input data subspaces, and a local PairNet is built in a local n-input subspace. This divide-and-conquer approach can train the local PairNet using specific local features to improve model performance. Simulation results indicate that the three PairNets with incremental learning have smaller average prediction mean squared errors, and achieve much higher speeds than traditional ANNs. An important future work is to develop better and faster non-gradient-descent hyperparameter optimization algorithms to generate effective, fast, and memory-efficient PairNets with incremental learning on optimal subspaces for real-time AIoT on-device applications.
Tasks	Hyperparameter Optimization
Published	2020-02-10
URL	https://arxiv.org/abs/2002.04458v1
PDF	https://arxiv.org/pdf/2002.04458v1.pdf
PWC	https://paperswithcode.com/paper/pairwise-neural-networks-pairnets-with-low
Repo
Framework
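The PairNet entry describes fitting parameters in one shot by multivariate least squares instead of iterating gradient descent over epochs. The sketch below shows only that generic idea, a single solve of the normal equations (X^T X) w = X^T y; it does not reproduce PairNet's pairwise architecture or its subspace partitioning, and the data and function names are hypothetical.

```python
from typing import List

def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]          # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                           # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares_fit(features: List[List[float]], targets: List[float]) -> List[float]:
    """One-pass fit of weights (bias last) via the normal equations; no epochs."""
    rows = [f + [1.0] for f in features]                     # append a bias column
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(d)]
    return solve_linear_system(xtx, xty)

# Hypothetical noise-free data from y = 2*x1 - 3*x2 + 1.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2 * a - 3 * b + 1 for a, b in X]
weights = least_squares_fit(X, y)
print([round(w, 6) for w in weights])  # [2.0, -3.0, 1.0]
```

Forming X^T X squares the condition number, so QR- or SVD-based solvers are usually preferred for ill-conditioned data; the normal equations are used here only because they fit in a few lines.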

Optimization of Convolutional Neural Network Using the Linearly Decreasing Weight Particle Swarm Optimization


Title	Optimization of Convolutional Neural Network Using the Linearly Decreasing Weight Particle Swarm Optimization
Authors	T. Serizawa, H. Fujita
Abstract	Convolutional neural networks (CNNs) are among the most frequently used deep learning techniques, and many forms of CNN models have been proposed and improved. Training a CNN requires choosing good hyperparameters; however, there are so many hyperparameters that tuning them manually is difficult, so much research has been done on automating this step. Methods based on metaheuristic algorithms are attracting attention in hyperparameter optimization research. Metaheuristic algorithms are nature-inspired and include evolution strategies, genetic algorithms, ant colony optimization and particle swarm optimization. In particular, particle swarm optimization converges faster than genetic algorithms, and many variants of it have been proposed. In this paper, we propose CNN hyperparameter optimization with linearly decreasing weight particle swarm optimization (LDWPSO). The experiments use the MNIST and CIFAR-10 datasets, which are common benchmarks. After optimizing the CNN hyperparameters with LDWPSO and training on MNIST and CIFAR-10, we compare the accuracy with that of a standard CNN based on LeNet-5. On MNIST, the baseline CNN reaches 94.02% at the 5th epoch, compared with 98.95% for the LDWPSO CNN. On CIFAR-10, the baseline CNN reaches 28.07% at the 10th epoch, compared with 69.37% for the LDWPSO CNN, a large improvement in accuracy.
Tasks	Hyperparameter Optimization
Published	2020-01-16
URL	https://arxiv.org/abs/2001.05670v1
PDF	https://arxiv.org/pdf/2001.05670v1.pdf
PWC	https://paperswithcode.com/paper/optimization-of-convolutional-neural-network-1
Repo
Framework
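A minimal sketch of particle swarm optimization with a linearly decreasing inertia weight, w(t) = w_max - (w_max - w_min) * t / t_max. The values w_max = 0.9, w_min = 0.4 and c1 = c2 = 2 are common choices in the PSO literature and are assumptions here, as are the velocity clamp at 20% of the search range and the sphere function standing in for a CNN's validation error; in the paper each particle would instead encode a set of CNN hyperparameters.

```python
import random
from typing import Callable, List, Tuple

def ldw_inertia(t: int, t_max: int, w_max: float = 0.9, w_min: float = 0.4) -> float:
    """Inertia weight that falls linearly from w_max at t = 0 to w_min at t = t_max."""
    return w_max - (w_max - w_min) * t / t_max

def ldwpso(f: Callable[[List[float]], float], dim: int, bounds: Tuple[float, float],
           n_particles: int = 20, t_max: int = 100, c1: float = 2.0, c2: float = 2.0,
           seed: int = 0) -> Tuple[List[float], float]:
    """Minimise f over the box [lo, hi]^dim; returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = 0.2 * (hi - lo)                        # velocity clamp (assumed value)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best so far
    for t in range(t_max):
        w = ldw_inertia(t, t_max)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside the box
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x: List[float]) -> float:
    """Stand-in objective with minimum 0 at the origin."""
    return sum(v * v for v in x)

best, best_val = ldwpso(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_val)
```

A large inertia weight early in the run favours exploration, and shrinking it over time shifts the swarm towards exploitation around the best positions found so far.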

Minimax Defense against Gradient-based Adversarial Attacks


Title	Minimax Defense against Gradient-based Adversarial Attacks
Authors	Blerta Lindqvist, Rauf Izmailov
Abstract	State-of-the-art adversarial attacks target neural network classifiers. By default, neural networks use gradient descent to minimize their loss function, and gradient-based adversarial attacks use the gradient of a classifier's loss function to generate adversarially perturbed images. We ask whether another type of optimization could give neural network classifiers an edge. Here, we introduce a novel approach that uses minimax optimization to foil gradient-based adversarial attacks. Our minimax classifier is the discriminator of a generative adversarial network (GAN) that plays a minimax game with the GAN generator. In addition, our GAN generator projects all points onto a manifold that is different from the original manifold, since the original manifold might be the cause of adversarial attacks. To measure the performance of our minimax defense, we use three adversarial attacks (Carlini-Wagner (CW), DeepFool and the Fast Gradient Sign Method (FGSM)) on three datasets: MNIST, CIFAR-10 and German Traffic Sign (TRAFFIC). Against CW attacks, our minimax defense achieves 98.07% (MNIST-default 98.93%), 73.90% (CIFAR-10-default 83.14%) and 94.54% (TRAFFIC-default 96.97%). Against DeepFool attacks, our minimax defense achieves 98.87% (MNIST), 76.61% (CIFAR-10) and 94.57% (TRAFFIC). Against FGSM attacks, we achieve 97.01% (MNIST), 76.79% (CIFAR-10) and 81.41% (TRAFFIC). Our minimax adversarial approach presents a significant shift in defense strategy for neural network classifiers.
Tasks
Published	2020-02-04
URL	https://arxiv.org/abs/2002.01256v1
PDF	https://arxiv.org/pdf/2002.01256v1.pdf
PWC	https://paperswithcode.com/paper/minimax-defense-against-gradient-based
Repo
Framework
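The defense above targets gradient-based attacks such as FGSM, which perturbs an input as x_adv = x + eps * sign(grad_x L). The sketch applies FGSM to a hypothetical logistic-regression classifier, whose input gradient has the closed form (p - y) * w; it illustrates the attack only, not the minimax GAN defense.

```python
import math
from typing import List, Tuple

def loss_and_input_grad(w: List[float], b: float, x: List[float], y: int) -> Tuple[float, List[float]]:
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))                # predicted P(y = 1 | x)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]           # dLoss/dx = (p - y) * w
    return loss, grad_x

def fgsm(x: List[float], grad_x: List[float], eps: float) -> List[float]:
    """Fast Gradient Sign Method: one step of size eps along the sign of the input gradient."""
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad_x)]

w, b = [2.0, -1.0, 0.5], 0.1      # hypothetical trained weights
x, y = [0.3, -0.2, 0.4], 1        # a correctly classified input
loss, g = loss_and_input_grad(w, b, x, y)
x_adv = fgsm(x, g, eps=0.1)
loss_adv, _ = loss_and_input_grad(w, b, x_adv, y)
print(loss, loss_adv)             # the loss on x_adv is larger
```

Because the attack needs grad_x L, a defense that changes how the classifier's decision relates to that gradient can blunt it, which is the lever the minimax approach pulls.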

Physical Model Guided Deep Image Deraining


Title	Physical Model Guided Deep Image Deraining
Authors	Honghe Zhu, Cong Wang, Yajie Zhang, Zhixun Su, Guohui Zhao
Abstract	Single image deraining is an urgent task because rain-degraded images cause many computer vision systems, such as video surveillance and autonomous driving, to fail. An effective deraining algorithm is therefore needed. In this paper, we propose a novel network based on physical-model-guided learning for single image deraining, which consists of three sub-networks: a rain streaks network, a rain-free network, and a guide-learning network. The rain streaks and the rain-free image, estimated by the rain streaks network and the rain-free network respectively, are concatenated and fed to the guide-learning network to guide further learning, and the direct sum of the two estimated images is constrained to match the input rainy image, following the physical model of a rainy image. Moreover, we develop a Multi-Scale Residual Block (MSRB) to better utilize multi-scale information, and it is shown to boost deraining performance. Quantitative and qualitative experimental results demonstrate that the proposed method outperforms the state-of-the-art deraining methods. The source code will be available at https://supercong94.wixsite.com/supercong94.
Tasks	Autonomous Driving, Rain Removal, Single Image Deraining
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13242v1
PDF	https://arxiv.org/pdf/2003.13242v1.pdf
PWC	https://paperswithcode.com/paper/physical-model-guided-deep-image-deraining
Repo
Framework
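The physical model referred to above is the additive rain model O = B + S (rainy image = rain-free background + rain streaks). A minimal sketch of a consistency penalty that ties the two estimates back to the input; the mean-squared form and the function name are assumptions, since the abstract does not say which norm is used.

```python
from typing import List

Image = List[List[float]]

def physical_consistency_loss(rainy: Image, background_est: Image, streaks_est: Image) -> float:
    """Mean squared error between the input O and the sum B_hat + S_hat."""
    total, count = 0.0, 0
    for o_row, b_row, s_row in zip(rainy, background_est, streaks_est):
        for o, b, s in zip(o_row, b_row, s_row):
            total += (b + s - o) ** 2
            count += 1
    return total / count

rainy = [[0.5, 0.9], [0.4, 0.6]]        # hypothetical 2x2 grayscale input O
background = [[0.5, 0.6], [0.4, 0.6]]   # estimated rain-free image B_hat
streaks = [[0.0, 0.3], [0.0, 0.0]]      # estimated rain-streak layer S_hat
print(physical_consistency_loss(rainy, background, streaks))  # close to 0
```

The penalty is zero only when the two estimates add back up to the observed image, so neither sub-network can explain away pixels the other one needs.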

Semi-DerainGAN: A New Semi-supervised Single Image Deraining Network


Title	Semi-DerainGAN: A New Semi-supervised Single Image Deraining Network
Authors	Yanyan Wei, Zhao Zhang, Haijun Zhang, Jie Qin, Mingbo Zhao
Abstract	Removing rain streaks from a single image is still a challenging task, since the shapes and directions of rain streaks in synthetic datasets are very different from those in real images. Although supervised deep deraining networks have obtained impressive results on synthetic datasets, they still cannot obtain satisfactory results on real images because their rain removal capacity generalizes poorly, i.e., the pre-trained models usually cannot handle new shapes and directions, which may lead to over-derained or under-derained results. In this paper, we propose a new semi-supervised GAN-based deraining network termed Semi-DerainGAN, which can use both synthetic and real rainy images in a unified network through two processes, one supervised and one unsupervised. Specifically, we derive a semi-supervised rain streak learner termed SSRML that shares the same parameters across both processes, which lets the real images contribute more rain streak information. To deliver better deraining results, we design a paired discriminator that distinguishes real pairs from fake pairs. We also contribute a new real-world rainy image dataset, Real200, to reduce the difference between the synthetic and real image domains. Extensive results on public datasets show that our model obtains competitive performance, especially on real images.
Tasks	Rain Removal, Single Image Deraining
Published	2020-01-23
URL	https://arxiv.org/abs/2001.08388v1
PDF	https://arxiv.org/pdf/2001.08388v1.pdf
PWC	https://paperswithcode.com/paper/semi-deraingan-a-new-semi-supervised-single
Repo
Framework

KALE: When Energy-Based Learning Meets Adversarial Training


Title	KALE: When Energy-Based Learning Meets Adversarial Training
Authors	Michael Arbel, Liang Zhou, Arthur Gretton
Abstract	Legendre duality provides a variational lower-bound for the Kullback-Leibler divergence (KL) which can be estimated using samples, without explicit knowledge of the density ratio. We use this estimator, the KL Approximate Lower-bound Estimate (KALE), in a contrastive setting for learning energy-based models, and show that it provides a maximum likelihood estimate (MLE). We then extend this procedure to adversarial training, where the discriminator represents the energy and the generator is the base measure of the energy-based model. Unlike in standard generative adversarial networks (GANs), the learned model makes use of both generator and discriminator to generate samples. This is achieved using Hamiltonian Monte Carlo in the latent space of the generator, using information from the discriminator, to find regions in that space that produce better quality samples. We also show that, unlike the KL, KALE enjoys smoothness properties that make it suitable for adversarial training, and provide convergence rates for KALE when the negative log density ratio belongs to the variational family. Finally, we demonstrate the effectiveness of this approach on simple datasets.
Tasks
Published	2020-03-10
URL	https://arxiv.org/abs/2003.05033v1
PDF	https://arxiv.org/pdf/2003.05033v1.pdf
PWC	https://paperswithcode.com/paper/kale-when-energy-based-learning-meets
Repo
Framework
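Legendre (convex) duality gives the variational form KL(P || Q) = sup_f E_P[f] - E_Q[exp(f - 1)] (the Nguyen-Wainwright-Jordan bound), which can be estimated from samples alone. We assume this is the kind of lower bound KALE builds on; the sketch only checks the Monte Carlo estimator on two Gaussians with a known KL of 0.5 and does not implement the paper's energy-based or adversarial training. Both critics are hypothetical.

```python
import math
import random

def kl_lower_bound(critic, p_samples, q_samples) -> float:
    """Monte Carlo estimate of E_P[f] - E_Q[exp(f - 1)], a lower bound on KL(P || Q)."""
    e_p = sum(critic(x) for x in p_samples) / len(p_samples)
    e_q = sum(math.exp(critic(x) - 1.0) for x in q_samples) / len(q_samples)
    return e_p - e_q

rng = random.Random(0)
n = 100_000
p_samples = [rng.gauss(1.0, 1.0) for _ in range(n)]   # P = N(1, 1)
q_samples = [rng.gauss(0.0, 1.0) for _ in range(n)]   # Q = N(0, 1); KL(P || Q) = 0.5

# The bound is tight at f = 1 + log(dP/dQ); here log(dP/dQ)(x) = x - 0.5.
optimal = lambda x: 1.0 + (x - 0.5)
constant = lambda x: 0.0                              # a poor critic gives a looser bound
print(kl_lower_bound(optimal, p_samples, q_samples))   # close to 0.5
print(kl_lower_bound(constant, p_samples, q_samples))  # about -0.368, i.e. -exp(-1)
```

Only samples and critic evaluations are needed; the density ratio never has to be known, which is what makes this family of estimators usable for training.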

Autonomous UAV Navigation: A DDPG-based Deep Reinforcement Learning Approach


Title	Autonomous UAV Navigation: A DDPG-based Deep Reinforcement Learning Approach
Authors	Omar Bouhamed, Hakim Ghazzai, Hichem Besbes, Yehia Massoud
Abstract	In this paper, we propose an autonomous UAV path planning framework using a deep reinforcement learning approach. The objective is to employ a self-trained UAV as a flying mobile unit to reach spatially distributed moving or static targets in a given three-dimensional urban area. In this approach, a Deep Deterministic Policy Gradient (DDPG) agent with a continuous action space is designed to train the UAV to navigate through or over obstacles to reach its assigned target. A customized reward function is developed to minimize the distance between the UAV and its destination while penalizing collisions. Numerical simulations investigate the behavior of the UAV as it learns the environment and autonomously determines trajectories for different selected scenarios.
Tasks
Published	2020-03-21
URL	https://arxiv.org/abs/2003.10923v1
PDF	https://arxiv.org/pdf/2003.10923v1.pdf
PWC	https://paperswithcode.com/paper/autonomous-uav-navigation-a-ddpg-based-deep
Repo
Framework
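A sketch of the kind of reward the UAV entry describes (negative distance to the target plus a collision penalty), together with the soft (Polyak) target-network update that DDPG uses. The function names, the penalty of 100, the distance weight of 1 and tau = 0.005 are assumed values, not taken from the paper.

```python
import math
from typing import List, Sequence

def uav_reward(position: Sequence[float], target: Sequence[float], collided: bool,
               collision_penalty: float = 100.0, distance_weight: float = 1.0) -> float:
    """Hypothetical shaped reward: closer to the target is better, collisions are penalised."""
    distance = math.dist(position, target)        # Euclidean distance in 3-D
    reward = -distance_weight * distance
    if collided:
        reward -= collision_penalty
    return reward

def soft_update(target_params: List[float], online_params: List[float],
                tau: float = 0.005) -> List[float]:
    """DDPG-style target update: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_params, online_params)]

target = (10.0, 10.0, 5.0)
print(uav_reward((10.0, 10.0, 1.0), target, collided=False))  # -4.0
print(uav_reward((10.0, 10.0, 1.0), target, collided=True))   # -104.0
```

The slowly moving target networks give the critic a stable regression target, which matters in continuous-action settings like this one.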

Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading


Title	Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading
Authors	Mingshuang Luo, Shuang Yang, Shiguang Shan, Xilin Chen
Abstract	Lip-reading aims to infer the speech content from a sequence of lip movements and can be seen as a typical sequence-to-sequence (seq2seq) problem that translates the input image sequence of lip movements into the text sequence of the speech content. However, the traditional learning process of seq2seq models always suffers from two problems: the exposure bias resulting from the “teacher-forcing” strategy, and the inconsistency between the discriminative optimization target (usually the cross-entropy loss) and the final evaluation metric (usually the character or word error rate). In this paper, we propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems. On the one hand, we introduce the evaluation metric (the character error rate in this paper) as a reward to optimize the model together with the original discriminative target. On the other hand, inspired by the local perception property of the convolution operation, we perform a pseudo-convolutional operation along the reward and loss dimension, taking more context around each time step into account to generate a robust reward and loss for the whole optimization. Finally, we perform a thorough comparison and evaluation on both word-level and sentence-level benchmarks. The results show a significant improvement over other related methods, with either new state-of-the-art performance or competitive accuracy on all these challenging benchmarks, which clearly demonstrates the advantages of our approach.
Tasks
Published	2020-03-09
URL	https://arxiv.org/abs/2003.03983v1
PDF	https://arxiv.org/pdf/2003.03983v1.pdf
PWC	https://paperswithcode.com/paper/pseudo-convolutional-policy-gradient-for
Repo
Framework
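The lip-reading entry uses the character error rate (CER) as a reward. CER is the Levenshtein edit distance between the reference and the hypothesis divided by the reference length; the sketch computes it and, as an assumption, uses reward = -CER. The paper's pseudo-convolutional treatment of rewards across time steps is not reproduced, and the example sentences are hypothetical.

```python
from typing import Sequence

def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) via one-row DP."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution (free if equal)
        prev = cur
    return prev[-1]

def character_error_rate(ref: str, hyp: str) -> float:
    return edit_distance(ref, hyp) / max(len(ref), 1)

ref, hyp = "place the box", "place a box"
cer = character_error_rate(ref, hyp)   # 3 edits / 13 reference characters
reward = -cer                          # hypothetical: a lower CER yields a higher reward
print(round(cer, 4), round(reward, 4))
```

Because CER is not differentiable, it cannot be optimized directly with cross-entropy; treating it as a reward in a policy-gradient objective is what bridges the training target and the evaluation metric.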

Many-Objective Estimation of Distribution Optimization Algorithm Based on WGAN-GP


Title	Many-Objective Estimation of Distribution Optimization Algorithm Based on WGAN-GP
Authors	Zhenyu Liang, Yunfan Li, Zhongwei Wan
Abstract	Estimation of distribution algorithms (EDAs) are stochastic optimization algorithms. An EDA uses statistical learning to build a probability model that describes the distribution of solutions at the population level, and then randomly samples the probability model to generate a new population. EDAs perform well on multi-objective optimization problems (MOPs). However, their performance decreases on many-objective optimization problems (MaOPs), which contain more than three objectives. The Reference Vector Guided Evolutionary Algorithm (RVEA), based on the EDA framework, performs better on MaOPs. In our paper, we use the framework of RVEA. However, we generate the new population with a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) instead of crossover and mutation. WGAN-GP has the advantages of fast convergence, good stability and high sample quality. Given a data set, WGAN-GP learns a mapping from the standard normal distribution to the distribution of that data set, so it can quickly generate populations with high diversity and good convergence. To measure performance, we compare against RM-MEDA, MOPSO and NSGA-II on the DTLZ and LSMOP test suites with 3, 5, 8, 10 and 15 objectives.
Tasks	Stochastic Optimization
Published	2020-03-16
URL	https://arxiv.org/abs/2003.08295v1
PDF	https://arxiv.org/pdf/2003.08295v1.pdf
PWC	https://paperswithcode.com/paper/many-objective-estimation-of-distribution
Repo
Framework
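The first sentences of the EDA entry describe the basic loop: fit a probability model to good solutions, then sample a new population from it. A minimal single-objective sketch with a univariate Gaussian model and truncation selection (both assumptions, as are the initial model and the test function); it is neither RVEA nor the WGAN-GP sampler proposed in the paper, which would replace the Gaussian with a learned generator.

```python
import random
import statistics
from typing import Callable, List

def gaussian_eda(f: Callable[[List[float]], float], dim: int, pop_size: int = 50,
                 elite_frac: float = 0.3, generations: int = 60, seed: int = 0) -> List[float]:
    """Minimise f with a univariate Gaussian EDA; returns the best solution seen."""
    rng = random.Random(seed)
    mean, std = [0.0] * dim, [3.0] * dim            # initial probability model (assumed)
    n_elite = max(2, int(elite_frac * pop_size))
    best: List[float] = []
    for _ in range(generations):
        # 1. Sample a new population from the current model.
        pop = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop_size)]
        pop.sort(key=f)
        if not best or f(pop[0]) < f(best):
            best = pop[0]
        # 2. Re-estimate the model from the elite (truncation selection).
        elite = pop[:n_elite]
        for d in range(dim):
            column = [x[d] for x in elite]
            mean[d] = statistics.fmean(column)
            std[d] = max(statistics.stdev(column), 1e-6)  # floor keeps sampling alive
    return best

def shifted_sphere(x: List[float]) -> float:
    """Single-objective test function with its minimum 0 at (1, ..., 1)."""
    return sum((v - 1.0) ** 2 for v in x)

best = gaussian_eda(shifted_sphere, dim=4)
print(shifted_sphere(best))
```

Swapping the Gaussian for a generative model changes only step 1 (how new candidates are sampled) and step 2 (how the model is refit), which is where the paper's WGAN-GP would plug in.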

DeepSurf: A surface-based deep learning approach for the prediction of ligand binding sites on proteins


Title	DeepSurf: A surface-based deep learning approach for the prediction of ligand binding sites on proteins
Authors	Stelios K. Mylonas, Apostolos Axenopoulos, Petros Daras
Abstract	Knowledge of potentially druggable binding sites on proteins is an important preliminary step towards the discovery of novel drugs. The computational prediction of such areas can be boosted by following the recent major advances in the deep learning field and by exploiting the increasing availability of proper data. In this paper, a novel computational method for the prediction of potential binding sites is proposed, called DeepSurf. DeepSurf combines a surface-based representation, where a number of 3D voxelized grids are placed on the protein’s surface, with state-of-the-art deep learning architectures. After being trained on the large scPDB database, DeepSurf demonstrates superior performance on two diverse testing datasets, surpassing all its main deep learning-based competitors.
Tasks
Published	2020-02-13
URL	https://arxiv.org/abs/2002.05643v1
PDF	https://arxiv.org/pdf/2002.05643v1.pdf
PWC	https://paperswithcode.com/paper/deepsurf-a-surface-based-deep-learning
Repo
Framework

HyperEmbed: Tradeoffs Between Resources and Performance in NLP Tasks with Hyperdimensional Computing enabled Embedding of n-gram Statistics


Title	HyperEmbed: Tradeoffs Between Resources and Performance in NLP Tasks with Hyperdimensional Computing enabled Embedding of n-gram Statistics
Authors	Pedro Alonso, Kumar Shridhar, Denis Kleyko, Evgeny Osipov, Marcus Liwicki
Abstract	Recent advances in deep learning have led to significant performance increases on several NLP tasks; however, the models are becoming more and more computationally demanding. This paper therefore tackles the domain of computationally efficient algorithms for NLP tasks. In particular, it investigates distributed representations of n-gram statistics of texts. The representations are formed using hyperdimensional computing enabled embedding and then serve as features for standard classifiers. We investigate the applicability of the embedding on one large and three small standard classification datasets using nine classifiers. The embedding achieved on-par F1 scores while reducing time and memory requirements several-fold compared with conventional n-gram statistics; e.g., for one of the classifiers on a small dataset, memory was reduced 6.18 times, while training and testing were 4.62 and 3.84 times faster, respectively. For many classifiers on the large dataset, memory was reduced about 100 times and training and testing were more than 100 times faster. More importantly, distributed representations formed via hyperdimensional computing remove the strict dependency between the dimensionality of the representation and the parameters of the n-gram statistics, thus opening room for tradeoffs.
Tasks
Published	2020-03-03
URL	https://arxiv.org/abs/2003.01821v1
PDF	https://arxiv.org/pdf/2003.01821v1.pdf
PWC	https://paperswithcode.com/paper/hyperembed-tradeoffs-between-resources-and-1
Repo
Framework
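A sketch of one common hyperdimensional-computing recipe for embedding n-gram statistics: a random bipolar hypervector per character, binding an n-gram by elementwise multiplication of cyclically shifted character vectors, and bundling all n-grams of a text by summation. The paper's exact encoding may differ; the names, dim = 1000 and the sample texts are hypothetical. The point to note is that the output length is fixed by dim whatever n or the alphabet size is, which is the decoupling the abstract mentions.

```python
import random
from typing import Dict, List

def make_item_memory(alphabet: str, dim: int, seed: int = 0) -> Dict[str, List[int]]:
    """Assign each symbol a random bipolar (+1/-1) hypervector."""
    rng = random.Random(seed)
    return {ch: [rng.choice((-1, 1)) for _ in range(dim)] for ch in alphabet}

def rotate(v: List[int], k: int) -> List[int]:
    """Cyclic shift by k positions (the permutation that encodes position in an n-gram)."""
    k %= len(v)
    return v[-k:] + v[:-k] if k else v[:]

def embed_ngrams(text: str, n: int, memory: Dict[str, List[int]]) -> List[int]:
    """Sum the bound hypervectors of all n-grams; the length is always dim."""
    dim = len(next(iter(memory.values())))
    total = [0] * dim
    for start in range(len(text) - n + 1):
        bound = [1] * dim
        for offset, ch in enumerate(text[start:start + n]):
            hv = rotate(memory[ch], n - 1 - offset)
            bound = [b * h for b, h in zip(bound, hv)]   # binding = elementwise product
        total = [t + b for t, b in zip(total, bound)]    # bundling = summation
    return total

def cosine(a: List[int], b: List[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

memory = make_item_memory("abcdefghijklmnopqrstuvwxyz ", dim=1000)
u = embed_ngrams("the cat sat on the mat", 3, memory)
v = embed_ngrams("the cat sat on a mat", 3, memory)
w = embed_ngrams("quick brown fox jumps", 3, memory)
print(len(u), round(cosine(u, v), 2), round(cosine(u, w), 2))
```

Texts that share many trigrams get a high cosine similarity, while unrelated texts land near zero; a conventional n-gram count vector would instead need one dimension per possible n-gram (27^3 here).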