Paper Group AWR 334
Quadrature-based features for kernel approximation. Multiview Boosting by Controlling the Diversity and the Accuracy of View-specific Voters. Recurrent machines for likelihood-free inference. A minimax near-optimal algorithm for adaptive rejection sampling. Deep Frank-Wolfe For Neural Network Optimization. Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems. Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs. The Road to Success: Assessing the Fate of Linguistic Innovations in Online Communities. Hateminers: Detecting Hate speech against Women. Reduced-order modeling with artificial neurons for gravitational-wave inference. GLAD: GLocalized Anomaly Detection via Active Feature Space Suppression. Seglearn: A Python Package for Learning Sequences and Time Series. Deep Reinforcement One-Shot Learning for Artificially Intelligent Classification Systems. Set Aggregation Network as a Trainable Pooling Layer. Low Cost Edge Sensing for High Quality Demosaicking.
Quadrature-based features for kernel approximation
Title | Quadrature-based features for kernel approximation |
Authors | Marina Munkhoeva, Yermek Kapushev, Evgeny Burnaev, Ivan Oseledets |
Abstract | We consider the problem of improving kernel approximation via randomized feature maps. These maps arise as Monte Carlo approximations to integral representations of kernel functions and scale up kernel methods for larger datasets. Based on an efficient numerical integration technique, we propose a unifying approach that reinterprets previous random features methods and extends to better estimates of the kernel approximation. We derive the convergence behaviour and conduct an extensive empirical study that supports our hypothesis. |
Tasks | |
Published | 2018-02-11 |
URL | http://arxiv.org/abs/1802.03832v4 |
PDF | http://arxiv.org/pdf/1802.03832v4.pdf |
PWC | https://paperswithcode.com/paper/quadrature-based-features-for-kernel |
Repo | https://github.com/quffka/quffka |
Framework | none |
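For context, the sketch below shows the plain Monte Carlo random-features baseline that this paper reinterprets and sharpens with quadrature rules. It is an illustration under standard RBF-kernel assumptions, not the authors' code (see the quffka repo for that); the function name and defaults are ours.

```python
import numpy as np

def random_fourier_features(X, n_features=256, sigma=1.0, seed=0):
    """Map X (n, d) to features Z such that Z @ Z.T approximates the RBF
    kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.randn(100, 5)
Z = random_fourier_features(X)
K_approx = Z @ Z.T  # Monte Carlo estimate that quadrature rules can improve
```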
Multiview Boosting by Controlling the Diversity and the Accuracy of View-specific Voters
Title | Multiview Boosting by Controlling the Diversity and the Accuracy of View-specific Voters |
Authors | Anil Goyal, Emilie Morvant, Pascal Germain, Massih-Reza Amini |
Abstract | In this paper we propose a boosting-based multiview learning algorithm, referred to as PB-MVBoost, which iteratively learns i) weights over view-specific voters capturing view-specific information; and ii) weights over views by optimizing a PAC-Bayes multiview C-Bound that takes into account the accuracy of view-specific classifiers and the diversity between the views. We derive a generalization bound for this strategy following the PAC-Bayes theory, which is a suitable tool for dealing with models expressed as weighted combinations over a set of voters. Experiments on three publicly available datasets show the efficiency of the proposed approach with respect to state-of-the-art models. |
Tasks | Document Classification, Multilingual text classification, Multiview Learning, Text Classification |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05784v2 |
PDF | http://arxiv.org/pdf/1808.05784v2.pdf |
PWC | https://paperswithcode.com/paper/multiview-boosting-by-controlling-the |
Repo | https://github.com/goyalanil/Multiview_Dataset_MNIST |
Framework | none |
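The final predictor PB-MVBoost learns is a two-level weighted majority vote: weights over each view's voters and weights over views. The sketch below shows only that voting structure; the C-Bound optimization that actually learns the weights is not reproduced, and all names are illustrative.

```python
import numpy as np

def two_level_majority_vote(view_votes, voter_weights, view_weights):
    """view_votes[v]: (n_voters_v, n_samples) array of +/-1 predictions for
    view v; voter_weights[v]: weights over that view's voters; view_weights:
    weights over the views themselves."""
    view_scores = [w @ votes for votes, w in zip(view_votes, voter_weights)]
    return np.sign(sum(rho * s for s, rho in zip(view_scores, view_weights)))
```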
Recurrent machines for likelihood-free inference
Title | Recurrent machines for likelihood-free inference |
Authors | Arthur Pesah, Antoine Wehenkel, Gilles Louppe |
Abstract | Likelihood-free inference is concerned with the estimation of the parameters of a non-differentiable stochastic simulator that best reproduce real observations. In the absence of a likelihood function, most of the existing inference methods optimize the simulator parameters through a handcrafted iterative procedure that tries to make the simulated data more similar to the observations. In this work, we explore whether meta-learning can be used in the likelihood-free context, for automatically learning from data an iterative optimization procedure that solves likelihood-free inference problems. We design a recurrent inference machine that learns a sequence of parameter updates leading to good parameter estimates, without ever specifying an explicit notion of divergence between the simulated and real data distributions. We demonstrate our approach on toy simulators, showing promising results both in terms of performance and robustness. |
Tasks | Meta-Learning |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1811.12932v2 |
PDF | http://arxiv.org/pdf/1811.12932v2.pdf |
PWC | https://paperswithcode.com/paper/recurrent-machines-for-likelihood-free |
Repo | https://github.com/artix41/ALFI-pytorch |
Framework | pytorch |
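A hypothetical PyTorch sketch of the recurrent update loop described in the abstract: a GRU maps the current estimate and summaries of simulated and observed data to a parameter update. The dimensions, the GRU choice, and the simulate interface are our assumptions; the authors' implementation is in the ALFI-pytorch repo.

```python
import torch
import torch.nn as nn

class RecurrentInferenceMachine(nn.Module):
    def __init__(self, param_dim, stat_dim, hidden_dim=64):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.cell = nn.GRUCell(param_dim + 2 * stat_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, param_dim)

    def forward(self, theta, simulate, obs_stats, steps=10):
        h = torch.zeros(theta.shape[0], self.hidden_dim)
        for _ in range(steps):
            sim_stats = simulate(theta)  # simulator summaries for the current estimate
            h = self.cell(torch.cat([theta, sim_stats, obs_stats], dim=-1), h)
            theta = theta + self.head(h)  # learned update, no hand-crafted divergence
        return theta
```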
A minimax near-optimal algorithm for adaptive rejection sampling
Title | A minimax near-optimal algorithm for adaptive rejection sampling |
Authors | Juliette Achdou, Joseph C. Lam, Alexandra Carpentier, Gilles Blanchard |
Abstract | Rejection Sampling is a fundamental Monte-Carlo method. It is used to sample from distributions admitting a probability density function which can be evaluated exactly at any given point, albeit at a high computational cost. However, without proper tuning, this technique implies a high rejection rate. Several methods have been explored to cope with this problem, based on the principle of adaptively estimating the density by a simpler function, using the information of the previous samples. Most of them either rely on strong assumptions on the form of the density, or do not offer any theoretical performance guarantee. We give the first theoretical lower bound for the problem of adaptive rejection sampling and introduce a new algorithm which guarantees a near-optimal rejection rate in a minimax sense. |
Tasks | |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09390v1 |
PDF | http://arxiv.org/pdf/1810.09390v1.pdf |
PWC | https://paperswithcode.com/paper/a-minimax-near-optimal-algorithm-for-adaptive |
Repo | https://github.com/josephclam/NNARS |
Framework | none |
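For reference, here is the static rejection-sampling primitive the paper analyzes, as a minimal NumPy sketch. Adaptive schemes, including the authors' near-optimal algorithm (NNARS in the repo), refine the envelope from past samples to drive the rejection rate down; that refinement is not shown.

```python
import numpy as np

def rejection_sample(target_pdf, propose, proposal_pdf, M, n, seed=0):
    """Draw n samples from target_pdf, given an envelope M * proposal_pdf
    that upper-bounds target_pdf everywhere."""
    rng = np.random.default_rng(seed)
    samples = []
    while len(samples) < n:
        x = propose(rng)                                          # candidate draw
        if rng.uniform() * M * proposal_pdf(x) <= target_pdf(x):  # accept test
            samples.append(x)
    return np.array(samples)
```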
Deep Frank-Wolfe For Neural Network Optimization
Title | Deep Frank-Wolfe For Neural Network Optimization |
Authors | Leonard Berrada, Andrew Zisserman, M. Pawan Kumar |
Abstract | Learning a deep neural network requires solving a challenging optimization problem: it is a high-dimensional, non-convex and non-smooth minimization problem with a large number of terms. The current practice in neural network optimization is to rely on the stochastic gradient descent (SGD) algorithm or its adaptive variants. However, SGD requires a hand-designed schedule for the learning rate. In addition, its adaptive variants tend to produce solutions that generalize less well on unseen data than SGD with a hand-designed schedule. We present an optimization method that offers empirically the best of both worlds: our algorithm yields good generalization performance while requiring only one hyper-parameter. Our approach is based on a composite proximal framework, which exploits the compositional nature of deep neural networks and can leverage powerful convex optimization algorithms by design. Specifically, we employ the Frank-Wolfe (FW) algorithm for SVM, which computes an optimal step-size in closed-form at each time-step. We further show that the descent direction is given by a simple backward pass in the network, yielding the same computational cost per iteration as SGD. We present experiments on the CIFAR and SNLI data sets, where we demonstrate the significant superiority of our method over Adam, Adagrad, as well as the recently proposed BPGrad and AMSGrad. Furthermore, we compare our algorithm to SGD with a hand-designed learning rate schedule, and show that it provides similar generalization while converging faster. The code is publicly available at https://github.com/oval-group/dfw. |
Tasks | |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07591v2 |
PDF | http://arxiv.org/pdf/1811.07591v2.pdf |
PWC | https://paperswithcode.com/paper/deep-frank-wolfe-for-neural-network |
Repo | https://github.com/oval-group/dfw |
Framework | pytorch |
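For orientation, the sketch below is the classical Frank-Wolfe iteration (linear minimization oracle plus convex averaging) on an L2 ball, which DFW builds on. The paper's actual contributions, a closed-form optimal step-size via the SVM dual and a descent direction from a single backward pass, are not reproduced here; see the dfw repo.

```python
import numpy as np

def frank_wolfe_l2_ball(grad, x0, radius=1.0, steps=100):
    """Classical Frank-Wolfe over an L2-norm ball of the given radius."""
    x = x0.copy()
    for t in range(steps):
        g = grad(x)
        s = -radius * g / (np.linalg.norm(g) + 1e-12)  # argmin of <g, s> over the ball
        gamma = 2.0 / (t + 2.0)                        # textbook step-size schedule
        x = (1.0 - gamma) * x + gamma * s
    return x
```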
Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems
Title | Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems |
Authors | Maziar Raissi, Paris Perdikaris, George Em Karniadakis |
Abstract | The process of transforming observed data into predictive mathematical models of the physical world has always been paramount in science and engineering. Although data is currently being collected at an ever-increasing pace, devising meaningful models out of such observations in an automated fashion still remains an open problem. In this work, we put forth a machine learning approach for identifying nonlinear dynamical systems from data. Specifically, we blend classical tools from numerical analysis, namely multi-step time-stepping schemes, with powerful nonlinear function approximators, namely deep neural networks, to distill the mechanisms that govern the evolution of a given dataset. We test the effectiveness of our approach for several benchmark problems involving the identification of complex, nonlinear and chaotic dynamics, and we demonstrate how this allows us to accurately learn the dynamics, forecast future states, and identify basins of attraction. In particular, we study the Lorenz system, the fluid flow behind a cylinder, the Hopf bifurcation, and the Glycolytic oscillator model as an example of complicated nonlinear dynamics typical of biological systems. |
Tasks | |
Published | 2018-01-04 |
URL | http://arxiv.org/abs/1801.01236v1 |
PDF | http://arxiv.org/pdf/1801.01236v1.pdf |
PWC | https://paperswithcode.com/paper/multistep-neural-networks-for-data-driven |
Repo | https://github.com/maziarraissi/MultistepNNs |
Framework | none |
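The core training signal is a multistep residual: the network f approximating dx/dt must make consecutive observations consistent with a time-stepping scheme. The sketch below uses forward Euler, the simplest member of the linear multistep family, whereas the paper employs higher-order Adams schemes; network sizes and the step h are placeholders.

```python
import torch
import torch.nn as nn

# Learned dynamics dx/dt = f(x); the 3-dimensional state is a placeholder.
f = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 3))

def multistep_residual(x, h):
    """Forward-Euler residual over a trajectory x of shape (T, dim)."""
    return x[1:] - x[:-1] - h * f(x[:-1])

x_obs = torch.randn(200, 3)                            # states sampled every h time units
loss = multistep_residual(x_obs, h=0.01).pow(2).mean()
loss.backward()                                        # train f to satisfy the scheme
```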
Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs
Title | Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs |
Authors | Xuhao Chen |
Abstract | Deep neural networks have achieved remarkable accuracy in many artificial intelligence applications, e.g. computer vision, at the cost of a large number of parameters and high computational complexity. Weight pruning can compress DNN models by removing redundant parameters in the networks, but it introduces sparsity in the weight matrix and therefore makes the computation inefficient on GPUs. Although pruning can remove more than 80% of the weights, it actually hurts inference performance (speed) when running models on GPUs. Two major problems cause this unsatisfactory performance on GPUs. First, lowering convolution onto matrix multiplication reduces data reuse opportunities and wastes memory bandwidth. Second, the sparsity brought by pruning makes the computation irregular, which leads to inefficiency when running on massively parallel GPUs. To overcome these two limitations, we propose Escort, an efficient sparse convolutional neural network inference method for GPUs. Instead of using the lowering method, we choose to compute the sparse convolutions directly. We then orchestrate the parallelism and locality for the direct sparse convolution kernel, and apply customized optimization techniques to further improve performance. Evaluation on NVIDIA GPUs shows that Escort can improve sparse convolution speed by 2.63x and 3.07x, and inference speed by 1.43x and 1.69x, compared to CUBLAS and CUSPARSE respectively. |
Tasks | |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10280v2 |
PDF | http://arxiv.org/pdf/1802.10280v2.pdf |
PWC | https://paperswithcode.com/paper/escort-efficient-sparse-convolutional-neural |
Repo | https://github.com/gkUwen/learning-material |
Framework | none |
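A NumPy illustration of the direct-computation idea: iterate only over the nonzero filter weights instead of lowering to a dense matrix multiplication. This shows the arithmetic only; the paper's contribution is orchestrating this loop's parallelism and locality on GPUs. The 3x3 valid-convolution setup is our simplification.

```python
import numpy as np

def direct_sparse_conv(inp, nz_weights, n_out):
    """inp: (C_in, H, W); nz_weights: iterable of (c_out, c_in, r, s, value)
    tuples listing the nonzero entries of 3x3 filters; valid convolution."""
    _, H, W = inp.shape
    out = np.zeros((n_out, H - 2, W - 2))
    for c_out, c_in, r, s, v in nz_weights:
        out[c_out] += v * inp[c_in, r:r + H - 2, s:s + W - 2]
    return out
```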
The Road to Success: Assessing the Fate of Linguistic Innovations in Online Communities
Title | The Road to Success: Assessing the Fate of Linguistic Innovations in Online Communities |
Authors | Marco Del Tredici, Raquel Fernández |
Abstract | We investigate the birth and diffusion of lexical innovations in a large dataset of online social communities. We build on sociolinguistic theories and focus on the relation between the spread of a novel term and the social role of the individuals who use it, uncovering characteristics of innovators and adopters. Finally, we perform a prediction task that allows us to anticipate whether an innovation will successfully spread within a community. |
Tasks | |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.05838v1 |
PDF | http://arxiv.org/pdf/1806.05838v1.pdf |
PWC | https://paperswithcode.com/paper/the-road-to-success-assessing-the-fate-of |
Repo | https://github.com/marcodel13/The-Road-to-Success |
Framework | none |
Hateminers: Detecting Hate speech against Women
Title | Hateminers: Detecting Hate speech against Women |
Authors | Punyajoy Saha, Binny Mathew, Pawan Goyal, Animesh Mukherjee |
Abstract | With the online proliferation of hate speech, there is an urgent need for systems that can detect such harmful content. In this paper, we present the machine learning models developed for the Automatic Misogyny Identification (AMI) shared task at EVALITA 2018. We generate three types of features: sentence embeddings, TF-IDF vectors, and BOW vectors to represent each tweet. These features are then concatenated and fed into the machine learning models. Our model came first for the English Subtask A and fifth for the English Subtask B. We release our winning model for public use; it is available at https://github.com/punyajoy/Hateminers-EVALITA. |
Tasks | Hate Speech Detection, Sentence Embeddings |
Published | 2018-12-17 |
URL | http://arxiv.org/abs/1812.06700v1 |
PDF | http://arxiv.org/pdf/1812.06700v1.pdf |
PWC | https://paperswithcode.com/paper/hateminers-detecting-hate-speech-against |
Repo | https://github.com/punyajoy/Hateminers-EVALITA |
Framework | none |
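A minimal sketch of the feature pipeline the abstract describes: BOW, TF-IDF, and sentence-embedding blocks concatenated and fed to a classifier. The random array stands in for a real sentence encoder, and LogisticRegression is our assumption; the abstract only says "machine learning models".

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

tweets = ["example tweet one", "another example tweet", "a third tweet"]
labels = [0, 1, 0]

bow = CountVectorizer().fit_transform(tweets).toarray()    # BOW vectors
tfidf = TfidfVectorizer().fit_transform(tweets).toarray()  # TF-IDF vectors
sent_emb = np.random.randn(len(tweets), 512)               # stand-in for a sentence encoder

X = np.hstack([bow, tfidf, sent_emb])                      # concatenated feature blocks
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```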
Reduced-order modeling with artificial neurons for gravitational-wave inference
Title | Reduced-order modeling with artificial neurons for gravitational-wave inference |
Authors | Alvin J. K. Chua, Chad R. Galley, Michele Vallisneri |
Abstract | Gravitational-wave data analysis is rapidly absorbing techniques from deep learning, with a focus on convolutional networks and related methods that treat noisy time series as images. We pursue an alternative approach, in which waveforms are first represented as weighted sums over reduced bases (reduced-order modeling); we then train artificial neural networks to map gravitational-wave source parameters into basis coefficients. Statistical inference proceeds directly in coefficient space, where it is theoretically straightforward and computationally efficient. The neural networks also provide analytic waveform derivatives, which are useful for gradient-based sampling schemes. We demonstrate fast and accurate coefficient interpolation for the case of a four-dimensional binary-inspiral waveform family, and discuss promising applications of our framework in parameter estimation. |
Tasks | Time Series |
Published | 2018-11-13 |
URL | https://arxiv.org/abs/1811.05491v2 |
PDF | https://arxiv.org/pdf/1811.05491v2.pdf |
PWC | https://paperswithcode.com/paper/roman-reduced-order-modeling-with-artificial |
Repo | https://github.com/vallis/truebayes |
Framework | pytorch |
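The pipeline reduces to two steps: a network maps source parameters to reduced-basis coefficients, and the waveform is a weighted sum over the fixed basis. The PyTorch sketch below is hypothetical; the random basis stands in for a precomputed reduced basis, and all sizes are placeholders (see the truebayes repo for the real code).

```python
import torch
import torch.nn as nn

n_params, n_basis, n_samples = 4, 20, 1024
basis = torch.randn(n_basis, n_samples)   # stand-in for a precomputed reduced basis

coeff_net = nn.Sequential(nn.Linear(n_params, 128), nn.Tanh(), nn.Linear(128, n_basis))

theta = torch.randn(8, n_params)          # batch of source parameters
alpha = coeff_net(theta)                  # predicted basis coefficients
waveform = alpha @ basis                  # reconstructed waveforms, shape (8, 1024)
```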
GLAD: GLocalized Anomaly Detection via Active Feature Space Suppression
Title | GLAD: GLocalized Anomaly Detection via Active Feature Space Suppression |
Authors | Shubhomoy Das, Janardhan Rao Doppa |
Abstract | We propose an algorithm called GLAD (GLocalized Anomaly Detection) that allows end-users to retain the use of simple and understandable global anomaly detectors by automatically learning their local relevance to specific data instances using label feedback. The key idea is to place a uniform prior on the relevance of each member of the anomaly detection ensemble over the input feature space via a neural network trained on unlabeled instances, and tune the weights of the neural network to adjust the local relevance of each ensemble member using all labeled instances. Our experiments on synthetic and real-world data show the effectiveness of GLAD in learning the local relevance of ensemble members and discovering anomalies via label feedback. |
Tasks | Anomaly Detection |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01403v3 |
PDF | http://arxiv.org/pdf/1810.01403v3.pdf |
PWC | https://paperswithcode.com/paper/glad-glocalized-anomaly-detection-via-active |
Repo | https://github.com/freedombenLiu/ad_examples |
Framework | tf |
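The scoring rule GLAD learns can be summarized in a few lines: a network assigns each instance local relevance weights over the fixed ensemble members, and the final score is the weighted sum. The sketch below is our paraphrase of that idea, with placeholder sizes; the label-feedback training loop is omitted.

```python
import torch
import torch.nn as nn

M, d = 5, 10  # ensemble size and input dimension (placeholders)
relevance_net = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, M), nn.Sigmoid())

def glad_score(x, ensemble_scores):
    """ensemble_scores: (batch, M) anomaly scores from the global detectors."""
    p = relevance_net(x)                      # per-instance relevance of each member
    return (p * ensemble_scores).sum(dim=-1)  # locally re-weighted anomaly score
```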
Seglearn: A Python Package for Learning Sequences and Time Series
Title | Seglearn: A Python Package for Learning Sequences and Time Series |
Authors | David M. Burns, Cari M. Whyne |
Abstract | Seglearn is an open-source Python package for machine learning on time series and sequences using a sliding-window segmentation approach. The implementation provides a flexible pipeline for tackling classification, regression, and forecasting problems with multivariate sequence and contextual data. The package is compatible with scikit-learn and is listed under scikit-learn Related Projects. It depends on numpy, scipy, and scikit-learn. Seglearn is distributed under the BSD 3-Clause License. Documentation includes a detailed API description, user guide, and examples. Unit tests provide a high degree of code coverage. |
Tasks | Time Series |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.08118v3 |
PDF | http://arxiv.org/pdf/1803.08118v3.pdf |
PWC | https://paperswithcode.com/paper/seglearn-a-python-package-for-learning |
Repo | https://github.com/dmbee/seglearn |
Framework | none
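The core operation is sliding-window segmentation. The generic NumPy sketch below conveys the idea without depending on seglearn's own classes (consult the package docs for its scikit-learn-compatible API); the window width and step are arbitrary here.

```python
import numpy as np

def sliding_windows(ts, width=100, step=50):
    """Segment a (T, d) series into overlapping (width, d) windows."""
    starts = range(0, len(ts) - width + 1, step)
    return np.stack([ts[s:s + width] for s in starts])

segments = sliding_windows(np.random.randn(1000, 3))  # (19, 100, 3), ready for features
```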
Deep Reinforcement One-Shot Learning for Artificially Intelligent Classification Systems
Title | Deep Reinforcement One-Shot Learning for Artificially Intelligent Classification Systems |
Authors | Anton Puzanov, Kobi Cohen |
Abstract | In recent years there has been a sharp rise in networking applications, in which significant events need to be classified but only a few training instances are available. These are known as cases of one-shot learning. Examples include analyzing network traffic under zero-day attacks, and computer vision tasks by sensor networks deployed in the field. To handle this challenging task, organizations often use human analysts to classify events under high uncertainty. Existing algorithms use a threshold-based mechanism to decide whether to classify an object automatically or send it to an analyst for deeper inspection. However, this approach leads to a significant waste of resources since it does not take the practical temporal constraints of system resources into account. Our contribution is threefold. First, we develop a novel Deep Reinforcement One-shot Learning (DeROL) framework to address this challenge. The basic idea of the DeROL algorithm is to train a deep-Q network to obtain a policy which is oblivious to the unseen classes in the testing data. Then, in real-time, DeROL maps the current state of the one-shot learning process to operational actions based on the trained deep-Q network, to maximize the objective function. Second, we develop the first open-source software for practical artificially intelligent one-shot classification systems with limited resources for the benefit of researchers in related fields. Third, we present an extensive experimental study using the OMNIGLOT dataset for computer vision tasks and the UNSW-NB15 dataset for intrusion detection tasks that demonstrates the versatility and efficiency of the DeROL framework. |
Tasks | Intrusion Detection, Omniglot, One-Shot Learning |
Published | 2018-08-04 |
URL | http://arxiv.org/abs/1808.01527v1 |
PDF | http://arxiv.org/pdf/1808.01527v1.pdf |
PWC | https://paperswithcode.com/paper/deep-reinforcement-one-shot-learning-for |
Repo | https://github.com/antonpuz/DeROL |
Framework | tf |
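At decision time, DeROL maps the state of the one-shot learning process to an operational action via the trained deep-Q network. The epsilon-greedy sketch below illustrates only that selection step; the action names and exploration rate are our placeholders, not the paper's exact action set.

```python
import numpy as np

ACTIONS = ["auto_classify", "delay", "send_to_analyst"]  # illustrative action set

def select_action(q_values, epsilon=0.05, seed=None):
    """Epsilon-greedy choice over the Q-network's outputs for one state."""
    rng = np.random.default_rng(seed)
    if rng.uniform() < epsilon:
        return int(rng.integers(len(ACTIONS)))  # explore
    return int(np.argmax(q_values))             # exploit: best operational action
```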
Set Aggregation Network as a Trainable Pooling Layer
Title | Set Aggregation Network as a Trainable Pooling Layer |
Authors | Łukasz Maziarka, Marek Śmieja, Aleksandra Nowak, Jacek Tabor, Łukasz Struski, Przemysław Spurek |
Abstract | Global pooling, such as max- or sum-pooling, is one of the key ingredients in deep neural networks used for processing images, texts, graphs and other types of structured data. Based on the recent DeepSets architecture proposed by Zaheer et al. (NIPS 2017), we introduce a Set Aggregation Network (SAN) as an alternative global pooling layer. In contrast to typical pooling operators, SAN allows one to embed a given set of features into a vector representation of arbitrary size. We show that by adjusting the size of the embedding, SAN is capable of preserving all the information from the input. In experiments, we demonstrate that replacing the global pooling layer with SAN improves classification accuracy. Moreover, it is less prone to overfitting and can be used as a regularizer. |
Tasks | |
Published | 2018-10-03 |
URL | https://arxiv.org/abs/1810.01868v3 |
PDF | https://arxiv.org/pdf/1810.01868v3.pdf |
PWC | https://paperswithcode.com/paper/set-aggregation-network-for-structured-data |
Repo | https://github.com/gmum/set-aggregation |
Framework | tf |
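A SAN-style layer can be sketched in a few lines: project each set element, apply a nonlinearity, and sum over the set, giving an embedding whose size is a free hyper-parameter. This is our reading of the abstract, with arbitrary dimensions; the authors' code is in the set-aggregation repo.

```python
import torch
import torch.nn as nn

class SetAggregation(nn.Module):
    """Trainable pooling: embed a (batch, set_size, in_dim) set of features
    into a (batch, out_dim) vector of chosen size."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return torch.relu(self.proj(x)).sum(dim=1)  # sum-aggregate the projections

pooled = SetAggregation(64, 256)(torch.randn(8, 17, 64))  # shape (8, 256)
```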
Low Cost Edge Sensing for High Quality Demosaicking
Title | Low Cost Edge Sensing for High Quality Demosaicking |
Authors | Yan Niu, Jihong Ouyang, Wanli Zuo, Fuxin Wang |
Abstract | Digital cameras that use Color Filter Arrays (CFA) entail a demosaicking procedure to form full RGB images. As today's camera users generally require images to be viewed instantly, demosaicking algorithms for real applications must be fast. Moreover, the associated cost should be lower than the cost saved by using CFA. For this purpose, we revisit the classical Hamilton-Adams (HA) algorithm, which outperforms many sophisticated techniques in both speed and accuracy. Inspired by HA's strengths and weaknesses, we design a very low cost edge sensing scheme. Briefly, it guides demosaicking by a logistic functional of the difference between directional variations. We extensively compare our algorithm with 28 demosaicking algorithms by running their open-source code on benchmark datasets. Compared to methods of similar computational cost, our method achieves substantially higher accuracy, whereas compared to methods of similar accuracy, it has significantly lower cost. Moreover, on test images of currently popular resolution, the quality of our algorithm is comparable to that of the top performers, whereas its speed is tens of times faster. |
Tasks | Demosaicking |
Published | 2018-06-03 |
URL | http://arxiv.org/abs/1806.00771v2 |
PDF | http://arxiv.org/pdf/1806.00771v2.pdf |
PWC | https://paperswithcode.com/paper/low-cost-edge-sensing-for-high-quality |
Repo | https://github.com/shmilyo/Low-Cost-Edge-Sensing-for-High-Quality-Demosaicking |
Framework | none |
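The edge-sensing rule reduces to a logistic weighting of directional estimates. The sketch below is a toy rendering of that idea; the constant k, the exact variation measures, and the blending step all differ in the paper.

```python
import numpy as np

def edge_sensing_weight(dh, dv, k=1.0):
    """Logistic functional of the difference between directional variations."""
    return 1.0 / (1.0 + np.exp(-k * (np.abs(dv) - np.abs(dh))))

# w near 1 means vertical variation dominates, so favor the horizontal estimate.
w = edge_sensing_weight(dh=0.1, dv=0.9)
g_hat = w * 0.52 + (1.0 - w) * 0.47  # blend directional green estimates (toy numbers)
```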