Paper Group AWR 334
Quadrature-based features for kernel approximation. Multiview Boosting by Controlling the Diversity and the Accuracy of View-specific Voters. Recurrent machines for likelihood-free inference. A minimax near-optimal algorithm for adaptive rejection sampling. Deep Frank-Wolfe For Neural Network Optimization. Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems. Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs. The Road to Success: Assessing the Fate of Linguistic Innovations in Online Communities. Hateminers: Detecting Hate speech against Women. Reduced-order modeling with artificial neurons for gravitational-wave inference. GLAD: GLocalized Anomaly Detection via Active Feature Space Suppression. Seglearn: A Python Package for Learning Sequences and Time Series. Deep Reinforcement One-Shot Learning for Artificially Intelligent Classification Systems. Set Aggregation Network as a Trainable Pooling Layer. Low Cost Edge Sensing for High Quality Demosaicking.
Quadrature-based features for kernel approximation
Title | Quadrature-based features for kernel approximation |
Authors | Marina Munkhoeva, Yermek Kapushev, Evgeny Burnaev, Ivan Oseledets |
Abstract | We consider the problem of improving kernel approximation via randomized feature maps. These maps arise as Monte Carlo approximations to integral representations of kernel functions and scale up kernel methods for larger datasets. Based on an efficient numerical integration technique, we propose a unifying approach that reinterprets previous random features methods and extends to better estimates of the kernel approximation. We derive the convergence behaviour and conduct an extensive empirical study that supports our hypothesis. |
Tasks | |
Published | 2018-02-11 |
URL | http://arxiv.org/abs/1802.03832v4 |
PDF | http://arxiv.org/pdf/1802.03832v4.pdf |
PWC | https://paperswithcode.com/paper/quadrature-based-features-for-kernel |
Repo | https://github.com/quffka/quffka |
Framework | none |
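For context, the sketch below shows the plain Monte Carlo random-features baseline that this paper reinterprets and sharpens with quadrature rules. It is an illustration under standard RBF-kernel assumptions, not the authors' code (see the quffka repo for that); the function name and defaults are ours.

```python
import numpy as np

def random_fourier_features(X, n_features=256, sigma=1.0, seed=0):
    """Map X (n, d) to features Z such that Z @ Z.T approximates the RBF
    kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.randn(100, 5)
Z = random_fourier_features(X)
K_approx = Z @ Z.T  # Monte Carlo estimate that quadrature rules can improve
```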
Multiview Boosting by Controlling the Diversity and the Accuracy of View-specific Voters
Title | Multiview Boosting by Controlling the Diversity and the Accuracy of View-specific Voters |
Authors | Anil Goyal, Emilie Morvant, Pascal Germain, Massih-Reza Amini |
Abstract | In this paper we propose a boosting-based multiview learning algorithm, referred to as PB-MVBoost, which iteratively learns i) weights over view-specific voters capturing view-specific information; and ii) weights over views by optimizing a PAC-Bayes multiview C-Bound that takes into account the accuracy of view-specific classifiers and the diversity between the views. We derive a generalization bound for this strategy following the PAC-Bayes theory, which is a suitable tool for dealing with models expressed as weighted combinations over a set of voters. Experiments on three publicly available datasets show the efficiency of the proposed approach with respect to state-of-the-art models. |
Tasks | Document Classification, Multilingual text classification, Multiview Learning, Text Classification |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05784v2 |
PDF | http://arxiv.org/pdf/1808.05784v2.pdf |
PWC | https://paperswithcode.com/paper/multiview-boosting-by-controlling-the |
Repo | https://github.com/goyalanil/Multiview_Dataset_MNIST |
Framework | none |
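The final predictor PB-MVBoost learns is a two-level weighted majority vote: weights over each view's voters and weights over views. The sketch below shows only that voting structure; the C-Bound optimization that actually learns the weights is not reproduced, and all names are illustrative.

```python
import numpy as np

def two_level_majority_vote(view_votes, voter_weights, view_weights):
    """view_votes[v]: (n_voters_v, n_samples) array of +/-1 predictions for
    view v; voter_weights[v]: weights over that view's voters; view_weights:
    weights over the views themselves."""
    view_scores = [w @ votes for votes, w in zip(view_votes, voter_weights)]
    return np.sign(sum(rho * s for s, rho in zip(view_scores, view_weights)))
```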
Recurrent machines for likelihood-free inference
Title | Recurrent machines for likelihood-free inference |
Authors | Arthur Pesah, Antoine Wehenkel, Gilles Louppe |
Abstract | Likelihood-free inference is concerned with the estimation of the parameters of a non-differentiable stochastic simulator that best reproduce real observations. In the absence of a likelihood function, most of the existing inference methods optimize the simulator parameters through a handcrafted iterative procedure that tries to make the simulated data more similar to the observations. In this work, we explore whether meta-learning can be used in the likelihood-free context, for automatically learning from data an iterative optimization procedure that solves likelihood-free inference problems. We design a recurrent inference machine that learns a sequence of parameter updates leading to good parameter estimates, without ever specifying an explicit notion of divergence between the simulated and real data distributions. We demonstrate our approach on toy simulators, showing promising results both in terms of performance and robustness. |
Tasks | Meta-Learning |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1811.12932v2 |
PDF | http://arxiv.org/pdf/1811.12932v2.pdf |
PWC | https://paperswithcode.com/paper/recurrent-machines-for-likelihood-free |
Repo | https://github.com/artix41/ALFI-pytorch |
Framework | pytorch |
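A hypothetical PyTorch sketch of the recurrent update loop described in the abstract: a GRU maps the current estimate and summaries of simulated and observed data to a parameter update. The dimensions, the GRU choice, and the simulate interface are our assumptions; the authors' implementation is in the ALFI-pytorch repo.

```python
import torch
import torch.nn as nn

class RecurrentInferenceMachine(nn.Module):
    def __init__(self, param_dim, stat_dim, hidden_dim=64):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.cell = nn.GRUCell(param_dim + 2 * stat_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, param_dim)

    def forward(self, theta, simulate, obs_stats, steps=10):
        h = torch.zeros(theta.shape[0], self.hidden_dim)
        for _ in range(steps):
            sim_stats = simulate(theta)  # simulator summaries for the current estimate
            h = self.cell(torch.cat([theta, sim_stats, obs_stats], dim=-1), h)
            theta = theta + self.head(h)  # learned update, no hand-crafted divergence
        return theta
```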
A minimax near-optimal algorithm for adaptive rejection sampling
Title | A minimax near-optimal algorithm for adaptive rejection sampling |
Authors | Juliette Achdou, Joseph C. Lam, Alexandra Carpentier, Gilles Blanchard |
Abstract | Rejection Sampling is a fundamental Monte-Carlo method. It is used to sample from distributions admitting a probability density function which can be evaluated exactly at any given point, albeit at a high computational cost. However, without proper tuning, this technique implies a high rejection rate. Several methods have been explored to cope with this problem, based on the principle of adaptively estimating the density by a simpler function, using the information of the previous samples. Most of them either rely on strong assumptions on the form of the density, or do not offer any theoretical performance guarantee. We give the first theoretical lower bound for the problem of adaptive rejection sampling and introduce a new algorithm which guarantees a near-optimal rejection rate in a minimax sense. |
Tasks | |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09390v1 |
PDF | http://arxiv.org/pdf/1810.09390v1.pdf |
PWC | https://paperswithcode.com/paper/a-minimax-near-optimal-algorithm-for-adaptive |
Repo | https://github.com/josephclam/NNARS |
Framework | none |
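For reference, here is the static rejection-sampling primitive the paper analyzes, as a minimal NumPy sketch. Adaptive schemes, including the authors' near-optimal algorithm (NNARS in the repo), refine the envelope from past samples to drive the rejection rate down; that refinement is not shown.

```python
import numpy as np

def rejection_sample(target_pdf, propose, proposal_pdf, M, n, seed=0):
    """Draw n samples from target_pdf, given an envelope M * proposal_pdf
    that upper-bounds target_pdf everywhere."""
    rng = np.random.default_rng(seed)
    samples = []
    while len(samples) < n:
        x = propose(rng)                                          # candidate draw
        if rng.uniform() * M * proposal_pdf(x) <= target_pdf(x):  # accept test
            samples.append(x)
    return np.array(samples)
```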
Deep Frank-Wolfe For Neural Network Optimization
Title | Deep Frank-Wolfe For Neural Network Optimization |
Authors | Leonard Berrada, Andrew Zisserman, M. Pawan Kumar |
Abstract | Learning a deep neural network requires solving a challenging optimization problem: it is a high-dimensional, non-convex and non-smooth minimization problem with a large number of terms. The current practice in neural network optimization is to rely on the stochastic gradient descent (SGD) algorithm or its adaptive variants. However, SGD requires a hand-designed schedule for the learning rate. In addition, its adaptive variants tend to produce solutions that generalize less well on unseen data than SGD with a hand-designed schedule. We present an optimization method that offers empirically the best of both worlds: our algorithm yields good generalization performance while requiring only one hyper-parameter. Our approach is based on a composite proximal framework, which exploits the compositional nature of deep neural networks and can leverage powerful convex optimization algorithms by design. Specifically, we employ the Frank-Wolfe (FW) algorithm for SVM, which computes an optimal step-size in closed-form at each time-step. We further show that the descent direction is given by a simple backward pass in the network, yielding the same computational cost per iteration as SGD. We present experiments on the CIFAR and SNLI data sets, where we demonstrate the significant superiority of our method over Adam, Adagrad, as well as the recently proposed BPGrad and AMSGrad. Furthermore, we compare our algorithm to SGD with a hand-designed learning rate schedule, and show that it provides similar generalization while converging faster. The code is publicly available at https://github.com/oval-group/dfw. |
Tasks | |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07591v2 |
PDF | http://arxiv.org/pdf/1811.07591v2.pdf |
PWC | https://paperswithcode.com/paper/deep-frank-wolfe-for-neural-network |
Repo | https://github.com/oval-group/dfw |
Framework | pytorch |
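For orientation, the sketch below is the classical Frank-Wolfe iteration (linear minimization oracle plus convex averaging) on an L2 ball, which DFW builds on. The paper's actual contributions, a closed-form optimal step-size via the SVM dual and a descent direction from a single backward pass, are not reproduced here; see the dfw repo.

```python
import numpy as np

def frank_wolfe_l2_ball(grad, x0, radius=1.0, steps=100):
    """Classical Frank-Wolfe over an L2-norm ball of the given radius."""
    x = x0.copy()
    for t in range(steps):
        g = grad(x)
        s = -radius * g / (np.linalg.norm(g) + 1e-12)  # argmin of <g, s> over the ball
        gamma = 2.0 / (t + 2.0)                        # textbook step-size schedule
        x = (1.0 - gamma) * x + gamma * s
    return x
```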
Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems
Title | Multistep Neural Networks for Data-driven Discovery of Nonlinear Dynamical Systems |
Authors | Maziar Raissi, Paris Perdikaris, George Em Karniadakis |
Abstract | The process of transforming observed data into predictive mathematical models of the physical world has always been paramount in science and engineering. Although data is currently being collected at an ever-increasing pace, devising meaningful models out of such observations in an automated fashion still remains an open problem. In this work, we put forth a machine learning approach for identifying nonlinear dynamical systems from data. Specifically, we blend classical tools from numerical analysis, namely multi-step time-stepping schemes, with powerful nonlinear function approximators, namely deep neural networks, to distill the mechanisms that govern the evolution of a given dataset. We test the effectiveness of our approach for several benchmark problems involving the identification of complex, nonlinear and chaotic dynamics, and we demonstrate how this allows us to accurately learn the dynamics, forecast future states, and identify basins of attraction. In particular, we study the Lorenz system, the fluid flow behind a cylinder, the Hopf bifurcation, and the Glycolytic oscillator model as an example of complicated nonlinear dynamics typical of biological systems. |
Tasks | |
Published | 2018-01-04 |
URL | http://arxiv.org/abs/1801.01236v1 |
PDF | http://arxiv.org/pdf/1801.01236v1.pdf |
PWC | https://paperswithcode.com/paper/multistep-neural-networks-for-data-driven |
Repo | https://github.com/maziarraissi/MultistepNNs |
Framework | none |
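The core training signal is a multistep residual: the network f approximating dx/dt must make consecutive observations consistent with a time-stepping scheme. The sketch below uses forward Euler, the simplest member of the linear multistep family, whereas the paper employs higher-order Adams schemes; network sizes and the step h are placeholders.

```python
import torch
import torch.nn as nn

# Learned dynamics dx/dt = f(x); the 3-dimensional state is a placeholder.
f = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 3))

def multistep_residual(x, h):
    """Forward-Euler residual over a trajectory x of shape (T, dim)."""
    return x[1:] - x[:-1] - h * f(x[:-1])

x_obs = torch.randn(200, 3)                            # states sampled every h time units
loss = multistep_residual(x_obs, h=0.01).pow(2).mean()
loss.backward()                                        # train f to satisfy the scheme
```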
Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs
Title | Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs |
Authors | Xuhao Chen |
Abstract | Deep neural networks have achieved remarkable accuracy in many artificial intelligence applications, e.g. computer vision, at the cost of a large number of parameters and high computational complexity. Weight pruning can compress DNN models by removing redundant parameters in the networks, but it introduces sparsity in the weight matrix and therefore makes the computation inefficient on GPUs. Although pruning can remove more than 80% of the weights, it actually hurts inference performance (speed) when running models on GPUs. Two major problems cause this unsatisfactory performance on GPUs. First, lowering convolution onto matrix multiplication reduces data reuse opportunities and wastes memory bandwidth. Second, the sparsity brought by pruning makes the computation irregular, which leads to inefficiency when running on massively parallel GPUs. To overcome these two limitations, we propose Escort, an efficient sparse convolutional neural network inference method for GPUs. Instead of using the lowering method, we choose to compute the sparse convolutions directly. We then orchestrate the parallelism and locality for the direct sparse convolution kernel, and apply customized optimization techniques to further improve performance. Evaluation on NVIDIA GPUs shows that Escort can improve sparse convolution speed by 2.63x and 3.07x, and inference speed by 1.43x and 1.69x, compared to CUBLAS and CUSPARSE respectively. |
Tasks | |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10280v2 |
PDF | http://arxiv.org/pdf/1802.10280v2.pdf |
PWC | https://paperswithcode.com/paper/escort-efficient-sparse-convolutional-neural |
Repo | https://github.com/gkUwen/learning-material |
Framework | none |
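A NumPy illustration of the direct-computation idea: iterate only over the nonzero filter weights instead of lowering to a dense matrix multiplication. This shows the arithmetic only; the paper's contribution is orchestrating this loop's parallelism and locality on GPUs. The 3x3 valid-convolution setup is our simplification.

```python
import numpy as np

def direct_sparse_conv(inp, nz_weights, n_out):
    """inp: (C_in, H, W); nz_weights: iterable of (c_out, c_in, r, s, value)
    tuples listing the nonzero entries of 3x3 filters; valid convolution."""
    _, H, W = inp.shape
    out = np.zeros((n_out, H - 2, W - 2))
    for c_out, c_in, r, s, v in nz_weights:
        out[c_out] += v * inp[c_in, r:r + H - 2, s:s + W - 2]
    return out
```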
The Road to Success: Assessing the Fate of Linguistic Innovations in Online Communities
Title | The Road to Success: Assessing the Fate of Linguistic Innovations in Online Communities |
Authors | Marco Del Tredici, Raquel Fernández |
Abstract | We investigate the birth and diffusion of lexical innovations in a large dataset of online social communities. We build on sociolinguistic theories and focus on the relation between the spread of a novel term and the social role of the individuals who use it, uncovering characteristics of innovators and adopters. Finally, we perform a prediction task that allows us to anticipate whether an innovation will successfully spread within a community. |
Tasks | |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.05838v1 |
PDF | http://arxiv.org/pdf/1806.05838v1.pdf |
PWC | https://paperswithcode.com/paper/the-road-to-success-assessing-the-fate-of |
Repo | https://github.com/marcodel13/The-Road-to-Success |
Framework | none |
Hateminers: Detecting Hate speech against Women
Title | Hateminers: Detecting Hate speech against Women |
Authors | Punyajoy Saha, Binny Mathew, Pawan Goyal, Animesh Mukherjee |
Abstract | With the online proliferation of hate speech, there is an urgent need for systems that can detect such harmful content. In this paper, we present the machine learning models developed for the Automatic Misogyny Identification (AMI) shared task at EVALITA 2018. We generate three types of features: sentence embeddings, TF-IDF vectors, and BOW vectors to represent each tweet. These features are then concatenated and fed into the machine learning models. Our model came first for the English Subtask A and fifth for the English Subtask B. We release our winning model for public use; it is available at https://github.com/punyajoy/Hateminers-EVALITA. |
Tasks | Hate Speech Detection, Sentence Embeddings |
Published | 2018-12-17 |
URL | http://arxiv.org/abs/1812.06700v1 |
PDF | http://arxiv.org/pdf/1812.06700v1.pdf |
PWC | https://paperswithcode.com/paper/hateminers-detecting-hate-speech-against |
Repo | https://github.com/punyajoy/Hateminers-EVALITA |
Framework | none |
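A minimal sketch of the feature pipeline the abstract describes: BOW, TF-IDF, and sentence-embedding blocks concatenated and fed to a classifier. The random array stands in for a real sentence encoder, and LogisticRegression is our assumption; the abstract only says "machine learning models".

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

tweets = ["example tweet one", "another example tweet", "a third tweet"]
labels = [0, 1, 0]

bow = CountVectorizer().fit_transform(tweets).toarray()    # BOW vectors
tfidf = TfidfVectorizer().fit_transform(tweets).toarray()  # TF-IDF vectors
sent_emb = np.random.randn(len(tweets), 512)               # stand-in for a sentence encoder

X = np.hstack([bow, tfidf, sent_emb])                      # concatenated feature blocks
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```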
Reduced-order modeling with artificial neurons for gravitational-wave inference
Title | Reduced-order modeling with artificial neurons for gravitational-wave inference |
Authors | Alvin J. K. Chua, Chad R. Galley, Michele Vallisneri |
Abstract | Gravitational-wave data analysis is rapidly absorbing techniques from deep learning, with a focus on convolutional networks and related methods that treat noisy time series as images. We pursue an alternative approach, in which waveforms are first represented as weighted sums over reduced bases (reduced-order modeling); we then train artificial neural networks to map gravitational-wave source parameters into basis coefficients. Statistical inference proceeds directly in coefficient space, where it is theoretically straightforward and computationally efficient. The neural networks also provide analytic waveform derivatives, which are useful for gradient-based sampling schemes. We demonstrate fast and accurate coefficient interpolation for the case of a four-dimensional binary-inspiral waveform family, and discuss promising applications of our framework in parameter estimation. |
Tasks | Time Series |
Published | 2018-11-13 |
URL | https://arxiv.org/abs/1811.05491v2 |
PDF | https://arxiv.org/pdf/1811.05491v2.pdf |
PWC | https://paperswithcode.com/paper/roman-reduced-order-modeling-with-artificial |
Repo | https://github.com/vallis/truebayes |
Framework | pytorch |
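The pipeline reduces to two steps: a network maps source parameters to reduced-basis coefficients, and the waveform is a weighted sum over the fixed basis. The PyTorch sketch below is hypothetical; the random basis stands in for a precomputed reduced basis, and all sizes are placeholders (see the truebayes repo for the real code).

```python
import torch
import torch.nn as nn

n_params, n_basis, n_samples = 4, 20, 1024
basis = torch.randn(n_basis, n_samples)   # stand-in for a precomputed reduced basis

coeff_net = nn.Sequential(nn.Linear(n_params, 128), nn.Tanh(), nn.Linear(128, n_basis))

theta = torch.randn(8, n_params)          # batch of source parameters
alpha = coeff_net(theta)                  # predicted basis coefficients
waveform = alpha @ basis                  # reconstructed waveforms, shape (8, 1024)
```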
GLAD: GLocalized Anomaly Detection via Active Feature Space Suppression
Title | GLAD: GLocalized Anomaly Detection via Active Feature Space Suppression |
Authors | Shubhomoy Das, Janardhan Rao Doppa |
Abstract | We propose an algorithm called GLAD (GLocalized Anomaly Detection) that allows end-users to retain the use of simple and understandable global anomaly detectors by automatically learning their local relevance to specific data instances using label feedback. The key idea is to place a uniform prior on the relevance of each member of the anomaly detection ensemble over the input feature space via a neural network trained on unlabeled instances, and tune the weights of the neural network to adjust the local relevance of each ensemble member using all labeled instances. Our experiments on synthetic and real-world data show the effectiveness of GLAD in learning the local relevance of ensemble members and discovering anomalies via label feedback. |
Tasks | Anomaly Detection |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01403v3 |
PDF | http://arxiv.org/pdf/1810.01403v3.pdf |
PWC | https://paperswithcode.com/paper/glad-glocalized-anomaly-detection-via-active |
Repo | https://github.com/freedombenLiu/ad_examples |
Framework | tf |
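The scoring rule GLAD learns can be summarized in a few lines: a network assigns each instance local relevance weights over the fixed ensemble members, and the final score is the weighted sum. The sketch below is our paraphrase of that idea, with placeholder sizes; the label-feedback training loop is omitted.

```python
import torch
import torch.nn as nn

M, d = 5, 10  # ensemble size and input dimension (placeholders)
relevance_net = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, M), nn.Sigmoid())

def glad_score(x, ensemble_scores):
    """ensemble_scores: (batch, M) anomaly scores from the global detectors."""
    p = relevance_net(x)                      # per-instance relevance of each member
    return (p * ensemble_scores).sum(dim=-1)  # locally re-weighted anomaly score
```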
Seglearn: A Python Package for Learning Sequences and Time Series
Title | Seglearn: A Python Package for Learning Sequences and Time Series |
Authors | David M. Burns, Cari M. Whyne |
Abstract | Seglearn is an open-source Python package for machine learning on time series and sequences using a sliding-window segmentation approach. The implementation provides a flexible pipeline for tackling classification, regression, and forecasting problems with multivariate sequence and contextual data. The package is compatible with scikit-learn and is listed under scikit-learn Related Projects. It depends on numpy, scipy, and scikit-learn. Seglearn is distributed under the BSD 3-Clause License. Documentation includes a detailed API description, user guide, and examples. Unit tests provide a high degree of code coverage. |
Tasks | Time Series |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.08118v3 |
PDF | http://arxiv.org/pdf/1803.08118v3.pdf |
PWC | https://paperswithcode.com/paper/seglearn-a-python-package-for-learning |
Repo | https://github.com/dmbee/seglearn |
Framework | none
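The core operation is sliding-window segmentation. The generic NumPy sketch below conveys the idea without depending on seglearn's own classes (consult the package docs for its scikit-learn-compatible API); the window width and step are arbitrary here.

```python
import numpy as np

def sliding_windows(ts, width=100, step=50):
    """Segment a (T, d) series into overlapping (width, d) windows."""
    starts = range(0, len(ts) - width + 1, step)
    return np.stack([ts[s:s + width] for s in starts])

segments = sliding_windows(np.random.randn(1000, 3))  # (19, 100, 3), ready for features
```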
Deep Reinforcement One-Shot Learning for Artificially Intelligent Classification Systems
Title | Deep Reinforcement One-Shot Learning for Artificially Intelligent Classification Systems |
Authors | Anton Puzanov, Kobi Cohen |
Abstract | In recent years there has been a sharp rise in networking applications, in which significant events need to be classified but only a few training instances are available. These are known as cases of one-shot learning. Examples include analyzing network traffic under zero-day attacks, and computer vision tasks by sensor networks deployed in the field. To handle this challenging task, organizations often use human analysts to classify events under high uncertainty. Existing algorithms use a threshold-based mechanism to decide whether to classify an object automatically or send it to an analyst for deeper inspection. However, this approach leads to a significant waste of resources since it does not take the practical temporal constraints of system resources into account. Our contribution is threefold. First, we develop a novel Deep Reinforcement One-shot Learning (DeROL) framework to address this challenge. The basic idea of the DeROL algorithm is to train a deep-Q network to obtain a policy which is oblivious to the unseen classes in the testing data. Then, in real-time, DeROL maps the current state of the one-shot learning process to operational actions based on the trained deep-Q network, to maximize the objective function. Second, we develop the first open-source software for practical artificially intelligent one-shot classification systems with limited resources for the benefit of researchers in related fields. Third, we present an extensive experimental study using the OMNIGLOT dataset for computer vision tasks and the UNSW-NB15 dataset for intrusion detection tasks that demonstrates the versatility and efficiency of the DeROL framework. |
Tasks | Intrusion Detection, Omniglot, One-Shot Learning |
Published | 2018-08-04 |
URL | http://arxiv.org/abs/1808.01527v1 |
PDF | http://arxiv.org/pdf/1808.01527v1.pdf |
PWC | https://paperswithcode.com/paper/deep-reinforcement-one-shot-learning-for |
Repo | https://github.com/antonpuz/DeROL |
Framework | tf |
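At decision time, DeROL maps the state of the one-shot learning process to an operational action via the trained deep-Q network. The epsilon-greedy sketch below illustrates only that selection step; the action names and exploration rate are our placeholders, not the paper's exact action set.

```python
import numpy as np

ACTIONS = ["auto_classify", "delay", "send_to_analyst"]  # illustrative action set

def select_action(q_values, epsilon=0.05, seed=None):
    """Epsilon-greedy choice over the Q-network's outputs for one state."""
    rng = np.random.default_rng(seed)
    if rng.uniform() < epsilon:
        return int(rng.integers(len(ACTIONS)))  # explore
    return int(np.argmax(q_values))             # exploit: best operational action
```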
Set Aggregation Network as a Trainable Pooling Layer
Title | Set Aggregation Network as a Trainable Pooling Layer |
Authors | Łukasz Maziarka, Marek Śmieja, Aleksandra Nowak, Jacek Tabor, Łukasz Struski, Przemysław Spurek |
Abstract | Global pooling, such as max- or sum-pooling, is one of the key ingredients in deep neural networks used for processing images, texts, graphs and other types of structured data. Based on the recent DeepSets architecture proposed by Zaheer et al. (NIPS 2017), we introduce a Set Aggregation Network (SAN) as an alternative global pooling layer. In contrast to typical pooling operators, SAN allows one to embed a given set of features into a vector representation of arbitrary size. We show that by adjusting the size of the embedding, SAN is capable of preserving all the information from the input. In experiments, we demonstrate that replacing the global pooling layer with SAN improves classification accuracy. Moreover, it is less prone to overfitting and can be used as a regularizer. |
Tasks | |
Published | 2018-10-03 |
URL | https://arxiv.org/abs/1810.01868v3 |
PDF | https://arxiv.org/pdf/1810.01868v3.pdf |
PWC | https://paperswithcode.com/paper/set-aggregation-network-for-structured-data |
Repo | https://github.com/gmum/set-aggregation |
Framework | tf |
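A SAN-style layer can be sketched in a few lines: project each set element, apply a nonlinearity, and sum over the set, giving an embedding whose size is a free hyper-parameter. This is our reading of the abstract, with arbitrary dimensions; the authors' code is in the set-aggregation repo.

```python
import torch
import torch.nn as nn

class SetAggregation(nn.Module):
    """Trainable pooling: embed a (batch, set_size, in_dim) set of features
    into a (batch, out_dim) vector of chosen size."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return torch.relu(self.proj(x)).sum(dim=1)  # sum-aggregate the projections

pooled = SetAggregation(64, 256)(torch.randn(8, 17, 64))  # shape (8, 256)
```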
Low Cost Edge Sensing for High Quality Demosaicking
Title | Low Cost Edge Sensing for High Quality Demosaicking |
Authors | Yan Niu, Jihong Ouyang, Wanli Zuo, Fuxin Wang |
Abstract | Digital cameras that use Color Filter Arrays (CFA) entail a demosaicking procedure to form full RGB images. As today's camera users generally require images to be viewed instantly, demosaicking algorithms for real applications must be fast. Moreover, the associated cost should be lower than the cost saved by using CFA. For this purpose, we revisit the classical Hamilton-Adams (HA) algorithm, which outperforms many sophisticated techniques in both speed and accuracy. Inspired by HA's strengths and weaknesses, we design a very low cost edge sensing scheme. Briefly, it guides demosaicking by a logistic functional of the difference between directional variations. We extensively compare our algorithm with 28 demosaicking algorithms by running their open-source code on benchmark datasets. Compared to methods of similar computational cost, our method achieves substantially higher accuracy, whereas compared to methods of similar accuracy, it has significantly lower cost. Moreover, on test images of currently popular resolution, the quality of our algorithm is comparable to that of the top performers, whereas its speed is tens of times faster. |
Tasks | Demosaicking |
Published | 2018-06-03 |
URL | http://arxiv.org/abs/1806.00771v2 |
PDF | http://arxiv.org/pdf/1806.00771v2.pdf |
PWC | https://paperswithcode.com/paper/low-cost-edge-sensing-for-high-quality |
Repo | https://github.com/shmilyo/Low-Cost-Edge-Sensing-for-High-Quality-Demosaicking |
Framework | none |
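The edge-sensing rule reduces to a logistic weighting of directional estimates. The sketch below is a toy rendering of that idea; the constant k, the exact variation measures, and the blending step all differ in the paper.

```python
import numpy as np

def edge_sensing_weight(dh, dv, k=1.0):
    """Logistic functional of the difference between directional variations."""
    return 1.0 / (1.0 + np.exp(-k * (np.abs(dv) - np.abs(dh))))

# w near 1 means vertical variation dominates, so favor the horizontal estimate.
w = edge_sensing_weight(dh=0.1, dv=0.9)
g_hat = w * 0.52 + (1.0 - w) * 0.47  # blend directional green estimates (toy numbers)
```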