May 7, 2019

3035 words 15 mins read

Paper Group AWR 103


Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs

Title Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
Authors Yu. A. Malkov, D. A. Yashunin
Abstract We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). The proposed solution is fully graph-based, without any need for the additional search structures that are typically used at the coarse search stage of most proximity graph techniques. Hierarchical NSW incrementally builds a multi-layer structure consisting of a hierarchical set of proximity graphs (layers) for nested subsets of the stored elements. The maximum layer in which an element is present is selected randomly with an exponentially decaying probability distribution. This allows producing graphs similar to the previously studied Navigable Small World (NSW) structures while additionally having the links separated by their characteristic distance scales. Starting the search from the upper layer together with utilizing the scale separation boosts the performance compared to NSW and allows a logarithmic complexity scaling. Additional employment of a heuristic for selecting proximity graph neighbors significantly increases performance at high recall and in the case of highly clustered data. Performance evaluation has demonstrated that the proposed general metric space search index is able to strongly outperform previous open-source state-of-the-art vector-only approaches. Similarity of the algorithm to the skip list structure allows a straightforward, balanced distributed implementation.
Tasks
Published 2016-03-30
URL http://arxiv.org/abs/1603.09320v4
PDF http://arxiv.org/pdf/1603.09320v4.pdf
PWC https://paperswithcode.com/paper/efficient-and-robust-approximate-nearest
Repo https://github.com/RyanLiGod/multiple_target_hnswlib
Framework mxnet
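
As a companion to the abstract, here is a minimal, hedged sketch of the HNSW idea: greedy descent through a stack of progressively denser proximity graphs. The layer construction below (plain k-NN links over nested prefixes of the data) is a toy stand-in for the paper's NSW construction and neighbor-selection heuristic; the linked repo is the real implementation.

```python
# A toy sketch of HNSW-style search, assuming numpy only. Layers here are
# plain k-NN graphs over nested subsets of the data (a stand-in for the
# paper's NSW construction and neighbor-selection heuristic).
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 8))              # stored elements

def build_layer(ids, k=5):
    """Naive k-NN graph over a subset of the points."""
    graph = {}
    for i in ids:
        d = np.linalg.norm(points[ids] - points[i], axis=1)
        order = [ids[j] for j in np.argsort(d) if ids[j] != i]
        graph[i] = order[:k]
    return graph

# Nested subsets whose sizes decay exponentially, mimicking the random
# layer assignment with exponentially decaying probability.
layers, ids = [], list(range(len(points)))
while len(ids) > 4:
    layers.append(build_layer(ids))
    ids = ids[: len(ids) // 2]
layers.reverse()                                # layers[0] = sparsest layer

def greedy_search(graph, entry, query):
    """Walk to the closest neighbor until no neighbor improves."""
    cur, cur_d = entry, np.linalg.norm(points[entry] - query)
    improved = True
    while improved:
        improved = False
        for nb in graph[cur]:
            d = np.linalg.norm(points[nb] - query)
            if d < cur_d:
                cur, cur_d, improved = nb, d, True
    return cur

query = rng.normal(size=8)
entry = next(iter(layers[0]))                   # any node of the top layer
for graph in layers:                            # descend, reusing the entry
    entry = greedy_search(graph, entry, query)
print("approx NN:", entry,
      "exact NN:", int(np.argmin(np.linalg.norm(points - query, axis=1))))
```

The descent gives each query a good entry point in the bottom layer, which is what makes the greedy walk cheap; the production algorithm additionally keeps a dynamic candidate list (the ef parameter in hnswlib) rather than a single current node.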

Poisson multi-Bernoulli conjugate prior for multiple extended object filtering

Title Poisson multi-Bernoulli conjugate prior for multiple extended object filtering
Authors Karl Granstrom, Maryam Fatemi, Lennart Svensson
Abstract This paper presents a Poisson multi-Bernoulli mixture (PMBM) conjugate prior for multiple extended object filtering. A Poisson point process is used to describe the existence of yet undetected targets, while a multi-Bernoulli mixture describes the distribution of the targets that have been detected. The prediction and update equations are presented for the standard transition density and measurement likelihood. Both the prediction and the update preserve the PMBM form of the density, and in this sense the PMBM density is a conjugate prior. However, the unknown data associations lead to an intractably large number of terms in the PMBM density, and approximations are necessary for tractability. A gamma Gaussian inverse Wishart implementation is presented, along with methods to handle the data association problem. A simulation study shows that the extended target PMBM filter performs well in comparison to the extended target d-GLMB and LMB filters. An experiment with Lidar data illustrates the benefit of tracking both detected and undetected targets.
Tasks
Published 2016-05-20
URL https://arxiv.org/abs/1605.06311v6
PDF https://arxiv.org/pdf/1605.06311v6.pdf
PWC https://paperswithcode.com/paper/poisson-multi-bernoulli-conjugate-prior-for
Repo https://github.com/yuhsuansia/Extended-target-PMB-filter
Framework none
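
The PMBM machinery is heavy, but its smallest moving parts are easy to show. Below is a hedged sketch of two standard single-component updates (Poisson intensity prediction and a Bernoulli existence update under missed detection); the extended-object specifics of the paper and repo (gamma Gaussian inverse Wishart densities, data association) are not reproduced here.

```python
# A hedged sketch of the smallest moving parts of a PMB(M) filter: the
# Poisson intensity prediction for undetected targets and the Bernoulli
# existence update when a component receives no measurement.

def ppp_predict(intensity, p_survival, birth_intensity):
    """Predicted Poisson intensity: survival thinning plus births."""
    return p_survival * intensity + birth_intensity

def bernoulli_missed_detection(r, p_detect):
    """Posterior existence probability after a missed detection."""
    return r * (1.0 - p_detect) / (1.0 - r * p_detect)

print(ppp_predict(2.0, 0.99, 0.1))            # ~2.08 expected undetected targets
print(bernoulli_missed_detection(0.9, 0.95))  # existence drops to ~0.31
```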

Untrimmed Video Classification for Activity Detection: submission to ActivityNet Challenge

Title Untrimmed Video Classification for Activity Detection: submission to ActivityNet Challenge
Authors Gurkirt Singh, Fabio Cuzzolin
Abstract Current state-of-the-art human activity recognition is focused on the classification of temporally trimmed videos in which only one action occurs per frame. We propose a simple, yet effective, method for the temporal detection of activities in temporally untrimmed videos with the help of untrimmed classification. Firstly, our model predicts the top k labels for each untrimmed video by analysing global video-level features. Secondly, frame-level binary classification is combined with dynamic programming to generate the temporally trimmed activity proposals. Finally, each proposal is assigned a label based on the global label, and scored by combining the score of the temporal activity proposal with the global score. Ultimately, we show that untrimmed video classification models can be used as a stepping stone for temporal detection.
Tasks Action Detection, Activity Detection, Activity Recognition, Human Activity Recognition, Video Classification
Published 2016-07-07
URL http://arxiv.org/abs/1607.01979v2
PDF http://arxiv.org/pdf/1607.01979v2.pdf
PWC https://paperswithcode.com/paper/untrimmed-video-classification-for-activity
Repo https://github.com/gurkirt/actNet-inAct
Framework none
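
A hedged sketch of the middle step of the pipeline: frame-level activity scores smoothed into contiguous proposals by a two-state dynamic program (activity vs. background, with a cost for switching). The exact DP used in the paper may differ; the switching cost here is an illustrative knob.

```python
# Turn per-frame activity scores into temporal proposals with a two-state
# Viterbi-style DP; contiguous runs of 1s become activity proposals.
import numpy as np

def segment(frame_scores, switch_cost=2.0):
    """frame_scores: log-odds of 'activity' per frame. Returns 0/1 labels."""
    T = len(frame_scores)
    score = np.full((T, 2), -np.inf)           # best cumulative score per state
    back = np.zeros((T, 2), dtype=int)         # backpointers
    emit = np.stack([-frame_scores, frame_scores], axis=1)  # background, activity
    score[0] = emit[0]
    for t in range(1, T):
        for s in (0, 1):
            stay = score[t - 1, s]
            move = score[t - 1, 1 - s] - switch_cost
            back[t, s] = s if stay >= move else 1 - s
            score[t, s] = max(stay, move) + emit[t, s]
    labels = np.zeros(T, dtype=int)
    labels[-1] = int(np.argmax(score[-1]))
    for t in range(T - 1, 0, -1):              # backtrace
        labels[t - 1] = back[t, labels[t]]
    return labels

scores = np.array([-1, -1, 2, 3, 2.5, -0.5, 3, -2, -2], dtype=float)
print(segment(scores))  # e.g. one contiguous activity segment in the middle
```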

Visual-Inertial Monocular SLAM with Map Reuse

Title Visual-Inertial Monocular SLAM with Map Reuse
Authors Raul Mur-Artal, Juan D. Tardos
Abstract In recent years there have been excellent results in Visual-Inertial Odometry techniques, which aim to compute the incremental motion of the sensor with high accuracy and robustness. However, these approaches lack the capability to close loops, and trajectory estimation accumulates drift even if the sensor is continually revisiting the same place. In this work we present a novel tightly-coupled Visual-Inertial Simultaneous Localization and Mapping system that is able to close loops and reuse its map to achieve zero-drift localization in already mapped areas. While our approach can be applied to any camera configuration, we address here the most general problem of a monocular camera, with its well-known scale ambiguity. We also propose a novel IMU initialization method, which computes the scale, the gravity direction, the velocity, and the gyroscope and accelerometer biases, in a few seconds with high accuracy. We test our system on the 11 sequences of a recent micro-aerial vehicle public dataset, achieving a typical scale factor error of 1% and centimeter precision. We compare to the state of the art in visual-inertial odometry in sequences with revisiting, demonstrating the better accuracy of our method due to map reuse and the absence of drift accumulation.
Tasks Simultaneous Localization and Mapping
Published 2016-10-19
URL http://arxiv.org/abs/1610.05949v2
PDF http://arxiv.org/pdf/1610.05949v2.pdf
PWC https://paperswithcode.com/paper/visual-inertial-monocular-slam-with-map-reuse
Repo https://github.com/lianbin/VIOSLAM
Framework none
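
Of the quantities the IMU initialization solves for, the scale is the easiest to isolate. The sketch below shows only a least-squares scale alignment between up-to-scale visual positions and metric positions, as an illustration of the kind of closed-form solve involved; the paper's method estimates gravity, velocity and biases jointly, which is not reproduced here.

```python
# An illustrative fragment of visual-inertial initialization, assuming the
# only unknown is the metric scale s: argmin_s sum ||s*p_vis - p_metric||^2.
import numpy as np

def solve_scale(p_visual, p_metric):
    """Closed-form least-squares scale between two sets of 3D positions."""
    return float(np.sum(p_visual * p_metric) / np.sum(p_visual * p_visual))

rng = np.random.default_rng(1)
p_metric = rng.normal(size=(50, 3))                      # metric positions
p_visual = p_metric / 2.37 + 0.01 * rng.normal(size=(50, 3))
print(solve_scale(p_visual, p_metric))                   # ~2.37
```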

Learning HMMs with Nonparametric Emissions via Spectral Decompositions of Continuous Matrices

Title Learning HMMs with Nonparametric Emissions via Spectral Decompositions of Continuous Matrices
Authors Kirthevasan Kandasamy, Maruan Al-Shedivat, Eric P. Xing
Abstract Recently, there has been a surge of interest in using spectral methods for estimating latent variable models. However, it is usually assumed that the distribution of the observations conditioned on the latent variables is either discrete or belongs to a parametric family. In this paper, we study the estimation of an $m$-state hidden Markov model (HMM) with only smoothness assumptions, such as Hölderian conditions, on the emission densities. By leveraging some recent advances in continuous linear algebra and numerical analysis, we develop a computationally efficient spectral algorithm for learning nonparametric HMMs. Our technique is based on computing an SVD on nonparametric estimates of density functions by viewing them as “continuous matrices”. We derive sample complexity bounds via concentration results for nonparametric density estimation and novel perturbation theory results for continuous matrices. We implement our method using Chebyshev polynomial approximations. Our method is competitive with other baselines on synthetic and real problems and is also very computationally efficient.
Tasks Density Estimation, Latent Variable Models
Published 2016-09-21
URL http://arxiv.org/abs/1609.06390v1
PDF http://arxiv.org/pdf/1609.06390v1.pdf
PWC https://paperswithcode.com/paper/learning-hmms-with-nonparametric-emissions
Repo https://github.com/alshedivat/nphmm
Framework none
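
A numerical illustration of the method's starting point: an estimate of the joint density of consecutive observations of an $m$-state HMM is, up to estimation noise, a rank-$m$ matrix, so its SVD exposes the number of hidden states. The grid/histogram estimator below is a crude stand-in for the paper's Chebyshev-based continuous matrices.

```python
# Check that the joint density of (x_t, x_{t+1}) for a 2-state HMM with
# Gaussian emissions has two dominant singular values.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],                       # 2-state transition matrix
              [0.2, 0.8]])
means = [-2.0, 2.0]                             # Gaussian emission means
z, xs = 0, []
for _ in range(100_000):
    xs.append(rng.normal(means[z], 1.0))
    z = rng.choice(2, p=A[z])
xs = np.array(xs)

# Crude density estimate of (x_t, x_{t+1}) on a grid, then its SVD.
H, _, _ = np.histogram2d(xs[:-1], xs[1:], bins=40, range=[[-6, 6], [-6, 6]])
s = np.linalg.svd(H / H.sum(), compute_uv=False)
print(np.round(s[:4] / s[0], 3))   # two dominant singular values -> m = 2
```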

Expected Similarity Estimation for Large-Scale Batch and Streaming Anomaly Detection

Title Expected Similarity Estimation for Large-Scale Batch and Streaming Anomaly Detection
Authors Markus Schneider, Wolfgang Ertel, Fabio Ramos
Abstract We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state-of-the-art algorithms for anomaly detection while being an order of magnitude faster than most other approaches.
Tasks Anomaly Detection
Published 2016-01-25
URL http://arxiv.org/abs/1601.06602v3
PDF http://arxiv.org/pdf/1601.06602v3.pdf
PWC https://paperswithcode.com/paper/expected-similarity-estimation-for-large
Repo https://github.com/numenta/NAB
Framework none
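
EXPoSE is simple enough to sketch end to end: score a point by its inner product with the kernel mean embedding of the training data. Below the embedding is approximated with random Fourier features; the feature dimension and kernel width are illustrative choices, not the paper's.

```python
# A minimal sketch of EXPoSE, assuming a Gaussian kernel approximated with
# random Fourier features.
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 2, 512, 0.5                       # input dim, features, width
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(x):
    """Feature map with phi(x).phi(y) ~ exp(-gamma*||x-y||^2)."""
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

X = rng.normal(size=(5000, d))                  # "regular" data
mu = phi(X).mean(axis=0)                        # kernel mean embedding (batch)

def expose_score(z):
    """Expected similarity of z to the data distribution; low = anomalous."""
    return float(phi(z) @ mu)

print(expose_score(np.zeros(d)))                # inlier: high score
print(expose_score(np.full(d, 6.0)))            # outlier: near zero
# Streaming update in O(1) per instance: mu <- (n*mu + phi(x_new)) / (n + 1)
```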

CoCoA: A General Framework for Communication-Efficient Distributed Optimization

Title CoCoA: A General Framework for Communication-Efficient Distributed Optimization
Authors Virginia Smith, Simone Forte, Chenxin Ma, Martin Takac, Michael I. Jordan, Martin Jaggi
Abstract The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach in handling non-strongly-convex regularizers and non-smooth loss functions. The resulting framework has markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets.
Tasks Distributed Optimization
Published 2016-11-07
URL http://arxiv.org/abs/1611.02189v2
PDF http://arxiv.org/pdf/1611.02189v2.pdf
PWC https://paperswithcode.com/paper/cocoa-a-general-framework-for-communication
Repo https://github.com/gingsmith/cocoa
Framework none
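
In notation paraphrased from the paper, the framework targets objectives of the separable form

$$ \min_{\alpha \in \mathbb{R}^n} \; f(A\alpha) + \sum_{i=1}^{n} g_i(\alpha_i), $$

where the columns of the data matrix $A$ are partitioned across $K$ machines. In each round, machine $k$ approximately solves a local subproblem in its own coordinate block $\alpha_{[k]}$, and only the aggregated products $\Delta v_k = A_{[k]} \Delta\alpha_{[k]}$ are communicated, which is what keeps the scheme communication-efficient.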

Learning End-to-End Goal-Oriented Dialog

Title Learning End-to-End Goal-Oriented Dialog
Authors Antoine Bordes, Y-Lan Boureau, Jason Weston
Abstract Traditional dialog systems used in goal-oriented applications require a lot of domain-specific handcrafting, which hinders scaling up to new domains. End-to-end dialog systems, in which all components are trained from the dialogs themselves, escape this limitation. But the encouraging success recently obtained in chit-chat dialog may not carry over to goal-oriented settings. This paper proposes a testbed to break down the strengths and shortcomings of end-to-end dialog systems in goal-oriented applications. Set in the context of restaurant reservation, our tasks require manipulating sentences and symbols, so as to properly conduct conversations, issue API calls and use the outputs of such calls. We show that an end-to-end dialog system based on Memory Networks can reach promising, yet imperfect, performance and learn to perform non-trivial operations. We confirm those results by comparing our system to a hand-crafted slot-filling baseline on data from the second Dialog State Tracking Challenge (Henderson et al., 2014a). We show similar result patterns on data extracted from an online concierge service.
Tasks Goal-Oriented Dialog, Slot Filling
Published 2016-05-24
URL http://arxiv.org/abs/1605.07683v4
PDF http://arxiv.org/pdf/1605.07683v4.pdf
PWC https://paperswithcode.com/paper/learning-end-to-end-goal-oriented-dialog
Repo https://github.com/ysglh/Task-Oriented-Dialogue-Dataset-Survey
Framework none
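
The core read operation of the Memory Network underlying the paper's dialog system fits in a few lines: embed the question, attend over the embedded dialog history, return a weighted sum. The sketch below is a single hop with random embeddings and token ids as stand-ins; the trained model stacks hops and adds an answer layer.

```python
# A single memory hop, the core read operation of an end-to-end Memory
# Network; vocabulary, embeddings and token ids are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
V, d = 50, 16                                   # vocab size, embedding dim
A = rng.normal(scale=0.1, size=(V, d))          # memory (input) embedding
C = rng.normal(scale=0.1, size=(V, d))          # memory (output) embedding
B = rng.normal(scale=0.1, size=(V, d))          # question embedding

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bow(word_ids, E):
    """Bag-of-words sentence embedding."""
    return E[word_ids].sum(axis=0)

history = [[1, 4, 7], [2, 3], [9, 4, 12]]       # token ids of past utterances
question = [4, 7]

u = bow(question, B)
m = np.stack([bow(s, A) for s in history])      # memory keys
c = np.stack([bow(s, C) for s in history])      # memory values
p = softmax(m @ u)                              # attention over the history
o = p @ c                                       # retrieved summary
print(np.round(p, 3))                           # o + u would feed the answer layer
```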

Neural Machine Translation with Reconstruction

Title Neural Machine Translation with Reconstruction
Authors Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, Hang Li
Abstract Although end-to-end Neural Machine Translation (NMT) has achieved remarkable progress in the past two years, it suffers from a major drawback: translations generated by NMT systems often lack adequacy. It has been widely observed that NMT tends to repeatedly translate some source words while mistakenly ignoring other words. To alleviate this problem, we propose a novel encoder-decoder-reconstructor framework for NMT. The reconstructor, incorporated into the NMT model, reconstructs the input source sentence from the hidden layer of the output target sentence, to ensure that the information on the source side is transferred to the target side as much as possible. Experiments show that the proposed framework significantly improves the adequacy of NMT output and achieves superior translation results over state-of-the-art NMT and statistical MT systems.
Tasks Machine Translation
Published 2016-11-07
URL http://arxiv.org/abs/1611.01874v2
PDF http://arxiv.org/pdf/1611.01874v2.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation-with
Repo https://github.com/tuzhaopeng/nmt
Framework none
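
The training objective behind this framework can be summarized in one line (notation paraphrased; $s$ denotes the decoder hidden states the reconstructor reads from, and $\lambda$ trades translation likelihood against reconstruction):

$$ \hat{\theta}, \hat{\gamma} \;=\; \arg\max_{\theta,\gamma} \sum_{(x,y)} \Big( \log P(y \mid x; \theta) \;+\; \lambda \log R(x \mid s; \theta, \gamma) \Big). $$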

Modeling Coverage for Neural Machine Translation

Title Modeling Coverage for Neural Machine Translation
Authors Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, Hang Li
Abstract The attention mechanism has enhanced state-of-the-art Neural Machine Translation (NMT) by jointly learning to align and translate. It tends to ignore past alignment information, however, which often leads to over-translation and under-translation. To address this problem, we propose coverage-based NMT in this paper. We maintain a coverage vector to keep track of the attention history. The coverage vector is fed to the attention model to help adjust future attention, which encourages the NMT system to pay more attention to untranslated source words. Experiments show that the proposed approach significantly improves both translation quality and alignment quality over standard attention-based NMT.
Tasks Machine Translation
Published 2016-01-19
URL http://arxiv.org/abs/1601.04811v6
PDF http://arxiv.org/pdf/1601.04811v6.pdf
PWC https://paperswithcode.com/paper/modeling-coverage-for-neural-machine
Repo https://github.com/ZhenYangIACAS/NMT
Framework none
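
A hedged sketch of the coverage mechanism: the coverage vector accumulates past attention weights and is fed back into the alignment scores, discouraging re-attention to already-covered source words. For brevity, the scores below omit the decoder-state term a real attention model would include, and the coverage weight v_c is a scalar stand-in for the learned parameters.

```python
# Coverage-aware attention over toy encoder annotations.
import numpy as np

rng = np.random.default_rng(0)
src_len, d = 6, 8
H = rng.normal(size=(src_len, d))       # encoder annotations
w_h = rng.normal(scale=0.1, size=d)     # toy score weights
v_c = -1.0                              # negative: coverage discourages re-attention

coverage = np.zeros(src_len)            # one accumulator per source word
for t in range(4):                      # a few decoder steps
    scores = H @ w_h + v_c * coverage
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    context = alpha @ H                 # would feed the decoder update
    coverage += alpha                   # coverage update: C_t = C_{t-1} + alpha_t
print(np.round(coverage, 2))            # attention mass spread across the source
```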

Learning the kernel matrix via predictive low-rank approximations

Title Learning the kernel matrix via predictive low-rank approximations
Authors Martin Stražar, Tomaž Curk
Abstract Efficient and accurate low-rank approximations of multiple data sources are essential in the era of big data. The scaling of kernel-based learning algorithms to large datasets is limited by the O(n^2) computation and storage complexity of the full kernel matrix, which is required by most recent kernel learning algorithms. We present the Mklaren algorithm, which approximates multiple kernel matrices and learns a regression model, and is entirely based on geometrical concepts. The algorithm does not require access to full kernel matrices, yet it accounts for the correlations between all kernels. It uses incomplete Cholesky decomposition, where pivot selection is based on least-angle regression in the combined, low-dimensional feature space. The algorithm has linear complexity in the number of data points and kernels. When an explicit feature space induced by the kernel can be constructed, a mapping from the dual to the primal ridge regression weights is used for model interpretation. The Mklaren algorithm was tested on eight standard regression datasets. It outperforms contemporary kernel matrix approximation approaches when learning with multiple kernels. It identifies relevant kernels, achieving higher explained variance than other multiple kernel learning methods for the same number of iterations. Test accuracy equivalent to that obtained with full kernel matrices was achieved at significantly lower approximation ranks. A difference in run times of two orders of magnitude was observed when either the number of samples or the number of kernels exceeds 3000.
Tasks
Published 2016-01-17
URL http://arxiv.org/abs/1601.04366v2
PDF http://arxiv.org/pdf/1601.04366v2.pdf
PWC https://paperswithcode.com/paper/learning-the-kernel-matrix-via-predictive-low
Repo https://github.com/mstrazar/mklaren
Framework none
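
The backbone of Mklaren, incomplete Cholesky decomposition of a kernel matrix, is compact enough to sketch. The version below uses the standard largest-residual pivot rule; Mklaren's contribution, least-angle-regression-based pivoting across multiple kernels, is not reproduced here.

```python
# Incomplete Cholesky of a kernel matrix with greedy diagonal pivoting.
import numpy as np

def incomplete_cholesky(K, rank):
    """Return G of shape (n, rank) with K ~ G @ G.T."""
    n = K.shape[0]
    G = np.zeros((n, rank))
    diag = np.diag(K).astype(float).copy()
    for j in range(rank):
        p = int(np.argmax(diag))                    # greedy pivot selection
        G[:, j] = (K[:, p] - G @ G[p]) / np.sqrt(diag[p])
        diag = np.maximum(diag - G[:, j] ** 2, 0.0) # residual diagonal
    return G

X = np.random.default_rng(0).normal(size=(300, 5))
sq = np.sum((X[:, None] - X[None]) ** 2, axis=-1)
K = np.exp(-0.5 * sq)                               # Gaussian kernel matrix
G = incomplete_cholesky(K, 30)
print(np.linalg.norm(K - G @ G.T) / np.linalg.norm(K))  # small relative error
```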

Model-Free Episodic Control

Title Model-Free Episodic Control
Authors Charles Blundell, Benigno Uria, Alexander Pritzel, Yazhe Li, Avraham Ruderman, Joel Z Leibo, Jack Rae, Daan Wierstra, Demis Hassabis
Abstract State of the art deep reinforcement learning algorithms take many millions of interactions to attain human-level performance. Humans, on the other hand, can very quickly exploit highly rewarding nuances of an environment upon first discovery. In the brain, such rapid learning is thought to depend on the hippocampus and its capacity for episodic memory. Here we investigate whether a simple model of hippocampal episodic control can learn to solve difficult sequential decision-making tasks. We demonstrate that it not only attains a highly rewarding strategy significantly faster than state-of-the-art deep reinforcement learning algorithms, but also achieves a higher overall reward on some of the more challenging domains.
Tasks Decision Making
Published 2016-06-14
URL http://arxiv.org/abs/1606.04460v1
PDF http://arxiv.org/pdf/1606.04460v1.pdf
PWC https://paperswithcode.com/paper/model-free-episodic-control
Repo https://github.com/nkxujun/DeepSemanticMatching
Framework none
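
The episodic controller itself is a small data structure: per action, a buffer mapping state embeddings to the best return ever obtained, with k-NN averaging for novel states. A hedged sketch, assuming states are already encoded as small vectors (environment and encoding are stand-ins):

```python
# Episodic value table: exact hits return the stored best return, novel
# states a k-NN average; updates keep the maximum return ever observed.
import numpy as np

class EpisodicQ:
    def __init__(self, n_actions, k=3):
        self.mem = [([], []) for _ in range(n_actions)]  # (states, returns)
        self.k = k

    def estimate(self, s, a):
        states, returns = self.mem[a]
        if not states:
            return 0.0
        d = np.linalg.norm(np.asarray(states) - s, axis=1)
        if d.min() == 0.0:                      # exact match
            return returns[int(np.argmin(d))]
        nn = np.argsort(d)[: self.k]            # novel state: k-NN average
        return float(np.mean([returns[i] for i in nn]))

    def update(self, s, a, ret):
        states, returns = self.mem[a]
        for i, s_old in enumerate(states):
            if np.array_equal(s_old, s):
                returns[i] = max(returns[i], ret)   # keep best return seen
                return
        states.append(np.asarray(s)); returns.append(ret)

q = EpisodicQ(n_actions=2)
q.update(np.array([0.0, 1.0]), 0, ret=5.0)
q.update(np.array([0.0, 1.0]), 0, ret=3.0)          # ignored: 5.0 is better
print(q.estimate(np.array([0.0, 1.0]), 0))          # 5.0
```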

Deep Information Propagation

Title Deep Information Propagation
Authors Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, Jascha Sohl-Dickstein
Abstract We study the behavior of untrained neural networks whose weights and biases are randomly distributed using mean field theory. We show the existence of depth scales that naturally limit the maximum depth of signal propagation through these random networks. Our main practical result is to show that random networks may be trained precisely when information can travel through them. Thus, the depth scales that we identify provide bounds on how deep a network may be trained for a specific choice of hyperparameters. As a corollary to this, we argue that in networks at the edge of chaos, one of these depth scales diverges. Thus arbitrarily deep networks may be trained only sufficiently close to criticality. We show that the presence of dropout destroys the order-to-chaos critical point and therefore strongly limits the maximum trainable depth for random networks. Finally, we develop a mean field theory for backpropagation and we show that the ordered and chaotic phases correspond to regions of vanishing and exploding gradient respectively.
Tasks
Published 2016-11-04
URL http://arxiv.org/abs/1611.01232v2
PDF http://arxiv.org/pdf/1611.01232v2.pdf
PWC https://paperswithcode.com/paper/deep-information-propagation
Repo https://github.com/ghliu/mean-field-fcdnn
Framework pytorch
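
The depth scales come from iterating simple Gaussian-expectation maps. The sketch below iterates the mean-field variance map for a random tanh network by Monte Carlo; sigma_w^2 and sigma_b^2 are illustrative hyperparameters, and the companion correlation map, whose convergence rate sets the trainable-depth bound, can be iterated the same way.

```python
# Monte Carlo iteration of the mean-field variance map for a tanh network.
import numpy as np

z = np.random.default_rng(0).normal(size=200_000)   # for Gaussian expectations

def variance_map(q, sw2, sb2):
    """q_{l+1} = sw2 * E[tanh(sqrt(q_l) z)^2] + sb2, z ~ N(0, 1)."""
    return sw2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sb2

q = 1.0
for layer in range(30):
    q = variance_map(q, sw2=1.7, sb2=0.05)
print(q)   # settles at the fixed point q* after a few layers
```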

Direct Feedback Alignment Provides Learning in Deep Neural Networks

Title Direct Feedback Alignment Provides Learning in Deep Neural Networks
Authors Arild Nøkland
Abstract Artificial neural networks are most commonly trained with the back-propagation algorithm, where the gradient for learning is provided by back-propagating the error, layer by layer, from the output layer to the hidden layers. A recently discovered method called feedback alignment shows that the weights used for propagating the error backward don't have to be symmetric with the weights used for propagating the activations forward. In fact, random feedback weights work equally well, because the network learns how to make the feedback useful. In this work, the feedback alignment principle is used for training hidden layers more independently from the rest of the network, and from a zero initial condition. The error is propagated through fixed random feedback connections directly from the output layer to each hidden layer. This simple method is able to achieve zero training error even in convolutional networks and very deep networks, completely without error back-propagation. The method is a step towards biologically plausible machine learning because the error signal is almost local, and no symmetric or reciprocal weights are required. Experiments show that the test performance on MNIST and CIFAR is almost as good as that obtained with back-propagation for fully connected networks. If combined with dropout, the method achieves 1.45% error on the permutation-invariant MNIST task.
Tasks
Published 2016-09-06
URL http://arxiv.org/abs/1609.01596v5
PDF http://arxiv.org/pdf/1609.01596v5.pdf
PWC https://paperswithcode.com/paper/direct-feedback-alignment-provides-learning
Repo https://github.com/metataro/DirectFeedbackAlignment
Framework none
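
Direct feedback alignment is easy to state in code: replace the back-propagated deltas with the output error projected through fixed random matrices. A minimal numpy sketch on a toy task (architecture, data and learning rate are illustrative):

```python
# DFA on a tiny 3-layer MLP: B1, B2 are fixed random feedback matrices that
# carry the output error directly to each hidden layer.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h, n_out, lr = 4, 32, 2, 0.05
W1 = rng.normal(scale=0.1, size=(n_in, n_h))
W2 = rng.normal(scale=0.1, size=(n_h, n_h))
W3 = rng.normal(scale=0.1, size=(n_h, n_out))
B1 = rng.normal(scale=0.1, size=(n_out, n_h))    # fixed random feedback
B2 = rng.normal(scale=0.1, size=(n_out, n_h))

X = rng.normal(size=(256, n_in))
Y = np.eye(n_out)[(X.sum(axis=1) > 0).astype(int)]   # toy one-hot labels

for step in range(500):
    h1 = np.tanh(X @ W1)
    h2 = np.tanh(h1 @ W2)
    out = h2 @ W3
    e = out - Y                                  # output error (linear + MSE)
    # DFA: per-layer deltas come straight from e via random projections
    d2 = (e @ B2) * (1 - h2 ** 2)
    d1 = (e @ B1) * (1 - h1 ** 2)
    W3 -= lr * h2.T @ e / len(X)
    W2 -= lr * h1.T @ d2 / len(X)
    W1 -= lr * X.T @ d1 / len(X)
print("final MSE:", float(np.mean(e ** 2)))      # decreases without backprop
```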

Why does deep and cheap learning work so well?

Title Why does deep and cheap learning work so well?
Authors Henry W. Lin, Max Tegmark, David Rolnick
Abstract We show how the success of deep learning could depend not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can frequently be approximated through “cheap learning” with exponentially fewer parameters than generic ones. We explore how properties frequently encountered in physics such as symmetry, locality, compositionality, and polynomial log-probability translate into exceptionally simple neural networks. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to the renormalization group. We prove various “no-flattening theorems” showing when efficient linear deep networks cannot be accurately approximated by shallow ones without efficiency loss; for example, we show that $n$ variables cannot be multiplied using fewer than $2^n$ neurons in a single hidden layer.
Tasks
Published 2016-08-29
URL http://arxiv.org/abs/1608.08225v4
PDF http://arxiv.org/pdf/1608.08225v4.pdf
PWC https://paperswithcode.com/paper/why-does-deep-and-cheap-learning-work-so-well
Repo https://github.com/epignatelli/literature
Framework none
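
The multiplication claim has a constructive counterpart that can be checked numerically: four neurons with a smooth nonlinearity approximate $xy$ for small inputs. The sketch below uses softplus, whose second derivative at zero is 1/4, so the generic $1/(4\sigma''(0))$ normalization equals one; this is a numerical check of the idea, not the paper's exact construction.

```python
# Four-neuron multiplication gate: f(x+y) + f(-x-y) - f(x-y) - f(y-x) ~ x*y
# for small x, y (constants and linear terms cancel; the quadratic term is
# 4*f''(0)*x*y, and softplus has f''(0) = 1/4).
import numpy as np

def f(u):
    return np.log1p(np.exp(u))                  # softplus

def mult_gate(x, y):
    return f(x + y) + f(-x - y) - f(x - y) - f(y - x)

for x, y in [(0.1, 0.2), (0.05, -0.1), (0.3, 0.3)]:
    print(x * y, round(mult_gate(x, y), 5))     # error shrinks with the inputs
```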