May 7, 2019

3035 words 15 mins read

Paper Group AWR 103


Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs

Title Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
Authors Yu. A. Malkov, D. A. Yashunin
Abstract We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). The proposed solution is fully graph-based, without any need for the additional search structures that are typically used at the coarse search stage of most proximity graph techniques. Hierarchical NSW incrementally builds a multi-layer structure consisting of a hierarchical set of proximity graphs (layers) for nested subsets of the stored elements. The maximum layer in which an element is present is selected randomly with an exponentially decaying probability distribution. This allows producing graphs similar to the previously studied Navigable Small World (NSW) structures while additionally having the links separated by their characteristic distance scales. Starting the search from the upper layer together with utilizing the scale separation boosts the performance compared to NSW and allows a logarithmic complexity scaling. Additional employment of a heuristic for selecting proximity graph neighbors significantly increases performance at high recall and in the case of highly clustered data. Performance evaluation has demonstrated that the proposed general metric space search index is able to strongly outperform previous open-source state-of-the-art vector-only approaches. Similarity of the algorithm to the skip list structure allows a straightforward, balanced distributed implementation.
Tasks
Published 2016-03-30
URL http://arxiv.org/abs/1603.09320v4
PDF http://arxiv.org/pdf/1603.09320v4.pdf
PWC https://paperswithcode.com/paper/efficient-and-robust-approximate-nearest
Repo https://github.com/RyanLiGod/multiple_target_hnswlib
Framework mxnet
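
As a companion to the abstract, here is a minimal, hedged sketch of the HNSW idea: greedy descent through a stack of progressively denser proximity graphs. The layer construction below (plain k-NN links over nested prefixes of the data) is a toy stand-in for the paper's NSW construction and neighbor-selection heuristic; the linked repo is the real implementation.

```python
# A toy sketch of HNSW-style search, assuming numpy only. Layers here are
# plain k-NN graphs over nested subsets of the data (a stand-in for the
# paper's NSW construction and neighbor-selection heuristic).
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 8))              # stored elements

def build_layer(ids, k=5):
    """Naive k-NN graph over a subset of the points."""
    graph = {}
    for i in ids:
        d = np.linalg.norm(points[ids] - points[i], axis=1)
        order = [ids[j] for j in np.argsort(d) if ids[j] != i]
        graph[i] = order[:k]
    return graph

# Nested subsets whose sizes decay exponentially, mimicking the random
# layer assignment with exponentially decaying probability.
layers, ids = [], list(range(len(points)))
while len(ids) > 4:
    layers.append(build_layer(ids))
    ids = ids[: len(ids) // 2]
layers.reverse()                                # layers[0] = sparsest layer

def greedy_search(graph, entry, query):
    """Walk to the closest neighbor until no neighbor improves."""
    cur, cur_d = entry, np.linalg.norm(points[entry] - query)
    improved = True
    while improved:
        improved = False
        for nb in graph[cur]:
            d = np.linalg.norm(points[nb] - query)
            if d < cur_d:
                cur, cur_d, improved = nb, d, True
    return cur

query = rng.normal(size=8)
entry = next(iter(layers[0]))                   # any node of the top layer
for graph in layers:                            # descend, reusing the entry
    entry = greedy_search(graph, entry, query)
print("approx NN:", entry,
      "exact NN:", int(np.argmin(np.linalg.norm(points - query, axis=1))))
```

The descent gives each query a good entry point in the bottom layer, which is what makes the greedy walk cheap; the production algorithm additionally keeps a dynamic candidate list (the ef parameter in hnswlib) rather than a single current node.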

Poisson multi-Bernoulli conjugate prior for multiple extended object filtering

Title Poisson multi-Bernoulli conjugate prior for multiple extended object filtering
Authors Karl Granstrom, Maryam Fatemi, Lennart Svensson
Abstract This paper presents a Poisson multi-Bernoulli mixture (PMBM) conjugate prior for multiple extended object filtering. A Poisson point process is used to describe the existence of yet undetected targets, while a multi-Bernoulli mixture describes the distribution of the targets that have been detected. The prediction and update equations are presented for the standard transition density and measurement likelihood. Both the prediction and the update preserve the PMBM form of the density, and in this sense the PMBM density is a conjugate prior. However, the unknown data associations lead to an intractably large number of terms in the PMBM density, and approximations are necessary for tractability. A gamma Gaussian inverse Wishart implementation is presented, along with methods to handle the data association problem. A simulation study shows that the extended target PMBM filter performs well in comparison to the extended target d-GLMB and LMB filters. An experiment with Lidar data illustrates the benefit of tracking both detected and undetected targets.
Tasks
Published 2016-05-20
URL https://arxiv.org/abs/1605.06311v6
PDF https://arxiv.org/pdf/1605.06311v6.pdf
PWC https://paperswithcode.com/paper/poisson-multi-bernoulli-conjugate-prior-for
Repo https://github.com/yuhsuansia/Extended-target-PMB-filter
Framework none
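
The PMBM machinery is heavy, but its smallest moving parts are easy to show. Below is a hedged sketch of two standard single-component updates (Poisson intensity prediction and a Bernoulli existence update under missed detection); the extended-object specifics of the paper and repo (gamma Gaussian inverse Wishart densities, data association) are not reproduced here.

```python
# A hedged sketch of the smallest moving parts of a PMB(M) filter: the
# Poisson intensity prediction for undetected targets and the Bernoulli
# existence update when a component receives no measurement.

def ppp_predict(intensity, p_survival, birth_intensity):
    """Predicted Poisson intensity: survival thinning plus births."""
    return p_survival * intensity + birth_intensity

def bernoulli_missed_detection(r, p_detect):
    """Posterior existence probability after a missed detection."""
    return r * (1.0 - p_detect) / (1.0 - r * p_detect)

print(ppp_predict(2.0, 0.99, 0.1))            # ~2.08 expected undetected targets
print(bernoulli_missed_detection(0.9, 0.95))  # existence drops to ~0.31
```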

Untrimmed Video Classification for Activity Detection: submission to ActivityNet Challenge

Title Untrimmed Video Classification for Activity Detection: submission to ActivityNet Challenge
Authors Gurkirt Singh, Fabio Cuzzolin
Abstract Current state-of-the-art human activity recognition is focused on the classification of temporally trimmed videos in which only one action occurs per frame. We propose a simple, yet effective, method for the temporal detection of activities in temporally untrimmed videos with the help of untrimmed classification. Firstly, our model predicts the top k labels for each untrimmed video by analysing global video-level features. Secondly, frame-level binary classification is combined with dynamic programming to generate the temporally trimmed activity proposals. Finally, each proposal is assigned a label based on the global label, and scored by combining the score of the temporal activity proposal with the global score. Ultimately, we show that untrimmed video classification models can be used as a stepping stone for temporal detection.
Tasks Action Detection, Activity Detection, Activity Recognition, Human Activity Recognition, Video Classification
Published 2016-07-07
URL http://arxiv.org/abs/1607.01979v2
PDF http://arxiv.org/pdf/1607.01979v2.pdf
PWC https://paperswithcode.com/paper/untrimmed-video-classification-for-activity
Repo https://github.com/gurkirt/actNet-inAct
Framework none
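
A hedged sketch of the middle step of the pipeline: frame-level activity scores smoothed into contiguous proposals by a two-state dynamic program (activity vs. background, with a cost for switching). The exact DP used in the paper may differ; the switching cost here is an illustrative knob.

```python
# Turn per-frame activity scores into temporal proposals with a two-state
# Viterbi-style DP; contiguous runs of 1s become activity proposals.
import numpy as np

def segment(frame_scores, switch_cost=2.0):
    """frame_scores: log-odds of 'activity' per frame. Returns 0/1 labels."""
    T = len(frame_scores)
    score = np.full((T, 2), -np.inf)           # best cumulative score per state
    back = np.zeros((T, 2), dtype=int)         # backpointers
    emit = np.stack([-frame_scores, frame_scores], axis=1)  # background, activity
    score[0] = emit[0]
    for t in range(1, T):
        for s in (0, 1):
            stay = score[t - 1, s]
            move = score[t - 1, 1 - s] - switch_cost
            back[t, s] = s if stay >= move else 1 - s
            score[t, s] = max(stay, move) + emit[t, s]
    labels = np.zeros(T, dtype=int)
    labels[-1] = int(np.argmax(score[-1]))
    for t in range(T - 1, 0, -1):              # backtrace
        labels[t - 1] = back[t, labels[t]]
    return labels

scores = np.array([-1, -1, 2, 3, 2.5, -0.5, 3, -2, -2], dtype=float)
print(segment(scores))  # e.g. one contiguous activity segment in the middle
```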

Visual-Inertial Monocular SLAM with Map Reuse

Title Visual-Inertial Monocular SLAM with Map Reuse
Authors Raul Mur-Artal, Juan D. Tardos
Abstract In recent years there have been excellent results in Visual-Inertial Odometry techniques, which aim to compute the incremental motion of the sensor with high accuracy and robustness. However, these approaches lack the capability to close loops, and trajectory estimation accumulates drift even if the sensor is continually revisiting the same place. In this work we present a novel tightly-coupled Visual-Inertial Simultaneous Localization and Mapping system that is able to close loops and reuse its map to achieve zero-drift localization in already mapped areas. While our approach can be applied to any camera configuration, we address here the most general problem of a monocular camera, with its well-known scale ambiguity. We also propose a novel IMU initialization method, which computes the scale, the gravity direction, the velocity, and the gyroscope and accelerometer biases, in a few seconds with high accuracy. We test our system on the 11 sequences of a recent micro-aerial vehicle public dataset, achieving a typical scale factor error of 1% and centimeter precision. We compare to the state of the art in visual-inertial odometry in sequences with revisiting, demonstrating the better accuracy of our method due to map reuse and the absence of drift accumulation.
Tasks Simultaneous Localization and Mapping
Published 2016-10-19
URL http://arxiv.org/abs/1610.05949v2
PDF http://arxiv.org/pdf/1610.05949v2.pdf
PWC https://paperswithcode.com/paper/visual-inertial-monocular-slam-with-map-reuse
Repo https://github.com/lianbin/VIOSLAM
Framework none
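
Of the quantities the IMU initialization solves for, the scale is the easiest to isolate. The sketch below shows only a least-squares scale alignment between up-to-scale visual positions and metric positions, as an illustration of the kind of closed-form solve involved; the paper's method estimates gravity, velocity and biases jointly, which is not reproduced here.

```python
# An illustrative fragment of visual-inertial initialization, assuming the
# only unknown is the metric scale s: argmin_s sum ||s*p_vis - p_metric||^2.
import numpy as np

def solve_scale(p_visual, p_metric):
    """Closed-form least-squares scale between two sets of 3D positions."""
    return float(np.sum(p_visual * p_metric) / np.sum(p_visual * p_visual))

rng = np.random.default_rng(1)
p_metric = rng.normal(size=(50, 3))                      # metric positions
p_visual = p_metric / 2.37 + 0.01 * rng.normal(size=(50, 3))
print(solve_scale(p_visual, p_metric))                   # ~2.37
```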

Learning HMMs with Nonparametric Emissions via Spectral Decompositions of Continuous Matrices

Title Learning HMMs with Nonparametric Emissions via Spectral Decompositions of Continuous Matrices
Authors Kirthevasan Kandasamy, Maruan Al-Shedivat, Eric P. Xing
Abstract Recently, there has been a surge of interest in using spectral methods for estimating latent variable models. However, it is usually assumed that the distribution of the observations conditioned on the latent variables is either discrete or belongs to a parametric family. In this paper, we study the estimation of an $m$-state hidden Markov model (HMM) with only smoothness assumptions, such as Hölderian conditions, on the emission densities. By leveraging some recent advances in continuous linear algebra and numerical analysis, we develop a computationally efficient spectral algorithm for learning nonparametric HMMs. Our technique is based on computing an SVD on nonparametric estimates of density functions by viewing them as “continuous matrices”. We derive sample complexity bounds via concentration results for nonparametric density estimation and novel perturbation theory results for continuous matrices. We implement our method using Chebyshev polynomial approximations. Our method is competitive with other baselines on synthetic and real problems and is also very computationally efficient.
Tasks Density Estimation, Latent Variable Models
Published 2016-09-21
URL http://arxiv.org/abs/1609.06390v1
PDF http://arxiv.org/pdf/1609.06390v1.pdf
PWC https://paperswithcode.com/paper/learning-hmms-with-nonparametric-emissions
Repo https://github.com/alshedivat/nphmm
Framework none
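
A numerical illustration of the method's starting point: an estimate of the joint density of consecutive observations of an $m$-state HMM is, up to estimation noise, a rank-$m$ matrix, so its SVD exposes the number of hidden states. The grid/histogram estimator below is a crude stand-in for the paper's Chebyshev-based continuous matrices.

```python
# Check that the joint density of (x_t, x_{t+1}) for a 2-state HMM with
# Gaussian emissions has two dominant singular values.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],                       # 2-state transition matrix
              [0.2, 0.8]])
means = [-2.0, 2.0]                             # Gaussian emission means
z, xs = 0, []
for _ in range(100_000):
    xs.append(rng.normal(means[z], 1.0))
    z = rng.choice(2, p=A[z])
xs = np.array(xs)

# Crude density estimate of (x_t, x_{t+1}) on a grid, then its SVD.
H, _, _ = np.histogram2d(xs[:-1], xs[1:], bins=40, range=[[-6, 6], [-6, 6]])
s = np.linalg.svd(H / H.sum(), compute_uv=False)
print(np.round(s[:4] / s[0], 3))   # two dominant singular values -> m = 2
```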

Expected Similarity Estimation for Large-Scale Batch and Streaming Anomaly Detection

Title Expected Similarity Estimation for Large-Scale Batch and Streaming Anomaly Detection
Authors Markus Schneider, Wolfgang Ertel, Fabio Ramos
Abstract We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state-of-the-art algorithms for anomaly detection while being an order of magnitude faster than most other approaches.
Tasks Anomaly Detection
Published 2016-01-25
URL http://arxiv.org/abs/1601.06602v3
PDF http://arxiv.org/pdf/1601.06602v3.pdf
PWC https://paperswithcode.com/paper/expected-similarity-estimation-for-large
Repo https://github.com/numenta/NAB
Framework none
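
EXPoSE is simple enough to sketch end to end: score a point by its inner product with the kernel mean embedding of the training data. Below the embedding is approximated with random Fourier features; the feature dimension and kernel width are illustrative choices, not the paper's.

```python
# A minimal sketch of EXPoSE, assuming a Gaussian kernel approximated with
# random Fourier features.
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 2, 512, 0.5                       # input dim, features, width
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(x):
    """Feature map with phi(x).phi(y) ~ exp(-gamma*||x-y||^2)."""
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

X = rng.normal(size=(5000, d))                  # "regular" data
mu = phi(X).mean(axis=0)                        # kernel mean embedding (batch)

def expose_score(z):
    """Expected similarity of z to the data distribution; low = anomalous."""
    return float(phi(z) @ mu)

print(expose_score(np.zeros(d)))                # inlier: high score
print(expose_score(np.full(d, 6.0)))            # outlier: near zero
# Streaming update in O(1) per instance: mu <- (n*mu + phi(x_new)) / (n + 1)
```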

CoCoA: A General Framework for Communication-Efficient Distributed Optimization

Title CoCoA: A General Framework for Communication-Efficient Distributed Optimization
Authors Virginia Smith, Simone Forte, Chenxin Ma, Martin Takac, Michael I. Jordan, Martin Jaggi
Abstract The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach in handling non-strongly-convex regularizers and non-smooth loss functions. The resulting framework has markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets.
Tasks Distributed Optimization
Published 2016-11-07
URL http://arxiv.org/abs/1611.02189v2
PDF http://arxiv.org/pdf/1611.02189v2.pdf
PWC https://paperswithcode.com/paper/cocoa-a-general-framework-for-communication
Repo https://github.com/gingsmith/cocoa
Framework none
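
In notation paraphrased from the paper, the framework targets objectives of the separable form

$$ \min_{\alpha \in \mathbb{R}^n} \; f(A\alpha) + \sum_{i=1}^{n} g_i(\alpha_i), $$

where the columns of the data matrix $A$ are partitioned across $K$ machines. In each round, machine $k$ approximately solves a local subproblem in its own coordinate block $\alpha_{[k]}$, and only the aggregated products $\Delta v_k = A_{[k]} \Delta\alpha_{[k]}$ are communicated, which is what keeps the scheme communication-efficient.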

Learning End-to-End Goal-Oriented Dialog

Title Learning End-to-End Goal-Oriented Dialog
Authors Antoine Bordes, Y-Lan Boureau, Jason Weston
Abstract Traditional dialog systems used in goal-oriented applications require a lot of domain-specific handcrafting, which hinders scaling up to new domains. End-to-end dialog systems, in which all components are trained from the dialogs themselves, escape this limitation. But the encouraging success recently obtained in chit-chat dialog may not carry over to goal-oriented settings. This paper proposes a testbed to break down the strengths and shortcomings of end-to-end dialog systems in goal-oriented applications. Set in the context of restaurant reservation, our tasks require manipulating sentences and symbols, so as to properly conduct conversations, issue API calls and use the outputs of such calls. We show that an end-to-end dialog system based on Memory Networks can reach promising, yet imperfect, performance and learn to perform non-trivial operations. We confirm those results by comparing our system to a hand-crafted slot-filling baseline on data from the second Dialog State Tracking Challenge (Henderson et al., 2014a). We show similar result patterns on data extracted from an online concierge service.
Tasks Goal-Oriented Dialog, Slot Filling
Published 2016-05-24
URL http://arxiv.org/abs/1605.07683v4
PDF http://arxiv.org/pdf/1605.07683v4.pdf
PWC https://paperswithcode.com/paper/learning-end-to-end-goal-oriented-dialog
Repo https://github.com/ysglh/Task-Oriented-Dialogue-Dataset-Survey
Framework none
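
The core read operation of the Memory Network underlying the paper's dialog system fits in a few lines: embed the question, attend over the embedded dialog history, return a weighted sum. The sketch below is a single hop with random embeddings and token ids as stand-ins; the trained model stacks hops and adds an answer layer.

```python
# A single memory hop, the core read operation of an end-to-end Memory
# Network; vocabulary, embeddings and token ids are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
V, d = 50, 16                                   # vocab size, embedding dim
A = rng.normal(scale=0.1, size=(V, d))          # memory (input) embedding
C = rng.normal(scale=0.1, size=(V, d))          # memory (output) embedding
B = rng.normal(scale=0.1, size=(V, d))          # question embedding

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bow(word_ids, E):
    """Bag-of-words sentence embedding."""
    return E[word_ids].sum(axis=0)

history = [[1, 4, 7], [2, 3], [9, 4, 12]]       # token ids of past utterances
question = [4, 7]

u = bow(question, B)
m = np.stack([bow(s, A) for s in history])      # memory keys
c = np.stack([bow(s, C) for s in history])      # memory values
p = softmax(m @ u)                              # attention over the history
o = p @ c                                       # retrieved summary
print(np.round(p, 3))                           # o + u would feed the answer layer
```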

Neural Machine Translation with Reconstruction

Title Neural Machine Translation with Reconstruction
Authors Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, Hang Li
Abstract Although end-to-end Neural Machine Translation (NMT) has achieved remarkable progress in the past two years, it suffers from a major drawback: translations generated by NMT systems often lack adequacy. It has been widely observed that NMT tends to repeatedly translate some source words while mistakenly ignoring other words. To alleviate this problem, we propose a novel encoder-decoder-reconstructor framework for NMT. The reconstructor, incorporated into the NMT model, reconstructs the input source sentence from the hidden layer of the output target sentence, to ensure that the information on the source side is transferred to the target side as much as possible. Experiments show that the proposed framework significantly improves the adequacy of NMT output and achieves superior translation results over state-of-the-art NMT and statistical MT systems.
Tasks Machine Translation
Published 2016-11-07
URL http://arxiv.org/abs/1611.01874v2
PDF http://arxiv.org/pdf/1611.01874v2.pdf
PWC https://paperswithcode.com/paper/neural-machine-translation-with
Repo https://github.com/tuzhaopeng/nmt
Framework none
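
The training objective behind this framework can be summarized in one line (notation paraphrased; $s$ denotes the decoder hidden states the reconstructor reads from, and $\lambda$ trades translation likelihood against reconstruction):

$$ \hat{\theta}, \hat{\gamma} \;=\; \arg\max_{\theta,\gamma} \sum_{(x,y)} \Big( \log P(y \mid x; \theta) \;+\; \lambda \log R(x \mid s; \theta, \gamma) \Big). $$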

Modeling Coverage for Neural Machine Translation

Title Modeling Coverage for Neural Machine Translation
Authors Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, Hang Li
Abstract The attention mechanism has enhanced state-of-the-art Neural Machine Translation (NMT) by jointly learning to align and translate. It tends to ignore past alignment information, however, which often leads to over-translation and under-translation. To address this problem, we propose coverage-based NMT in this paper. We maintain a coverage vector to keep track of the attention history. The coverage vector is fed to the attention model to help adjust future attention, which encourages the NMT system to pay more attention to untranslated source words. Experiments show that the proposed approach significantly improves both translation quality and alignment quality over standard attention-based NMT.
Tasks Machine Translation
Published 2016-01-19
URL http://arxiv.org/abs/1601.04811v6
PDF http://arxiv.org/pdf/1601.04811v6.pdf
PWC https://paperswithcode.com/paper/modeling-coverage-for-neural-machine
Repo https://github.com/ZhenYangIACAS/NMT
Framework none
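
A hedged sketch of the coverage mechanism: the coverage vector accumulates past attention weights and is fed back into the alignment scores, discouraging re-attention to already-covered source words. For brevity, the scores below omit the decoder-state term a real attention model would include, and the coverage weight v_c is a scalar stand-in for the learned parameters.

```python
# Coverage-aware attention over toy encoder annotations.
import numpy as np

rng = np.random.default_rng(0)
src_len, d = 6, 8
H = rng.normal(size=(src_len, d))       # encoder annotations
w_h = rng.normal(scale=0.1, size=d)     # toy score weights
v_c = -1.0                              # negative: coverage discourages re-attention

coverage = np.zeros(src_len)            # one accumulator per source word
for t in range(4):                      # a few decoder steps
    scores = H @ w_h + v_c * coverage
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    context = alpha @ H                 # would feed the decoder update
    coverage += alpha                   # coverage update: C_t = C_{t-1} + alpha_t
print(np.round(coverage, 2))            # attention mass spread across the source
```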

Learning the kernel matrix via predictive low-rank approximations

Title Learning the kernel matrix via predictive low-rank approximations
Authors Martin Stražar, Tomaž Curk
Abstract Efficient and accurate low-rank approximations of multiple data sources are essential in the era of big data. The scaling of kernel-based learning algorithms to large datasets is limited by the O(n^2) computation and storage complexity of the full kernel matrix, which is required by most recent kernel learning algorithms. We present the Mklaren algorithm, which approximates multiple kernel matrices and learns a regression model, and is entirely based on geometrical concepts. The algorithm does not require access to full kernel matrices, yet it accounts for the correlations between all kernels. It uses incomplete Cholesky decomposition, where pivot selection is based on least-angle regression in the combined, low-dimensional feature space. The algorithm has linear complexity in the number of data points and kernels. When an explicit feature space induced by the kernel can be constructed, a mapping from the dual to the primal ridge regression weights is used for model interpretation. The Mklaren algorithm was tested on eight standard regression datasets. It outperforms contemporary kernel matrix approximation approaches when learning with multiple kernels. It identifies relevant kernels, achieving higher explained variance than other multiple kernel learning methods for the same number of iterations. Test accuracy equivalent to that obtained with full kernel matrices was achieved at significantly lower approximation ranks. A difference in run times of two orders of magnitude was observed when either the number of samples or the number of kernels exceeds 3000.
Tasks
Published 2016-01-17
URL http://arxiv.org/abs/1601.04366v2
PDF http://arxiv.org/pdf/1601.04366v2.pdf
PWC https://paperswithcode.com/paper/learning-the-kernel-matrix-via-predictive-low
Repo https://github.com/mstrazar/mklaren
Framework none
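
The backbone of Mklaren, incomplete Cholesky decomposition of a kernel matrix, is compact enough to sketch. The version below uses the standard largest-residual pivot rule; Mklaren's contribution, least-angle-regression-based pivoting across multiple kernels, is not reproduced here.

```python
# Incomplete Cholesky of a kernel matrix with greedy diagonal pivoting.
import numpy as np

def incomplete_cholesky(K, rank):
    """Return G of shape (n, rank) with K ~ G @ G.T."""
    n = K.shape[0]
    G = np.zeros((n, rank))
    diag = np.diag(K).astype(float).copy()
    for j in range(rank):
        p = int(np.argmax(diag))                    # greedy pivot selection
        G[:, j] = (K[:, p] - G @ G[p]) / np.sqrt(diag[p])
        diag = np.maximum(diag - G[:, j] ** 2, 0.0) # residual diagonal
    return G

X = np.random.default_rng(0).normal(size=(300, 5))
sq = np.sum((X[:, None] - X[None]) ** 2, axis=-1)
K = np.exp(-0.5 * sq)                               # Gaussian kernel matrix
G = incomplete_cholesky(K, 30)
print(np.linalg.norm(K - G @ G.T) / np.linalg.norm(K))  # small relative error
```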

Model-Free Episodic Control

Title Model-Free Episodic Control
Authors Charles Blundell, Benigno Uria, Alexander Pritzel, Yazhe Li, Avraham Ruderman, Joel Z Leibo, Jack Rae, Daan Wierstra, Demis Hassabis
Abstract State of the art deep reinforcement learning algorithms take many millions of interactions to attain human-level performance. Humans, on the other hand, can very quickly exploit highly rewarding nuances of an environment upon first discovery. In the brain, such rapid learning is thought to depend on the hippocampus and its capacity for episodic memory. Here we investigate whether a simple model of hippocampal episodic control can learn to solve difficult sequential decision-making tasks. We demonstrate that it not only attains a highly rewarding strategy significantly faster than state-of-the-art deep reinforcement learning algorithms, but also achieves a higher overall reward on some of the more challenging domains.
Tasks Decision Making
Published 2016-06-14
URL http://arxiv.org/abs/1606.04460v1
PDF http://arxiv.org/pdf/1606.04460v1.pdf
PWC https://paperswithcode.com/paper/model-free-episodic-control
Repo https://github.com/nkxujun/DeepSemanticMatching
Framework none
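
The episodic controller itself is a small data structure: per action, a buffer mapping state embeddings to the best return ever obtained, with k-NN averaging for novel states. A hedged sketch, assuming states are already encoded as small vectors (environment and encoding are stand-ins):

```python
# Episodic value table: exact hits return the stored best return, novel
# states a k-NN average; updates keep the maximum return ever observed.
import numpy as np

class EpisodicQ:
    def __init__(self, n_actions, k=3):
        self.mem = [([], []) for _ in range(n_actions)]  # (states, returns)
        self.k = k

    def estimate(self, s, a):
        states, returns = self.mem[a]
        if not states:
            return 0.0
        d = np.linalg.norm(np.asarray(states) - s, axis=1)
        if d.min() == 0.0:                      # exact match
            return returns[int(np.argmin(d))]
        nn = np.argsort(d)[: self.k]            # novel state: k-NN average
        return float(np.mean([returns[i] for i in nn]))

    def update(self, s, a, ret):
        states, returns = self.mem[a]
        for i, s_old in enumerate(states):
            if np.array_equal(s_old, s):
                returns[i] = max(returns[i], ret)   # keep best return seen
                return
        states.append(np.asarray(s)); returns.append(ret)

q = EpisodicQ(n_actions=2)
q.update(np.array([0.0, 1.0]), 0, ret=5.0)
q.update(np.array([0.0, 1.0]), 0, ret=3.0)          # ignored: 5.0 is better
print(q.estimate(np.array([0.0, 1.0]), 0))          # 5.0
```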

Deep Information Propagation

Title Deep Information Propagation
Authors Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, Jascha Sohl-Dickstein
Abstract We study the behavior of untrained neural networks whose weights and biases are randomly distributed using mean field theory. We show the existence of depth scales that naturally limit the maximum depth of signal propagation through these random networks. Our main practical result is to show that random networks may be trained precisely when information can travel through them. Thus, the depth scales that we identify provide bounds on how deep a network may be trained for a specific choice of hyperparameters. As a corollary to this, we argue that in networks at the edge of chaos, one of these depth scales diverges. Thus arbitrarily deep networks may be trained only sufficiently close to criticality. We show that the presence of dropout destroys the order-to-chaos critical point and therefore strongly limits the maximum trainable depth for random networks. Finally, we develop a mean field theory for backpropagation and we show that the ordered and chaotic phases correspond to regions of vanishing and exploding gradient respectively.
Tasks
Published 2016-11-04
URL http://arxiv.org/abs/1611.01232v2
PDF http://arxiv.org/pdf/1611.01232v2.pdf
PWC https://paperswithcode.com/paper/deep-information-propagation
Repo https://github.com/ghliu/mean-field-fcdnn
Framework pytorch
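
The depth scales come from iterating simple Gaussian-expectation maps. The sketch below iterates the mean-field variance map for a random tanh network by Monte Carlo; sigma_w^2 and sigma_b^2 are illustrative hyperparameters, and the companion correlation map, whose convergence rate sets the trainable-depth bound, can be iterated the same way.

```python
# Monte Carlo iteration of the mean-field variance map for a tanh network.
import numpy as np

z = np.random.default_rng(0).normal(size=200_000)   # for Gaussian expectations

def variance_map(q, sw2, sb2):
    """q_{l+1} = sw2 * E[tanh(sqrt(q_l) z)^2] + sb2, z ~ N(0, 1)."""
    return sw2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sb2

q = 1.0
for layer in range(30):
    q = variance_map(q, sw2=1.7, sb2=0.05)
print(q)   # settles at the fixed point q* after a few layers
```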

Direct Feedback Alignment Provides Learning in Deep Neural Networks

Title Direct Feedback Alignment Provides Learning in Deep Neural Networks
Authors Arild Nøkland
Abstract Artificial neural networks are most commonly trained with the back-propagation algorithm, where the gradient for learning is provided by back-propagating the error, layer by layer, from the output layer to the hidden layers. A recently discovered method called feedback alignment shows that the weights used for propagating the error backward don't have to be symmetric with the weights used for propagating the activations forward. In fact, random feedback weights work equally well, because the network learns how to make the feedback useful. In this work, the feedback alignment principle is used for training hidden layers more independently from the rest of the network, and from a zero initial condition. The error is propagated through fixed random feedback connections directly from the output layer to each hidden layer. This simple method is able to achieve zero training error even in convolutional networks and very deep networks, completely without error back-propagation. The method is a step towards biologically plausible machine learning because the error signal is almost local, and no symmetric or reciprocal weights are required. Experiments show that the test performance on MNIST and CIFAR is almost as good as that obtained with back-propagation for fully connected networks. If combined with dropout, the method achieves 1.45% error on the permutation-invariant MNIST task.
Tasks
Published 2016-09-06
URL http://arxiv.org/abs/1609.01596v5
PDF http://arxiv.org/pdf/1609.01596v5.pdf
PWC https://paperswithcode.com/paper/direct-feedback-alignment-provides-learning
Repo https://github.com/metataro/DirectFeedbackAlignment
Framework none
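
Direct feedback alignment is easy to state in code: replace the back-propagated deltas with the output error projected through fixed random matrices. A minimal numpy sketch on a toy task (architecture, data and learning rate are illustrative):

```python
# DFA on a tiny 3-layer MLP: B1, B2 are fixed random feedback matrices that
# carry the output error directly to each hidden layer.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h, n_out, lr = 4, 32, 2, 0.05
W1 = rng.normal(scale=0.1, size=(n_in, n_h))
W2 = rng.normal(scale=0.1, size=(n_h, n_h))
W3 = rng.normal(scale=0.1, size=(n_h, n_out))
B1 = rng.normal(scale=0.1, size=(n_out, n_h))    # fixed random feedback
B2 = rng.normal(scale=0.1, size=(n_out, n_h))

X = rng.normal(size=(256, n_in))
Y = np.eye(n_out)[(X.sum(axis=1) > 0).astype(int)]   # toy one-hot labels

for step in range(500):
    h1 = np.tanh(X @ W1)
    h2 = np.tanh(h1 @ W2)
    out = h2 @ W3
    e = out - Y                                  # output error (linear + MSE)
    # DFA: per-layer deltas come straight from e via random projections
    d2 = (e @ B2) * (1 - h2 ** 2)
    d1 = (e @ B1) * (1 - h1 ** 2)
    W3 -= lr * h2.T @ e / len(X)
    W2 -= lr * h1.T @ d2 / len(X)
    W1 -= lr * X.T @ d1 / len(X)
print("final MSE:", float(np.mean(e ** 2)))      # decreases without backprop
```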

Why does deep and cheap learning work so well?

Title Why does deep and cheap learning work so well?
Authors Henry W. Lin, Max Tegmark, David Rolnick
Abstract We show how the success of deep learning could depend not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can frequently be approximated through “cheap learning” with exponentially fewer parameters than generic ones. We explore how properties frequently encountered in physics such as symmetry, locality, compositionality, and polynomial log-probability translate into exceptionally simple neural networks. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to the renormalization group. We prove various “no-flattening theorems” showing when efficient linear deep networks cannot be accurately approximated by shallow ones without efficiency loss; for example, we show that $n$ variables cannot be multiplied using fewer than $2^n$ neurons in a single hidden layer.
Tasks
Published 2016-08-29
URL http://arxiv.org/abs/1608.08225v4
PDF http://arxiv.org/pdf/1608.08225v4.pdf
PWC https://paperswithcode.com/paper/why-does-deep-and-cheap-learning-work-so-well
Repo https://github.com/epignatelli/literature
Framework none
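
The multiplication claim has a constructive counterpart that can be checked numerically: four neurons with a smooth nonlinearity approximate $xy$ for small inputs. The sketch below uses softplus, whose second derivative at zero is 1/4, so the generic $1/(4\sigma''(0))$ normalization equals one; this is a numerical check of the idea, not the paper's exact construction.

```python
# Four-neuron multiplication gate: f(x+y) + f(-x-y) - f(x-y) - f(y-x) ~ x*y
# for small x, y (constants and linear terms cancel; the quadratic term is
# 4*f''(0)*x*y, and softplus has f''(0) = 1/4).
import numpy as np

def f(u):
    return np.log1p(np.exp(u))                  # softplus

def mult_gate(x, y):
    return f(x + y) + f(-x - y) - f(x - y) - f(y - x)

for x, y in [(0.1, 0.2), (0.05, -0.1), (0.3, 0.3)]:
    print(x * y, round(mult_gate(x, y), 5))     # error shrinks with the inputs
```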