July 30, 2019

Paper Group AWR 66

An In-Depth Analysis of Visual Tracking with Siamese Neural Networks. Tracking using Numerous Anchor points. Learning to Run with Actor-Critic Ensemble. Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding. A Simple Neural Attentive Meta-Learner. Easily parallelizable and distributable class of algorithms for structured sparsity, …

An In-Depth Analysis of Visual Tracking with Siamese Neural Networks

Title An In-Depth Analysis of Visual Tracking with Siamese Neural Networks
Authors Roman Pflugfelder
Abstract This survey presents a deep analysis of the learning and inference capabilities in nine popular trackers. It is neither intended to study the whole literature nor is it an attempt to review all kinds of neural networks proposed for visual tracking. We focus instead on Siamese neural networks, which are a promising starting point for studying the challenging problem of tracking. These networks efficiently integrate feature learning and temporal matching and have so far shown state-of-the-art performance. In particular, the branches of Siamese networks, the layers connecting these branches, specific aspects of training and the embedding of these networks into the tracker are highlighted. Quantitative results from existing papers are compared, with the conclusion that the current evaluation methodology shows problems with the reproducibility and the comparability of results. The paper proposes a novel Lisp-like formalism for a better comparison of trackers. This assumes a certain functional design and functional decomposition of trackers. The paper tries to give a foundation for tracker design by a formulation of the problem based on the theory of machine learning and by the interpretation of a tracker as a decision function. The work concludes with promising lines of research and suggests future work.
Tasks Visual Tracking
Published 2017-07-03
URL http://arxiv.org/abs/1707.00569v2
PDF http://arxiv.org/pdf/1707.00569v2.pdf
PWC https://paperswithcode.com/paper/an-in-depth-analysis-of-visual-tracking-with
Repo https://github.com/czla/siamese-tracker-road-trip
Framework pytorch

Tracking using Numerous Anchor points

Title Tracking using Numerous Anchor points
Authors Tanushri Chakravorty, Guillaume-Alexandre Bilodeau, Eric Granger
Abstract In this paper, an online adaptive model-free tracker is proposed to track single objects in video sequences and to deal with real-world tracking challenges like low resolution, object deformation, occlusion and motion blur. The novelty lies in the construction of a strong appearance model that captures features from the initialized bounding box, which are then assembled into anchor-point features. These features memorize the global pattern of the object and have an internal star graph-like structure. These features are unique and flexible and help track generic and deformable objects with no limitation on specific objects. In addition, the relevance of each feature is evaluated online using short-term consistency and long-term consistency. These parameters are adapted to retain consistent features that vote for the object location and that deal with outliers for long-term tracking scenarios. Additionally, voting in a Gaussian manner helps in tackling inherent noise of the tracking system and in accurate object localization. Furthermore, the proposed tracker uses a pairwise distance measure to cope with scale variations and combines pixel-level binary features and global weighted color features for model update. Finally, experimental results on a visual tracking benchmark dataset are presented to demonstrate the effectiveness and competitiveness of the proposed tracker.
Tasks Object Localization, Visual Tracking
Published 2017-02-07
URL http://arxiv.org/abs/1702.02012v2
PDF http://arxiv.org/pdf/1702.02012v2.pdf
PWC https://paperswithcode.com/paper/tracking-using-numerous-anchor-points
Repo https://github.com/sinbycos/TUNA
Framework none

Learning to Run with Actor-Critic Ensemble

Title Learning to Run with Actor-Critic Ensemble
Authors Zhewei Huang, Shuchang Zhou, BoEr Zhuang, Xinyu Zhou
Abstract We introduce an Actor-Critic Ensemble (ACE) method for improving the performance of the Deep Deterministic Policy Gradient (DDPG) algorithm. At inference time, our method uses a critic ensemble to select the best action from proposals of multiple actors running in parallel. By having a larger candidate set, our method can avoid actions that have fatal consequences, while staying deterministic. Using ACE, we won 2nd place in the NIPS’17 Learning to Run competition, under the name of “Megvii-hzwer”.
Tasks
Published 2017-12-25
URL http://arxiv.org/abs/1712.08987v1
PDF http://arxiv.org/pdf/1712.08987v1.pdf
PWC https://paperswithcode.com/paper/learning-to-run-with-actor-critic-ensemble
Repo https://github.com/hzwer/NIPS2017-LearningToRun
Framework tf
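
The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the names `select_action`, `actors` and `critics` are hypothetical, and averaging the critic scores is an assumption made here for illustration (the abstract only says a critic ensemble picks the best proposal).

```python
from typing import Callable, List, Sequence

State = Sequence[float]
Action = Sequence[float]
Actor = Callable[[State], Action]          # hypothetical actor signature
Critic = Callable[[State, Action], float]  # hypothetical critic signature


def select_action(state: State, actors: List[Actor], critics: List[Critic]) -> Action:
    """Return the actor proposal with the highest mean critic score.

    Assumption: critic scores are combined by a plain average.
    """
    proposals = [actor(state) for actor in actors]

    def mean_q(action: Action) -> float:
        return sum(critic(state, action) for critic in critics) / len(critics)

    # max() is deterministic: ties go to the first proposal.
    return max(proposals, key=mean_q)


# Toy setup: three constant actors and two quadratic critics.
actors = [lambda s: [0.0], lambda s: [0.5], lambda s: [1.0]]
critics = [
    lambda s, a: -(a[0] - 0.4) ** 2,
    lambda s, a: -(a[0] - 0.6) ** 2,
]
best = select_action([0.0], actors, critics)
```

In this toy setup the middle proposal wins because it sits between the two critics' preferred actions. The choice stays deterministic because no sampling is involved.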

Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding

Title Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding
Authors Zequn Sun, Wei Hu, Chengkai Li
Abstract Entity alignment is the task of finding entities in two knowledge bases (KBs) that represent the same real-world object. When facing KBs in different natural languages, conventional cross-lingual entity alignment methods rely on machine translation to eliminate the language barriers. These approaches often suffer from the uneven quality of translations between languages. While recent embedding-based techniques encode entities and relationships in KBs and do not need machine translation for cross-lingual entity alignment, a significant number of attributes remain largely unexplored. In this paper, we propose a joint attribute-preserving embedding model for cross-lingual entity alignment. It jointly embeds the structures of two KBs into a unified vector space and further refines it by leveraging attribute correlations in the KBs. Our experimental results on real-world datasets show that this approach significantly outperforms the state-of-the-art embedding approaches for cross-lingual entity alignment and could be complemented with methods based on machine translation.
Tasks Entity Alignment, Machine Translation
Published 2017-08-16
URL http://arxiv.org/abs/1708.05045v2
PDF http://arxiv.org/pdf/1708.05045v2.pdf
PWC https://paperswithcode.com/paper/cross-lingual-entity-alignment-via-joint
Repo https://github.com/nju-websoft/JAPE
Framework tf

A Simple Neural Attentive Meta-Learner

Title A Simple Neural Attentive Meta-Learner
Authors Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, Pieter Abbeel
Abstract Deep neural networks excel in regimes with large amounts of data, but tend to struggle when data is scarce or when they need to adapt quickly to changes in the task. In response, recent work in meta-learning proposes training a meta-learner on a distribution of similar tasks, in the hopes of generalization to novel but related tasks by learning a high-level strategy that captures the essence of the problem it is asked to solve. However, many recent meta-learning approaches are extensively hand-designed, either using architectures specialized to a particular application, or hard-coding algorithmic components that constrain how the meta-learner solves the task. We propose a class of simple and generic meta-learner architectures that use a novel combination of temporal convolutions and soft attention; the former to aggregate information from past experience and the latter to pinpoint specific pieces of information. In the most extensive set of meta-learning experiments to date, we evaluate the resulting Simple Neural AttentIve Learner (or SNAIL) on several heavily-benchmarked tasks. On all tasks, in both supervised and reinforcement learning, SNAIL attains state-of-the-art performance by significant margins.
Tasks Few-Shot Image Classification, Meta-Learning
Published 2017-07-11
URL http://arxiv.org/abs/1707.03141v3
PDF http://arxiv.org/pdf/1707.03141v3.pdf
PWC https://paperswithcode.com/paper/a-simple-neural-attentive-meta-learner
Repo https://github.com/Michedev/snail
Framework pytorch

Easily parallelizable and distributable class of algorithms for structured sparsity, with optimal acceleration

Title Easily parallelizable and distributable class of algorithms for structured sparsity, with optimal acceleration
Authors Seyoon Ko, Donghyeon Yu, Joong-Ho Won
Abstract Many statistical learning problems can be posed as minimization of a sum of two convex functions, one typically a composition of non-smooth and linear functions. Examples include regression under structured sparsity assumptions. Popular algorithms for solving such problems, e.g., ADMM, often involve non-trivial optimization subproblems or smoothing approximation. We consider two classes of primal-dual algorithms that do not incur these difficulties, and unify them from a perspective of monotone operator theory. From this unification we propose a continuum of preconditioned forward-backward operator splitting algorithms amenable to parallel and distributed computing. For the entire region of convergence of the whole continuum of algorithms, we establish its rates of convergence. For some known instances of this continuum, our analysis closes the gap in theory. We further exploit the unification to propose a continuum of accelerated algorithms. We show that the whole continuum attains the theoretically optimal rate of convergence. The scalability of the proposed algorithms, as well as their convergence behavior, is demonstrated up to 1.2 million variables with a distributed implementation.
Tasks
Published 2017-02-21
URL http://arxiv.org/abs/1702.06234v3
PDF http://arxiv.org/pdf/1702.06234v3.pdf
PWC https://paperswithcode.com/paper/easily-parallelizable-and-distributable-class
Repo https://github.com/kose-y/dist-primal-dual
Framework tf

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Title Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Authors Chelsea Finn, Pieter Abbeel, Sergey Levine
Abstract We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.
Tasks Few-Shot Image Classification, Few-Shot Learning, few-shot regression, Image Classification, Meta-Learning, One-Shot Learning
Published 2017-03-09
URL http://arxiv.org/abs/1703.03400v3
PDF http://arxiv.org/pdf/1703.03400v3.pdf
PWC https://paperswithcode.com/paper/model-agnostic-meta-learning-for-fast
Repo https://github.com/dragen1860/MAML-Pytorch
Framework pytorch
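
The two-level update in the abstract (a few gradient steps per task, then an update of the shared parameters) can be sketched on a toy problem. Everything below is an illustrative assumption rather than the paper's setup: a single scalar parameter, one inner step, and the task loss 0.5 * (theta - c)^2, chosen so the gradient through the inner step can be written by hand.

```python
from typing import List


def inner_update(theta: float, target: float, alpha: float) -> float:
    """One gradient step on the task loss 0.5 * (theta - target)**2."""
    grad = theta - target
    return theta - alpha * grad


def meta_gradient(theta: float, targets: List[float], alpha: float) -> float:
    """Gradient of the post-adaptation loss with respect to the initial theta.

    For this quadratic loss, d(adapted)/d(theta) = 1 - alpha, so the chain
    rule gives (adapted - target) * (1 - alpha) for each task.
    """
    total = 0.0
    for c in targets:
        adapted = inner_update(theta, c, alpha)
        total += (adapted - c) * (1.0 - alpha)
    return total / len(targets)


def meta_train(theta: float, targets: List[float],
               alpha: float = 0.1, beta: float = 0.5, steps: int = 200) -> float:
    """Gradient descent on the meta-objective (hypothetical hyperparameters)."""
    for _ in range(steps):
        theta -= beta * meta_gradient(theta, targets, alpha)
    return theta


theta_star = meta_train(0.0, [-1.0, 3.0])
```

For these two toy tasks the learned initialization settles at the midpoint of the targets, the point from which one inner step helps both tasks equally. Real models need automatic differentiation to propagate gradients through the inner step.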

Axioms in Model-based Planners

Title Axioms in Model-based Planners
Authors Shuwa Miura, Alex Fukunaga
Abstract Axioms can be used to model derived predicates in domain-independent planning models. Formulating models which use axioms can sometimes result in problems with much smaller search spaces and shorter plans than the original model. Previous work on axiom-aware planners focused solely on state-space search planners. We propose axiom-aware planners based on answer set programming and integer programming. We evaluate them on PDDL domains with axioms and show that they can exploit additional expressivity of axioms.
Tasks
Published 2017-03-11
URL http://arxiv.org/abs/1703.03916v2
PDF http://arxiv.org/pdf/1703.03916v2.pdf
PWC https://paperswithcode.com/paper/axioms-in-model-based-planners
Repo https://github.com/guicho271828/papers-on-axioms
Framework none

Non-local Neural Networks

Title Non-local Neural Networks
Authors Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He
Abstract Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code is available at https://github.com/facebookresearch/video-nonlocal-net .
Tasks Instance Segmentation, Keypoint Detection, Object Detection, Pose Estimation, Video Classification
Published 2017-11-21
URL http://arxiv.org/abs/1711.07971v3
PDF http://arxiv.org/pdf/1711.07971v3.pdf
PWC https://paperswithcode.com/paper/non-local-neural-networks
Repo https://github.com/facebookresearch/video-nonlocal-net
Framework caffe2
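
The core idea, a response at each position computed as a weighted sum of the features at all positions, can be sketched in plain Python. The weighting used here (softmax over dot products of raw features, with no learned projections or residual connection) is a simplifying assumption and not the released implementation. The function name `non_local` is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def non_local(features: List[Vector]) -> List[Vector]:
    """For each position i, return sum_j w_ij * x_j with w_ij = softmax_j(x_i . x_j)."""
    outputs = []
    for x_i in features:
        # Similarity of position i to every position j (including itself).
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        norm = sum(weights)
        dim = len(x_i)
        outputs.append([
            sum(w * x_j[d] for w, x_j in zip(weights, features)) / norm
            for d in range(dim)
        ])
    return outputs


# Three positions with 2-dimensional features (toy values).
responses = non_local([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Every output vector depends on all input positions, which is what distinguishes this operation from a convolution with a local receptive field.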

Robust, Deep and Inductive Anomaly Detection

Title Robust, Deep and Inductive Anomaly Detection
Authors Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla
Abstract PCA is a classical statistical technique whose simplicity and maturity have seen it find widespread use as an anomaly detection technique. However, it is limited in this regard by being sensitive to gross perturbations of the input, and by seeking a linear subspace that captures normal behaviour. The first issue has been dealt with by robust PCA, a variant of PCA that explicitly allows for some data points to be arbitrarily corrupted; however, this does not resolve the second issue, and indeed introduces the new issue that one can no longer inductively find anomalies on a test set. This paper addresses both issues in a single model, the robust autoencoder. This method learns a nonlinear subspace that captures the majority of data points, while allowing for some data to have arbitrary corruption. The model is simple to train and leverages recent advances in the optimisation of deep neural networks. Experiments on a range of real-world datasets highlight the model’s effectiveness.
Tasks Anomaly Detection
Published 2017-04-22
URL http://arxiv.org/abs/1704.06743v3
PDF http://arxiv.org/pdf/1704.06743v3.pdf
PWC https://paperswithcode.com/paper/robust-deep-and-inductive-anomaly-detection
Repo https://github.com/raghavchalapathy/oc-nn
Framework tf

Model Criticism in Latent Space

Title Model Criticism in Latent Space
Authors Sohan Seth, Iain Murray, Christopher K. I. Williams
Abstract Model criticism is usually carried out by assessing if replicated data generated under the fitted model looks similar to the observed data, see e.g. Gelman, Carlin, Stern, and Rubin [2004, p. 165]. This paper presents a method for latent variable models by pulling back the data into the space of latent variables, and carrying out model criticism in that space. Making use of a model’s structure enables a more direct assessment of the assumptions made in the prior and likelihood. We demonstrate the method with examples of model criticism in latent space applied to factor analysis, linear dynamical systems and Gaussian processes.
Tasks Gaussian Processes, Latent Variable Models
Published 2017-11-13
URL http://arxiv.org/abs/1711.04674v2
PDF http://arxiv.org/pdf/1711.04674v2.pdf
PWC https://paperswithcode.com/paper/model-criticism-in-latent-space
Repo https://github.com/sohanseth/mcls
Framework none

Flow-Guided Feature Aggregation for Video Object Detection

Title Flow-Guided Feature Aggregation for Video Object Detection
Authors Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei
Abstract Extending state-of-the-art object detectors from image to video is challenging. The accuracy of detection suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc. Existing work attempts to exploit temporal information on box level, but such methods are not trained end-to-end. We present flow-guided feature aggregation, an accurate and end-to-end learning framework for video object detection. It leverages temporal coherence on feature level instead. It improves the per-frame features by aggregation of nearby features along the motion paths, and thus improves the video recognition accuracy. Our method significantly improves upon strong single-frame baselines in ImageNet VID, especially for more challenging fast moving objects. Our framework is principled, and on par with the best engineered systems winning the ImageNet VID challenges 2016, without additional bells-and-whistles. The proposed method, together with Deep Feature Flow, powered the winning entry of ImageNet VID challenges 2017. The code is available at https://github.com/msracver/Flow-Guided-Feature-Aggregation.
Tasks Object Detection, Video Object Detection, Video Recognition
Published 2017-03-29
URL http://arxiv.org/abs/1703.10025v2
PDF http://arxiv.org/pdf/1703.10025v2.pdf
PWC https://paperswithcode.com/paper/flow-guided-feature-aggregation-for-video
Repo https://github.com/msracver/Flow-Guided-Feature-Aggregation
Framework mxnet

Detect-and-Track: Efficient Pose Estimation in Videos

Title Detect-and-Track: Efficient Pose Estimation in Videos
Authors Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, Du Tran
Abstract This paper addresses the problem of estimating and tracking human body keypoints in complex, multi-person video. We propose an extremely lightweight yet highly effective approach that builds upon the latest advancements in human detection and video understanding. Our method operates in two-stages: keypoint estimation in frames or short clips, followed by lightweight tracking to generate keypoint predictions linked over the entire video. For frame-level pose estimation we experiment with Mask R-CNN, as well as our own proposed 3D extension of this model, which leverages temporal information over small clips to generate more robust frame predictions. We conduct extensive ablative experiments on the newly released multi-person video pose estimation benchmark, PoseTrack, to validate various design choices of our model. Our approach achieves an accuracy of 55.2% on the validation and 51.8% on the test set using the Multi-Object Tracking Accuracy (MOTA) metric, and achieves state of the art performance on the ICCV 2017 PoseTrack keypoint tracking challenge.
Tasks Human Detection, Multi-Object Tracking, Object Tracking, Pose Estimation, Pose Tracking, Video Understanding
Published 2017-12-26
URL http://arxiv.org/abs/1712.09184v2
PDF http://arxiv.org/pdf/1712.09184v2.pdf
PWC https://paperswithcode.com/paper/detect-and-track-efficient-pose-estimation-in
Repo https://github.com/facebookresearch/DetectAndTrack
Framework none
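
For readers unfamiliar with the metric quoted in the abstract, the commonly used CLEAR MOT definition of MOTA is one minus the ratio of all tracking errors to the number of ground-truth objects. The sketch below assumes that definition with counts already summed over frames. The benchmark's exact evaluation protocol may differ, and the counts in the example are made up.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """Multi-Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are totals over the whole sequence. The result can be
    negative when the errors outnumber the ground-truth objects.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Hypothetical counts, for illustration only.
score = mota(false_negatives=30, false_positives=10,
             id_switches=5, num_ground_truth=100)
```

With these made-up counts, 45 errors against 100 ground-truth objects give a MOTA of 55%.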

Supervised Learning of Universal Sentence Representations from Natural Language Inference Data

Title Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Authors Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, Antoine Bordes
Abstract Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features. Efforts to obtain embeddings for larger chunks of text, such as sentences, have however not been so successful. Several attempts at learning unsupervised representations of sentences have not reached satisfactory enough performance to be widely adopted. In this paper, we show how universal sentence representations trained using the supervised data of the Stanford Natural Language Inference datasets can consistently outperform unsupervised methods like SkipThought vectors on a wide range of transfer tasks. Much like how computer vision uses ImageNet to obtain features, which can then be transferred to other tasks, our work tends to indicate the suitability of natural language inference for transfer learning to other NLP tasks. Our encoder is publicly available.
Tasks Cross-Lingual Natural Language Inference, Natural Language Inference, Semantic Textual Similarity, Transfer Learning, Word Embeddings
Published 2017-05-05
URL http://arxiv.org/abs/1705.02364v5
PDF http://arxiv.org/pdf/1705.02364v5.pdf
PWC https://paperswithcode.com/paper/supervised-learning-of-universal-sentence
Repo https://github.com/menajosep/AleatoricSent
Framework tf

Convolutional 2D Knowledge Graph Embeddings

Title Convolutional 2D Knowledge Graph Embeddings
Authors Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel
Abstract Link prediction for knowledge graphs is the task of predicting missing relationships between entities. Previous work on link prediction has focused on shallow, fast models which can scale to large knowledge graphs. However, these models learn less expressive features than deep, multi-layer models – which potentially limits performance. In this work, we introduce ConvE, a multi-layer convolutional network model for link prediction, and report state-of-the-art results for several established datasets. We also show that the model is highly parameter efficient, yielding the same performance as DistMult and R-GCN with 8x and 17x fewer parameters. Analysis of our model suggests that it is particularly effective at modelling nodes with high indegree – which are common in highly-connected, complex knowledge graphs such as Freebase and YAGO3. In addition, it has been noted that the WN18 and FB15k datasets suffer from test set leakage, due to inverse relations from the training set being present in the test set – however, the extent of this issue has so far not been quantified. We find this problem to be severe: a simple rule-based model can achieve state-of-the-art results on both WN18 and FB15k. To ensure that models are evaluated on datasets where simply exploiting inverse relations cannot yield competitive results, we investigate and validate several commonly used datasets – deriving robust variants where necessary. We then perform experiments on these robust datasets for our own and several previously proposed models and find that ConvE achieves state-of-the-art Mean Reciprocal Rank across most datasets.
Tasks Knowledge Graph Embeddings, Knowledge Graphs, Link Prediction
Published 2017-07-05
URL http://arxiv.org/abs/1707.01476v6
PDF http://arxiv.org/pdf/1707.01476v6.pdf
PWC https://paperswithcode.com/paper/convolutional-2d-knowledge-graph-embeddings
Repo https://github.com/INK-USC/RENet
Framework pytorch
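
Two quantities from the abstract are simple to illustrate: Mean Reciprocal Rank, and the test-set leakage caused by inverse relations. The sketch below is an assumption-laden illustration, not the authors' evaluation code. The helper names and the `inverse_of` mapping are hypothetical, and the toy triples are invented.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """Average of 1/rank over all test queries (ranks start at 1)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def inverse_leak_fraction(train: Iterable[Triple], test: List[Triple],
                          inverse_of: Dict[str, str]) -> float:
    """Fraction of test triples (s, r, o) whose inverse (o, r_inv, s) is in train."""
    train_set = set(train)
    leaked = sum(
        1 for (s, r, o) in test
        if r in inverse_of and (o, inverse_of[r], s) in train_set
    )
    return leaked / len(test)


# Toy data: the first test triple is answerable by flipping a training triple.
train = [("dog", "hypernym", "animal")]
test = [("animal", "hyponym", "dog"), ("cat", "hypernym", "animal")]
inverse_of = {"hyponym": "hypernym", "hypernym": "hyponym"}

mrr = mean_reciprocal_rank([1, 2, 4])
leak = inverse_leak_fraction(train, test, inverse_of)
```

A high leak fraction means a rule that simply flips training triples can answer many test queries, which is the kind of shortcut the abstract warns about.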