January 31, 2020

3097 words 15 mins read

Paper Group AWR 429

Online Multi-Object Tracking with Dual Matching Attention Networks. Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features. daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices. A Higher-Order Swiss Army Infinitesimal Jackknife. VarGFaceNet: An Efficient Variable Group Convolutional Neura …

Online Multi-Object Tracking with Dual Matching Attention Networks

Title Online Multi-Object Tracking with Dual Matching Attention Networks
Authors Ji Zhu, Hua Yang, Nian Liu, Minyoung Kim, Wenjun Zhang, Ming-Hsuan Yang
Abstract In this paper, we propose an online Multi-Object Tracking (MOT) approach which integrates the merits of single object tracking and data association methods in a unified framework to handle noisy detections and frequent interactions between targets. Specifically, for applying single object tracking in MOT, we introduce a cost-sensitive tracking loss based on the state-of-the-art visual tracker, which encourages the model to focus on hard negative distractors during online learning. For data association, we propose Dual Matching Attention Networks (DMAN) with both spatial and temporal attention mechanisms. The spatial attention module generates dual attention maps which enable the network to focus on the matching patterns of the input image pair, while the temporal attention module adaptively allocates different levels of attention to different samples in the tracklet to suppress noisy observations. Experimental results on the MOT benchmark datasets show that the proposed algorithm performs favorably against both online and offline trackers in terms of identity-preserving metrics.
Tasks Multi-Object Tracking, Object Tracking, Online Multi-Object Tracking
Published 2019-02-02
URL http://arxiv.org/abs/1902.00749v1
PDF http://arxiv.org/pdf/1902.00749v1.pdf
PWC https://paperswithcode.com/paper/online-multi-object-tracking-with-dual
Repo https://github.com/jizhu1023/DMAN_MOT
Framework tf
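
The spatial attention idea in DMAN lends itself to a small illustration. The following is a hedged numpy toy, not the authors' implementation: given convolutional feature maps of the two patches being matched, a cross-affinity matrix is softmax-normalized along each axis to yield one attention map per input. The cosine scoring, the max-pooling of attention rows, and all shapes are assumptions for illustration.

```python
import numpy as np

def dual_attention_maps(feat_a, feat_b):
    """Toy dual spatial attention for an image pair.

    feat_a, feat_b: (C, H, W) feature maps of the two patches.
    Returns one (H, W) attention map per input, obtained by
    softmax-normalizing a cross-affinity matrix along each axis.
    """
    C, H, W = feat_a.shape
    A = feat_a.reshape(C, H * W)  # columns are spatial positions
    B = feat_b.reshape(C, H * W)
    # cosine-style affinity between every position in A and every position in B
    A = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-8)
    B = B / (np.linalg.norm(B, axis=0, keepdims=True) + 1e-8)
    S = A.T @ B  # (H*W, H*W) affinity matrix

    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # attention on A: how strongly each position of A matches B, and vice versa
    att_a = softmax(S, axis=1).max(axis=1).reshape(H, W)
    att_b = softmax(S, axis=0).max(axis=0).reshape(H, W)
    return att_a, att_b

att_a, att_b = dual_attention_maps(np.random.rand(64, 7, 7),
                                   np.random.rand(64, 7, 7))
```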

Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features

Title Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features
Authors Arno Solin, Manon Kok
Abstract Gaussian processes (GPs) provide a powerful framework for extrapolation, interpolation, and noise removal in regression and classification. This paper considers constraining GPs to arbitrarily-shaped domains with boundary conditions. We solve a Fourier-like generalised harmonic feature representation of the GP prior in the domain of interest, which both constrains the GP and attains a low-rank representation that is used for speeding up inference. The method scales as $\mathcal{O}(nm^2)$ in prediction and $\mathcal{O}(m^3)$ in hyperparameter learning for regression, where $n$ is the number of data points and $m$ the number of features. Furthermore, we make use of the variational approach to allow the method to deal with non-Gaussian likelihoods. The experiments cover both simulated and empirical data in which the boundary conditions allow for inclusion of additional physical information.
Tasks Gaussian Processes
Published 2019-04-10
URL http://arxiv.org/abs/1904.05207v1
PDF http://arxiv.org/pdf/1904.05207v1.pdf
PWC https://paperswithcode.com/paper/know-your-boundaries-constraining-gaussian
Repo https://github.com/AaltoML/boundary-gp
Framework none
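
The harmonic-feature construction can be sketched in one dimension. Below is a minimal numpy sketch of a boundary-constrained, low-rank GP regression on [-L, L] with a Dirichlet condition (f vanishes on the boundary), following the general Hilbert-space GP recipe the paper builds on rather than the authors' code; the RBF kernel, its hyperparameters, and m = 32 features are assumptions.

```python
import numpy as np

# 1-D sketch: m sinusoidal Laplace eigenfunctions on [-L, L] constrain the GP
# to be zero at +-L and give a low-rank representation: one m x m solve, so
# prediction costs O(n m^2) as stated in the abstract.
L, m = 1.0, 32
sigma2_f, ell, sigma2_n = 1.0, 0.2, 0.01

j = np.arange(1, m + 1)
sqrt_lam = np.pi * j / (2 * L)  # square roots of the Laplace eigenvalues
# RBF spectral density evaluated at sqrt(eigenvalues)
S = sigma2_f * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * sqrt_lam) ** 2)

def phi(x):
    """Eigenfunctions of the Laplacian on [-L, L]; zero at the boundary."""
    return np.sin(np.outer(x + L, sqrt_lam)) / np.sqrt(L)

rng = np.random.default_rng(0)
x = rng.uniform(-L, L, 50)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(50)

Phi = phi(x)                                   # (n, m) feature matrix
A = Phi.T @ Phi + sigma2_n * np.diag(1.0 / S)  # (m, m): the low-rank solve
w = np.linalg.solve(A, Phi.T @ y)

x_star = np.linspace(-L, L, 200)
mean = phi(x_star) @ w  # predictive mean; respects f(+-L) = 0 by construction
```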

daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices

Title daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices
Authors Jianhao Zhang, Yingwei Pan, Ting Yao, He Zhao, Tao Mei
Abstract It is widely believed that Binary Neural Networks (BNNs) could drastically accelerate inference by replacing the arithmetic operations in float-valued Deep Neural Networks (DNNs) with bit-wise operations. Nevertheless, there has been no open-source implementation supporting this idea on low-end ARM devices (e.g., mobile phones and embedded devices). In this work, we propose daBNN, a super fast inference framework that implements BNNs on ARM devices. Several speed-up and memory refinement strategies for bit-packing, binarized convolution, and memory layout are uniquely devised to enhance inference efficiency. Compared to the recent open-source BNN inference framework, BMXNet, our daBNN is $7\times$$\sim$$23\times$ faster on a single binary convolution, and about $6\times$ faster on Bi-Real Net 18 (a BNN variant of ResNet-18). The daBNN is a BSD-licensed inference framework, and its source code, sample projects and pre-trained models are available on-line: https://github.com/JDAI-CV/dabnn.
Tasks
Published 2019-08-16
URL https://arxiv.org/abs/1908.05858v1
PDF https://arxiv.org/pdf/1908.05858v1.pdf
PWC https://paperswithcode.com/paper/dabnn-a-super-fast-inference-framework-for
Repo https://github.com/JDAI-CV/dabnn
Framework none
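
The bit-wise trick that makes BNN inference fast is easy to show in miniature. The pure-Python toy below illustrates the standard XNOR/popcount identity that daBNN's kernels build on; the bit layout and packing scheme here are illustrative assumptions, not the framework's actual hand-tuned ARM NEON implementation.

```python
# Pack +-1 vectors into machine words; then a dot product of length n reduces
# to n - 2 * popcount(xor(a, b)), since the xor marks positions whose signs
# differ and each mismatch contributes -1 instead of +1.

def pack_bits(v):
    """Pack a +-1 vector into one Python int (bit i set iff v[i] == -1)."""
    word = 0
    for i, x in enumerate(v):
        if x < 0:
            word |= 1 << i
    return word

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +-1 vectors of length n."""
    mismatches = bin(a_bits ^ b_bits).count("1")  # popcount
    return n - 2 * mismatches

a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
assert binary_dot(pack_bits(a), pack_bits(b), len(a)) == sum(x * y for x, y in zip(a, b))
```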

A Higher-Order Swiss Army Infinitesimal Jackknife

Title A Higher-Order Swiss Army Infinitesimal Jackknife
Authors Ryan Giordano, Michael I. Jordan, Tamara Broderick
Abstract Cross validation (CV) and the bootstrap are ubiquitous model-agnostic tools for assessing the error or variability of machine learning and statistical estimators. However, these methods require repeatedly re-fitting the model with different weighted versions of the original dataset, which can be prohibitively time-consuming. For sufficiently regular optimization problems the optimum depends smoothly on the data weights, and so the process of repeatedly re-fitting can be approximated with a Taylor series that can often be evaluated relatively quickly. The first-order approximation is known as the “infinitesimal jackknife” in the statistics literature and has been the subject of recent interest in machine learning for approximate CV. In this work, we consider higher-order approximations, which we call the “higher-order infinitesimal jackknife” (HOIJ). Under mild regularity conditions, we provide a simple recursive procedure to compute approximations of all orders with finite-sample accuracy bounds. Additionally, we show that the HOIJ can be efficiently computed even in high dimensions using forward-mode automatic differentiation. We show that linear approximations with bootstrap weights are equivalent to those provided by asymptotic normal approximations. Consequently, the HOIJ opens up the possibility of enjoying higher-order accuracy properties of the bootstrap using local approximations. Consistency of the HOIJ for leave-one-out CV under different asymptotic regimes follows as corollaries from our finite-sample bounds under additional regularity assumptions. The generality of the computation and bounds motivate the name “higher-order Swiss Army infinitesimal jackknife.”
Tasks
Published 2019-07-28
URL https://arxiv.org/abs/1907.12116v1
PDF https://arxiv.org/pdf/1907.12116v1.pdf
PWC https://paperswithcode.com/paper/a-higher-order-swiss-army-infinitesimal
Repo https://github.com/rgiordan/vittles
Framework none
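
The first-order term of this Taylor series is concrete enough to verify numerically. Here is a hedged numpy sketch of the first-order infinitesimal jackknife for approximate leave-one-out CV on ordinary least squares (for an M-estimator, dtheta/dw_i = -H^{-1} g_i with Hessian H and per-point gradient g_i); the higher orders in the paper extend this expansion recursively. This is an illustration of the idea, not the vittles package API.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

theta = np.linalg.solve(X.T @ X, X.T @ y)  # full-data fit
H = X.T @ X                                # Hessian of the squared loss
resid = X @ theta - y

for i in range(3):
    g_i = X[i] * resid[i]                  # gradient contribution of point i
    # dropping point i means w_i: 1 -> 0, so theta moves by +H^{-1} g_i
    theta_loo_ij = theta + np.linalg.solve(H, g_i)
    # exact leave-one-out refit, for comparison
    mask = np.arange(n) != i
    theta_loo = np.linalg.solve(X[mask].T @ X[mask], X[mask].T @ y[mask])
    print(np.abs(theta_loo_ij - theta_loo).max())  # small approximation error
```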

VarGFaceNet: An Efficient Variable Group Convolutional Neural Network for Lightweight Face Recognition

Title VarGFaceNet: An Efficient Variable Group Convolutional Neural Network for Lightweight Face Recognition
Authors Mengjia Yan, Mengao Zhao, Zining Xu, Qian Zhang, Guoli Wang, Zhizhong Su
Abstract To improve the discriminative and generalization ability of lightweight networks for face recognition, we propose an efficient variable group convolutional network called VarGFaceNet. Variable group convolution is introduced by VarGNet to solve the conflict between small computational cost and the imbalance of computational intensity inside a block. We employ variable group convolution to design a network that supports large-scale face identification while reducing computational cost and parameter count. Specifically, we use a head setting to reserve essential information at the start of the network and propose a particular embedding setting to reduce the parameters of the fully-connected layer for embedding. To enhance interpretation ability, we employ an equivalence of angular distillation loss to guide our lightweight network, and we apply recursive knowledge distillation to relieve the discrepancy between the teacher model and the student model. Winning the deepglint-light track of the LFR (2019) challenge demonstrates the effectiveness of our model and approach. Implementation of VarGFaceNet will be released at https://github.com/zma-c-137/VarGFaceNet soon.
Tasks Face Detection, Face Identification, Face Recognition
Published 2019-10-11
URL https://arxiv.org/abs/1910.04985v4
PDF https://arxiv.org/pdf/1910.04985v4.pdf
PWC https://paperswithcode.com/paper/vargfacenet-an-efficient-variable-group
Repo https://github.com/zma-c-137/VarGFaceNet
Framework mxnet
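
Variable group convolution is simple to state in code: instead of fixing the number of groups, fix the number of channels per group to a constant S, so the group count grows with layer width and the computational intensity per group stays constant. The PyTorch sketch below is a hedged illustration; S = 8 and the conv/BN/PReLU block layout are assumptions, not the paper's exact mxnet implementation.

```python
import torch
import torch.nn as nn

class VarGConv(nn.Module):
    """Variable group convolution: channels per group fixed at S."""
    def __init__(self, in_ch, out_ch, S=8, kernel_size=3, stride=1):
        super().__init__()
        # groups = in_ch // S varies with layer width, unlike a fixed-group conv
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                              padding=kernel_size // 2,
                              groups=in_ch // S, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.PReLU(out_ch)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

x = torch.randn(1, 64, 56, 56)
print(VarGConv(64, 128)(x).shape)  # torch.Size([1, 128, 56, 56]), 8 groups
```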

Scale-Equivariant Steerable Networks

Title Scale-Equivariant Steerable Networks
Authors Ivan Sosnovik, Michał Szmaja, Arnold Smeulders
Abstract The effectiveness of Convolutional Neural Networks (CNNs) has been substantially attributed to their built-in property of translation equivariance. However, CNNs do not have embedded mechanisms to handle other types of transformations. In this work, we pay attention to scale changes, which regularly appear in various tasks due to the changing distances between the objects and the camera. First, we introduce the general theory for building scale-equivariant convolutional networks with steerable filters. We develop scale-convolution and generalize other common blocks to be scale-equivariant. We demonstrate the computational efficiency and numerical stability of the proposed method. We compare the proposed models to the previously developed methods for scale equivariance and local scale invariance. We demonstrate state-of-the-art results on the MNIST-scale and STL-10 datasets in the supervised learning setting.
Tasks
Published 2019-10-14
URL https://arxiv.org/abs/1910.11093v2
PDF https://arxiv.org/pdf/1910.11093v2.pdf
PWC https://paperswithcode.com/paper/scale-equivariant-steerable-networks-1
Repo https://github.com/ISosnovik/sesn
Framework pytorch
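
The core mechanism can be sketched with a scale bank: the same analytic filter is evaluated at several scales on a fixed grid (no pixel resampling), the input is convolved with each copy, and responses are stacked along a scale axis that the network can then pool over. The Gaussian-derivative filter and the three-scale bank below are illustrative assumptions, not the paper's steerable Hermite basis.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_dx(sigma, half=6):
    """x-derivative of a 2-D Gaussian, evaluated analytically at scale sigma."""
    xs = np.arange(-half, half + 1, dtype=float)
    X, Y = np.meshgrid(xs, xs)
    g = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
    return -X / sigma**2 * g / g.sum()

scales = [1.0, 2**0.5, 2.0]
image = np.random.rand(32, 32)
# convolve with the same filter rendered at each scale
bank = [convolve2d(image, gaussian_dx(s), mode="same") for s in scales]
responses = np.stack(bank)      # (n_scales, H, W): explicit scale dimension
pooled = responses.max(axis=0)  # max-projection over scale -> local invariance
```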

DeOccNet: Learning to See Through Foreground Occlusions in Light Fields

Title DeOccNet: Learning to See Through Foreground Occlusions in Light Fields
Authors Yingqian Wang, Tianhao Wu, Jungang Yang, Longguang Wang, Wei An, Yulan Guo
Abstract Background objects occluded in some views of a light field (LF) camera can be seen by other views. Consequently, occluded surfaces can be reconstructed from LF images. In this paper, we handle the LF de-occlusion (LF-DeOcc) problem using a deep encoder-decoder network (namely, DeOccNet). In our method, sub-aperture images (SAIs) are first given to the encoder to incorporate both spatial and angular information. The encoded representations are then used by the decoder to render an occlusion-free center-view SAI. To the best of our knowledge, DeOccNet is the first deep learning-based LF-DeOcc method. To handle the insufficiency of training data, we propose an LF synthesis approach to embed selected occlusion masks into existing LF images. In addition, several synthetic and real-world LFs are developed for performance evaluation. Experimental results show that, after training on the generated data, our DeOccNet can effectively remove foreground occlusions and achieves superior performance compared to other state-of-the-art methods. Source codes are available at: https://github.com/YingqianWang/DeOccNet.
Tasks
Published 2019-12-10
URL https://arxiv.org/abs/1912.04459v1
PDF https://arxiv.org/pdf/1912.04459v1.pdf
PWC https://paperswithcode.com/paper/deoccnet-learning-to-see-through-foreground
Repo https://github.com/YingqianWang/DeOccNet
Framework pytorch
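
The mask-embedding data synthesis rests on one geometric fact: an occluder placed at disparity d in front of a light field shifts across sub-aperture views in proportion to their angular offset from the center view. The numpy toy below illustrates that idea; the 3x3 angular grid, integer-pixel shifts via np.roll, and the flat occluder are simplifying assumptions, not the paper's pipeline.

```python
import numpy as np

def embed_occluder(lf, occ_rgb, occ_mask, disparity):
    """lf: (U, V, H, W, 3) sub-aperture images; occ_mask: (H, W) bool."""
    U, V, H, W, _ = lf.shape
    u0, v0 = U // 2, V // 2
    out = lf.copy()
    for u in range(U):
        for v in range(V):
            # shift grows linearly with angular distance from the center view
            du = int(round(disparity * (u - u0)))
            dv = int(round(disparity * (v - v0)))
            mask = np.roll(np.roll(occ_mask, du, axis=0), dv, axis=1)
            view = out[u, v]
            view[mask] = np.roll(np.roll(occ_rgb, du, axis=0),
                                 dv, axis=1)[mask]
    return out

lf = np.random.rand(3, 3, 64, 64, 3)
occ = np.zeros((64, 64), bool); occ[20:30, 20:30] = True
aug = embed_occluder(lf, np.random.rand(64, 64, 3), occ, disparity=2.0)
```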

Image-Based Size Analysis of Agglomerated and Partially Sintered Particles via Convolutional Neural Networks

Title Image-Based Size Analysis of Agglomerated and Partially Sintered Particles via Convolutional Neural Networks
Authors Max Frei, Frank Einar Kruis
Abstract There is a high demand for fully automated methods for the analysis of primary particle size distributions of agglomerated, sintered or occluded primary particles, due to their impact on material properties. Therefore, a novel, deep learning-based method for the detection of such primary particles was proposed and tested, which renders manual tuning of analysis parameters unnecessary. As a specialty, the training of the utilized convolutional neural networks was carried out using only synthetic images, thereby avoiding the laborious task of manual annotation and increasing the ground truth quality. Nevertheless, the proposed method performs excellently on real-world samples of sintered silica nanoparticles with various sintering degrees and varying image conditions. In a direct comparison, the proposed method clearly outperforms two state-of-the-art methods for automated image-based particle size analysis (Hough transformation and the ImageJ ParticleSizer plug-in), thereby attaining human-like performance.
Tasks
Published 2019-07-11
URL https://arxiv.org/abs/1907.05112v3
PDF https://arxiv.org/pdf/1907.05112v3.pdf
PWC https://paperswithcode.com/paper/image-based-size-analysis-of-agglomerated-and
Repo https://github.com/maxfrei750/DeepParticleNet
Framework tf
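
The appeal of synthetic training data here is that ground truth comes for free. A minimal sketch of the idea: render agglomerates as chains of overlapping circles with random radii, keeping the exact per-particle masks as labels. Grey values, blur, and noise from the paper's rendering pipeline are omitted, and all parameters below are assumptions.

```python
import numpy as np

def synth_agglomerate(size=128, n_particles=6, r_range=(8, 16), seed=0):
    """Binary image of a particle chain plus exact per-particle masks."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:size, 0:size]
    image = np.zeros((size, size))
    masks = []
    cx, cy = size / 2, size / 2           # start the chain in the center
    for _ in range(n_particles):
        r = rng.uniform(*r_range)
        mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
        masks.append(mask)
        image = np.maximum(image, mask.astype(float))
        angle = rng.uniform(0, 2 * np.pi)  # next particle touches this one
        step = rng.uniform(0.8, 1.2) * r   # partial overlap mimics sintering
        cx += step * np.cos(angle); cy += step * np.sin(angle)
    return image, np.stack(masks)

image, masks = synth_agglomerate()  # masks: (n_particles, H, W) ground truth
```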

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

Title GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing
Authors Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu
Abstract We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. The Apache 2.0 license has been adopted by GluonCV and GluonNLP to allow for software distribution, modification, and usage.
Tasks
Published 2019-07-09
URL https://arxiv.org/abs/1907.04433v2
PDF https://arxiv.org/pdf/1907.04433v2.pdf
PWC https://paperswithcode.com/paper/gluoncv-and-gluonnlp-deep-learning-in
Repo https://github.com/xcgoner/AISTATS2020-AdaAlter-GluonNLP
Framework mxnet
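
The toolkits' central convenience is the one-call model zoo. The snippet below follows the pattern from the GluonCV documentation; the exact model-name string and weight availability depend on the installed version, so treat it as a sketch rather than a pinned API.

```python
import mxnet as mx
from gluoncv import model_zoo

# One call fetches an architecture together with pre-trained weights.
net = model_zoo.get_model('resnet50_v1', pretrained=True)

x = mx.nd.random.uniform(shape=(1, 3, 224, 224))
logits = net(x)        # (1, 1000) ImageNet class scores
print(logits.shape)
```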

Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML

Title Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML
Authors Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Mingyi Hong, Una-May O’Reilly
Abstract In this paper, we study the problem of constrained robust (min-max) optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values. We present a principled optimization framework, integrating a zeroth-order (ZO) gradient estimator with an alternating projected stochastic gradient descent-ascent method, where the former only requires a small number of function queries and the latter needs just a one-step descent/ascent update. We show that the proposed framework, referred to as ZO-Min-Max, has a sub-linear convergence rate under mild conditions and scales gracefully with problem size. From an application side, we explore a promising connection between black-box min-max optimization and black-box evasion and poisoning attacks in adversarial machine learning (ML). Our empirical evaluations on these use cases demonstrate the effectiveness of our approach and its scalability to dimensions that prohibit using recent black-box solvers.
Tasks
Published 2019-09-30
URL https://arxiv.org/abs/1909.13806v1
PDF https://arxiv.org/pdf/1909.13806v1.pdf
PWC https://paperswithcode.com/paper/min-max-optimization-without-gradients
Repo https://github.com/KaidiXu/ZO-minmax
Framework tf
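
Both ingredients, the zeroth-order gradient estimator and the alternating projected descent/ascent loop, fit in a short numpy sketch. The toy objective, query budget q, smoothing radius mu, and step sizes below are assumptions chosen so the example converges, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def zo_grad(f, z, mu=1e-3, q=20):
    """Average of q two-point random-direction gradient estimates of f at z."""
    d = z.size
    g = np.zeros(d)
    for _ in range(q):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        g += (f(z + mu * u) - f(z - mu * u)) / (2 * mu) * u
    return d * g / q  # dimension factor from the sphere-sampling estimator

def project(z, radius=2.0):
    """Euclidean projection onto a ball: the constraint set of the toy."""
    norm = np.linalg.norm(z)
    return z if norm <= radius else z * (radius / norm)

# strongly-convex-strongly-concave toy saddle: min over x, max over y
f = lambda x, y: float(x @ x - y @ y + x @ y)
x, y = rng.standard_normal(2), rng.standard_normal(2)
for _ in range(300):
    x = project(x - 0.05 * zo_grad(lambda v: f(v, y), x))  # descent in x
    y = project(y + 0.05 * zo_grad(lambda v: f(x, v), y))  # ascent in y
print(x, y)  # both approach the saddle point at the origin
```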

Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches

Title Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches
Authors Maurizio Ferrari Dacrema, Paolo Cremonesi, Dietmar Jannach
Abstract Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has, as a result, become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today’s research practice in applied machine learning, e.g., in terms of the reproducibility of the results or the choice of the baselines when proposing new models. In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today’s machine learning scholarship and calls for improved scientific practices in this area. Source code of our experiments and full results are available at: https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation.
Tasks Recommendation Systems
Published 2019-07-16
URL https://arxiv.org/abs/1907.06902v3
PDF https://arxiv.org/pdf/1907.06902v3.pdf
PWC https://paperswithcode.com/paper/are-we-really-making-much-progress-a-worrying
Repo https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation
Framework none
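
For context, one of the "comparably simple heuristic methods" the paper evaluates is item-based KNN with cosine similarity. The numpy toy below shows the basic scoring scheme; it is a hedged illustration, not the authors' tuned evaluation code (which adds shrinkage, weighting, and careful hyperparameter search).

```python
import numpy as np

def itemknn_scores(R, k=50):
    """Item-based KNN scores. R: (n_users, n_items) binary interactions."""
    norms = np.linalg.norm(R, axis=0, keepdims=True) + 1e-8
    sim = (R.T @ R) / (norms.T @ norms)   # cosine item-item similarity
    np.fill_diagonal(sim, 0.0)
    if k < sim.shape[0]:
        # zero out everything except the k most similar items per item
        idx = np.argpartition(sim, -k, axis=1)[:, :-k]
        np.put_along_axis(sim, idx, 0.0, axis=1)
    scores = R @ sim.T                    # aggregate similarity of seen items
    return np.where(R > 0, -np.inf, scores)  # never re-recommend seen items

R = (np.random.rand(100, 40) < 0.1).astype(float)
top10 = np.argsort(-itemknn_scores(R), axis=1)[:, :10]  # top-n per user
```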

HAHE: Hierarchical Attentive Heterogeneous Information Network Embedding

Title HAHE: Hierarchical Attentive Heterogeneous Information Network Embedding
Authors Sheng Zhou, Jiajun Bu, Xin Wang, Jiawei Chen, Can Wang
Abstract Heterogeneous information network (HIN) embedding has recently attracted much attention due to its effectiveness in dealing with complex heterogeneous data. Meta paths, which connect different object types with various semantic meanings, are widely used by existing HIN embedding works. However, several challenges have not been addressed so far. First, different meta paths convey different semantic meanings, while existing works assume that all nodes share the same weights for meta paths and ignore the personalized preferences of different nodes on different meta paths. Second, given a meta path, nodes in an HIN are connected by path instances, while existing works fail to fully explore the differences between path instances that reflect nodes’ preferences in the semantic space. To tackle the above challenges, we propose a Hierarchical Attentive Heterogeneous information network Embedding (HAHE) model to capture the personalized preferences on meta paths and path instances in each semantic space. As path instances are based on a particular meta path, a hierarchical attention mechanism is naturally utilized to model the personalized preference on meta paths and path instances. Extensive experiments on several real-world datasets show that our proposed HAHE model significantly outperforms the state-of-the-art methods in terms of various data mining tasks.
Tasks Network Embedding
Published 2019-01-31
URL https://arxiv.org/abs/1902.01475v2
PDF https://arxiv.org/pdf/1902.01475v2.pdf
PWC https://paperswithcode.com/paper/hahe-hierarchical-attentive-heterogeneous
Repo https://github.com/Jhy1993/HAN
Framework tf
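
The two attention levels can be sketched compactly: for one node, first attend over path-instance neighbours within each meta path, then attend over the resulting per-meta-path summaries. Scoring by a dot product with the node's own embedding is a simplifying assumption standing in for the paper's learned attention parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hahe_embed(node, neighbours_per_metapath):
    """node: (d,); neighbours_per_metapath: list of (n_i, d) arrays."""
    summaries = []
    for nbrs in neighbours_per_metapath:   # level 1: path-instance attention
        alpha = softmax(nbrs @ node)       # personalized per node
        summaries.append(alpha @ nbrs)
    S = np.stack(summaries)                # (n_metapaths, d)
    beta = softmax(S @ node)               # level 2: meta-path attention
    return beta @ S

d = 16
z = hahe_embed(np.random.randn(d),
               [np.random.randn(5, d), np.random.randn(8, d)])
```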

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

Title Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss
Authors Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu
Abstract We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead of learning a direct mapping from audio to video frames, we propose first to transfer audio to a high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned on the landmarks. Compared to a direct audio-to-image approach, our cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. We humans are sensitive to temporal discontinuities and subtle artifacts in video. To avoid those pixel jittering problems and to force the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. Furthermore, to generate a sharper image with well-synchronized facial movements, we propose a novel regression-based discriminator structure, which considers sequence-level information along with frame-level information. Thorough experiments on several datasets and real-world samples demonstrate significantly better results obtained by our method than the state-of-the-art methods in both quantitative and qualitative comparisons.
Tasks
Published 2019-05-09
URL https://arxiv.org/abs/1905.03820v1
PDF https://arxiv.org/pdf/1905.03820v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-cross-modal-talking-face
Repo https://github.com/lelechen63/ATVGnet
Framework pytorch
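
One hedged reading of the dynamically adjustable pixel-wise loss: an L1 reconstruction term reweighted by a per-pixel attention map that emphasizes regions that move between frames (e.g. the mouth). Deriving the attention from frame differences, and the weighting scheme below, are illustrative assumptions, not the paper's learned attention mechanism.

```python
import numpy as np

def dynamic_pixelwise_loss(gen, target, prev_target, lam=1.0):
    """gen, target, prev_target: (H, W, 3) float images in [0, 1]."""
    motion = np.abs(target - prev_target).mean(axis=-1)  # (H, W) motion proxy
    att = motion / (motion.max() + 1e-8)                 # attention in [0, 1]
    weight = 1.0 + lam * att                             # never below uniform
    return float((weight[..., None] * np.abs(gen - target)).mean())

H = W = 64
loss = dynamic_pixelwise_loss(np.random.rand(H, W, 3),
                              np.random.rand(H, W, 3),
                              np.random.rand(H, W, 3))
```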

Motion Guided Attention for Video Salient Object Detection

Title Motion Guided Attention for Video Salient Object Detection
Authors Haofeng Li, Guanqi Chen, Guanbin Li, Yizhou Yu
Abstract Video salient object detection aims at discovering the most visually distinctive objects in a video. How to effectively take object motion into consideration during video salient object detection is a critical issue. Existing state-of-the-art methods either do not explicitly model and harvest motion cues or ignore spatial contexts within optical flow images. In this paper, we develop a multi-task motion guided video salient object detection network, which learns to accomplish two sub-tasks using two sub-networks, one sub-network for salient object detection in still images and the other for motion saliency detection in optical flow images. We further introduce a series of novel motion guided attention modules, which utilize the motion saliency sub-network to attend and enhance the sub-network for still images. These two sub-networks learn to adapt to each other by end-to-end training. Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on a wide range of benchmarks. We hope our simple and effective approach will serve as a solid baseline and help ease future research in video salient object detection. Code and models will be made available.
Tasks Object Detection, Optical Flow Estimation, Saliency Detection, Salient Object Detection, Video Salient Object Detection
Published 2019-09-16
URL https://arxiv.org/abs/1909.07061v2
PDF https://arxiv.org/pdf/1909.07061v2.pdf
PWC https://paperswithcode.com/paper/motion-guided-attention-for-video-salient
Repo https://github.com/lhaof/Motion-Guided-Attention
Framework pytorch
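
The core pattern of a motion-guided attention module is small: a motion-saliency map from the optical-flow branch gates the appearance features multiplicatively, with a residual term so un-attended evidence is not destroyed. The PyTorch sketch below shows that pattern only; the paper proposes several richer module variants, so treat this as an assumption-laden simplification.

```python
import torch

def motion_guided_attention(appearance_feat, motion_saliency):
    """appearance_feat: (B, C, H, W); motion_saliency: (B, 1, H, W) logits."""
    att = torch.sigmoid(motion_saliency)             # gate in (0, 1)
    return appearance_feat * att + appearance_feat   # attend + residual

feat = torch.randn(2, 64, 28, 28)
sal = torch.randn(2, 1, 28, 28)
out = motion_guided_attention(feat, sal)             # same shape as feat
```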

A Hierarchical Model for Data-to-Text Generation

Title A Hierarchical Model for Data-to-Text Generation
Authors Clément Rebuffel, Laure Soulier, Geoffrey Scoutheeten, Patrick Gallinari
Abstract Transcribing structured data into natural language descriptions has emerged as a challenging task, referred to as “data-to-text”. These structures generally group multiple elements together with their attributes. Most attempts rely on translation encoder-decoder methods which linearize elements into a sequence. This, however, loses most of the structure contained in the data. In this work, we propose to overcome this limitation with a hierarchical model that encodes the data structure at both the element level and the structure level. Evaluations on RotoWire show the effectiveness of our model w.r.t. qualitative and quantitative metrics.
Tasks Data-to-Text Generation, Text Generation
Published 2019-12-20
URL https://arxiv.org/abs/1912.10011v1
PDF https://arxiv.org/pdf/1912.10011v1.pdf
PWC https://paperswithcode.com/paper/a-hierarchical-model-for-data-to-text
Repo https://github.com/KaijuML/data-to-text-hierarchical
Framework pytorch
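
The two-level encoding can be sketched as follows: records (key-value pairs) are embedded and pooled into one vector per entity, and a second encoder contextualizes the entities. Mean pooling and a GRU at the upper level are simplifying assumptions standing in for the paper's hierarchical attention variants.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Element-level record encoding, then structure-level entity encoding."""
    def __init__(self, n_keys, n_values, d=64):
        super().__init__()
        self.key_emb = nn.Embedding(n_keys, d)
        self.val_emb = nn.Embedding(n_values, d)
        self.record_proj = nn.Linear(2 * d, d)
        self.entity_enc = nn.GRU(d, d, batch_first=True)

    def forward(self, keys, values):
        """keys, values: (n_entities, n_records) int tensors."""
        rec = torch.cat([self.key_emb(keys), self.val_emb(values)], dim=-1)
        rec = torch.tanh(self.record_proj(rec))     # (E, R, d) element level
        ent = rec.mean(dim=1)                       # (E, d) pool per entity
        out, _ = self.entity_enc(ent.unsqueeze(0))  # (1, E, d) structure level
        return out.squeeze(0)

enc = HierarchicalEncoder(n_keys=20, n_values=500)
z = enc(torch.randint(0, 20, (6, 10)), torch.randint(0, 500, (6, 10)))
```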