January 31, 2020

3097 words 15 mins read

Paper Group AWR 429

Online Multi-Object Tracking with Dual Matching Attention Networks. Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features. daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices. A Higher-Order Swiss Army Infinitesimal Jackknife. VarGFaceNet: An Efficient Variable Group Convolutional Neura …

Online Multi-Object Tracking with Dual Matching Attention Networks

Title Online Multi-Object Tracking with Dual Matching Attention Networks
Authors Ji Zhu, Hua Yang, Nian Liu, Minyoung Kim, Wenjun Zhang, Ming-Hsuan Yang
Abstract In this paper, we propose an online Multi-Object Tracking (MOT) approach which integrates the merits of single object tracking and data association methods in a unified framework to handle noisy detections and frequent interactions between targets. Specifically, for applying single object tracking in MOT, we introduce a cost-sensitive tracking loss based on the state-of-the-art visual tracker, which encourages the model to focus on hard negative distractors during online learning. For data association, we propose Dual Matching Attention Networks (DMAN) with both spatial and temporal attention mechanisms. The spatial attention module generates dual attention maps which enable the network to focus on the matching patterns of the input image pair, while the temporal attention module adaptively allocates different levels of attention to different samples in the tracklet to suppress noisy observations. Experimental results on the MOT benchmark datasets show that the proposed algorithm performs favorably against both online and offline trackers in terms of identity-preserving metrics.
Tasks Multi-Object Tracking, Object Tracking, Online Multi-Object Tracking
Published 2019-02-02
URL http://arxiv.org/abs/1902.00749v1
PDF http://arxiv.org/pdf/1902.00749v1.pdf
PWC https://paperswithcode.com/paper/online-multi-object-tracking-with-dual
Repo https://github.com/jizhu1023/DMAN_MOT
Framework tf
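
The spatial attention idea in DMAN lends itself to a small illustration. The following is a hedged numpy toy, not the authors' implementation: given convolutional feature maps of the two patches being matched, a cross-affinity matrix is softmax-normalized along each axis to yield one attention map per input. The cosine scoring, the max-pooling of attention rows, and all shapes are assumptions for illustration.

```python
import numpy as np

def dual_attention_maps(feat_a, feat_b):
    """Toy dual spatial attention for an image pair.

    feat_a, feat_b: (C, H, W) feature maps of the two patches.
    Returns one (H, W) attention map per input, obtained by
    softmax-normalizing a cross-affinity matrix along each axis.
    """
    C, H, W = feat_a.shape
    A = feat_a.reshape(C, H * W)  # columns are spatial positions
    B = feat_b.reshape(C, H * W)
    # cosine-style affinity between every position in A and every position in B
    A = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-8)
    B = B / (np.linalg.norm(B, axis=0, keepdims=True) + 1e-8)
    S = A.T @ B  # (H*W, H*W) affinity matrix

    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # attention on A: how strongly each position of A matches B, and vice versa
    att_a = softmax(S, axis=1).max(axis=1).reshape(H, W)
    att_b = softmax(S, axis=0).max(axis=0).reshape(H, W)
    return att_a, att_b

att_a, att_b = dual_attention_maps(np.random.rand(64, 7, 7),
                                   np.random.rand(64, 7, 7))
```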

Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features

Title Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features
Authors Arno Solin, Manon Kok
Abstract Gaussian processes (GPs) provide a powerful framework for extrapolation, interpolation, and noise removal in regression and classification. This paper considers constraining GPs to arbitrarily-shaped domains with boundary conditions. We solve a Fourier-like generalised harmonic feature representation of the GP prior in the domain of interest, which both constrains the GP and attains a low-rank representation that is used for speeding up inference. The method scales as $\mathcal{O}(nm^2)$ in prediction and $\mathcal{O}(m^3)$ in hyperparameter learning for regression, where $n$ is the number of data points and $m$ the number of features. Furthermore, we make use of the variational approach to allow the method to deal with non-Gaussian likelihoods. The experiments cover both simulated and empirical data in which the boundary conditions allow for inclusion of additional physical information.
Tasks Gaussian Processes
Published 2019-04-10
URL http://arxiv.org/abs/1904.05207v1
PDF http://arxiv.org/pdf/1904.05207v1.pdf
PWC https://paperswithcode.com/paper/know-your-boundaries-constraining-gaussian
Repo https://github.com/AaltoML/boundary-gp
Framework none
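
The harmonic-feature construction can be sketched in one dimension. Below is a minimal numpy sketch of a boundary-constrained, low-rank GP regression on [-L, L] with a Dirichlet condition (f vanishes on the boundary), following the general Hilbert-space GP recipe the paper builds on rather than the authors' code; the RBF kernel, its hyperparameters, and m = 32 features are assumptions.

```python
import numpy as np

# 1-D sketch: m sinusoidal Laplace eigenfunctions on [-L, L] constrain the GP
# to be zero at +-L and give a low-rank representation: one m x m solve, so
# prediction costs O(n m^2) as stated in the abstract.
L, m = 1.0, 32
sigma2_f, ell, sigma2_n = 1.0, 0.2, 0.01

j = np.arange(1, m + 1)
sqrt_lam = np.pi * j / (2 * L)  # square roots of the Laplace eigenvalues
# RBF spectral density evaluated at sqrt(eigenvalues)
S = sigma2_f * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * sqrt_lam) ** 2)

def phi(x):
    """Eigenfunctions of the Laplacian on [-L, L]; zero at the boundary."""
    return np.sin(np.outer(x + L, sqrt_lam)) / np.sqrt(L)

rng = np.random.default_rng(0)
x = rng.uniform(-L, L, 50)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(50)

Phi = phi(x)                                   # (n, m) feature matrix
A = Phi.T @ Phi + sigma2_n * np.diag(1.0 / S)  # (m, m): the low-rank solve
w = np.linalg.solve(A, Phi.T @ y)

x_star = np.linspace(-L, L, 200)
mean = phi(x_star) @ w  # predictive mean; respects f(+-L) = 0 by construction
```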

daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices

Title daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices
Authors Jianhao Zhang, Yingwei Pan, Ting Yao, He Zhao, Tao Mei
Abstract It is widely believed that Binary Neural Networks (BNNs) could drastically accelerate inference by replacing the arithmetic operations in float-valued Deep Neural Networks (DNNs) with bit-wise operations. Nevertheless, there has been no open-source implementation supporting this idea on low-end ARM devices (e.g., mobile phones and embedded devices). In this work, we propose daBNN, a super fast inference framework that implements BNNs on ARM devices. Several speed-up and memory refinement strategies for bit-packing, binarized convolution, and memory layout are uniquely devised to enhance inference efficiency. Compared to the recent open-source BNN inference framework, BMXNet, our daBNN is $7\times$$\sim$$23\times$ faster on a single binary convolution, and about $6\times$ faster on Bi-Real Net 18 (a BNN variant of ResNet-18). The daBNN is a BSD-licensed inference framework, and its source code, sample projects and pre-trained models are available on-line: https://github.com/JDAI-CV/dabnn.
Tasks
Published 2019-08-16
URL https://arxiv.org/abs/1908.05858v1
PDF https://arxiv.org/pdf/1908.05858v1.pdf
PWC https://paperswithcode.com/paper/dabnn-a-super-fast-inference-framework-for
Repo https://github.com/JDAI-CV/dabnn
Framework none
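
The bit-wise trick that makes BNN inference fast is easy to show in miniature. The pure-Python toy below illustrates the standard XNOR/popcount identity that daBNN's kernels build on; the bit layout and packing scheme here are illustrative assumptions, not the framework's actual hand-tuned ARM NEON implementation.

```python
# Pack +-1 vectors into machine words; then a dot product of length n reduces
# to n - 2 * popcount(xor(a, b)), since the xor marks positions whose signs
# differ and each mismatch contributes -1 instead of +1.

def pack_bits(v):
    """Pack a +-1 vector into one Python int (bit i set iff v[i] == -1)."""
    word = 0
    for i, x in enumerate(v):
        if x < 0:
            word |= 1 << i
    return word

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +-1 vectors of length n."""
    mismatches = bin(a_bits ^ b_bits).count("1")  # popcount
    return n - 2 * mismatches

a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
assert binary_dot(pack_bits(a), pack_bits(b), len(a)) == sum(x * y for x, y in zip(a, b))
```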

A Higher-Order Swiss Army Infinitesimal Jackknife

Title A Higher-Order Swiss Army Infinitesimal Jackknife
Authors Ryan Giordano, Michael I. Jordan, Tamara Broderick
Abstract Cross validation (CV) and the bootstrap are ubiquitous model-agnostic tools for assessing the error or variability of machine learning and statistical estimators. However, these methods require repeatedly re-fitting the model with different weighted versions of the original dataset, which can be prohibitively time-consuming. For sufficiently regular optimization problems the optimum depends smoothly on the data weights, and so the process of repeatedly re-fitting can be approximated with a Taylor series that can often be evaluated relatively quickly. The first-order approximation is known as the “infinitesimal jackknife” in the statistics literature and has been the subject of recent interest in machine learning for approximate CV. In this work, we consider higher-order approximations, which we call the “higher-order infinitesimal jackknife” (HOIJ). Under mild regularity conditions, we provide a simple recursive procedure to compute approximations of all orders with finite-sample accuracy bounds. Additionally, we show that the HOIJ can be efficiently computed even in high dimensions using forward-mode automatic differentiation. We show that linear approximations with bootstrap weights are equivalent to those provided by asymptotic normal approximations. Consequently, the HOIJ opens up the possibility of enjoying higher-order accuracy properties of the bootstrap using local approximations. Consistency of the HOIJ for leave-one-out CV under different asymptotic regimes follows as corollaries from our finite-sample bounds under additional regularity assumptions. The generality of the computation and bounds motivate the name “higher-order Swiss Army infinitesimal jackknife.”
Tasks
Published 2019-07-28
URL https://arxiv.org/abs/1907.12116v1
PDF https://arxiv.org/pdf/1907.12116v1.pdf
PWC https://paperswithcode.com/paper/a-higher-order-swiss-army-infinitesimal
Repo https://github.com/rgiordan/vittles
Framework none
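
The first-order term of this Taylor series is concrete enough to verify numerically. Here is a hedged numpy sketch of the first-order infinitesimal jackknife for approximate leave-one-out CV on ordinary least squares (for an M-estimator, dtheta/dw_i = -H^{-1} g_i with Hessian H and per-point gradient g_i); the higher orders in the paper extend this expansion recursively. This is an illustration of the idea, not the vittles package API.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

theta = np.linalg.solve(X.T @ X, X.T @ y)  # full-data fit
H = X.T @ X                                # Hessian of the squared loss
resid = X @ theta - y

for i in range(3):
    g_i = X[i] * resid[i]                  # gradient contribution of point i
    # dropping point i means w_i: 1 -> 0, so theta moves by +H^{-1} g_i
    theta_loo_ij = theta + np.linalg.solve(H, g_i)
    # exact leave-one-out refit, for comparison
    mask = np.arange(n) != i
    theta_loo = np.linalg.solve(X[mask].T @ X[mask], X[mask].T @ y[mask])
    print(np.abs(theta_loo_ij - theta_loo).max())  # small approximation error
```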

VarGFaceNet: An Efficient Variable Group Convolutional Neural Network for Lightweight Face Recognition

Title VarGFaceNet: An Efficient Variable Group Convolutional Neural Network for Lightweight Face Recognition
Authors Mengjia Yan, Mengao Zhao, Zining Xu, Qian Zhang, Guoli Wang, Zhizhong Su
Abstract To improve the discriminative and generalization ability of lightweight networks for face recognition, we propose an efficient variable group convolutional network called VarGFaceNet. Variable group convolution is introduced by VarGNet to solve the conflict between small computational cost and the imbalance of computational intensity inside a block. We employ variable group convolution to design a network that supports large-scale face identification while reducing computational cost and parameter count. Specifically, we use a head setting to reserve essential information at the start of the network and propose a particular embedding setting to reduce the parameters of the fully-connected layer for embedding. To enhance interpretation ability, we employ an equivalence of angular distillation loss to guide our lightweight network, and we apply recursive knowledge distillation to relieve the discrepancy between the teacher model and the student model. Winning the deepglint-light track of the LFR (2019) challenge demonstrates the effectiveness of our model and approach. Implementation of VarGFaceNet will be released at https://github.com/zma-c-137/VarGFaceNet soon.
Tasks Face Detection, Face Identification, Face Recognition
Published 2019-10-11
URL https://arxiv.org/abs/1910.04985v4
PDF https://arxiv.org/pdf/1910.04985v4.pdf
PWC https://paperswithcode.com/paper/vargfacenet-an-efficient-variable-group
Repo https://github.com/zma-c-137/VarGFaceNet
Framework mxnet
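
Variable group convolution is simple to state in code: instead of fixing the number of groups, fix the number of channels per group to a constant S, so the group count grows with layer width and the computational intensity per group stays constant. The PyTorch sketch below is a hedged illustration; S = 8 and the conv/BN/PReLU block layout are assumptions, not the paper's exact mxnet implementation.

```python
import torch
import torch.nn as nn

class VarGConv(nn.Module):
    """Variable group convolution: channels per group fixed at S."""
    def __init__(self, in_ch, out_ch, S=8, kernel_size=3, stride=1):
        super().__init__()
        # groups = in_ch // S varies with layer width, unlike a fixed-group conv
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                              padding=kernel_size // 2,
                              groups=in_ch // S, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.PReLU(out_ch)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

x = torch.randn(1, 64, 56, 56)
print(VarGConv(64, 128)(x).shape)  # torch.Size([1, 128, 56, 56]), 8 groups
```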

Scale-Equivariant Steerable Networks

Title Scale-Equivariant Steerable Networks
Authors Ivan Sosnovik, Michał Szmaja, Arnold Smeulders
Abstract The effectiveness of Convolutional Neural Networks (CNNs) has been substantially attributed to their built-in property of translation equivariance. However, CNNs do not have embedded mechanisms to handle other types of transformations. In this work, we pay attention to scale changes, which regularly appear in various tasks due to the changing distances between the objects and the camera. First, we introduce the general theory for building scale-equivariant convolutional networks with steerable filters. We develop scale-convolution and generalize other common blocks to be scale-equivariant. We demonstrate the computational efficiency and numerical stability of the proposed method. We compare the proposed models to the previously developed methods for scale equivariance and local scale invariance. We demonstrate state-of-the-art results on the MNIST-scale and STL-10 datasets in the supervised learning setting.
Tasks
Published 2019-10-14
URL https://arxiv.org/abs/1910.11093v2
PDF https://arxiv.org/pdf/1910.11093v2.pdf
PWC https://paperswithcode.com/paper/scale-equivariant-steerable-networks-1
Repo https://github.com/ISosnovik/sesn
Framework pytorch
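
The core mechanism can be sketched with a scale bank: the same analytic filter is evaluated at several scales on a fixed grid (no pixel resampling), the input is convolved with each copy, and responses are stacked along a scale axis that the network can then pool over. The Gaussian-derivative filter and the three-scale bank below are illustrative assumptions, not the paper's steerable Hermite basis.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_dx(sigma, half=6):
    """x-derivative of a 2-D Gaussian, evaluated analytically at scale sigma."""
    xs = np.arange(-half, half + 1, dtype=float)
    X, Y = np.meshgrid(xs, xs)
    g = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
    return -X / sigma**2 * g / g.sum()

scales = [1.0, 2**0.5, 2.0]
image = np.random.rand(32, 32)
# convolve with the same filter rendered at each scale
bank = [convolve2d(image, gaussian_dx(s), mode="same") for s in scales]
responses = np.stack(bank)      # (n_scales, H, W): explicit scale dimension
pooled = responses.max(axis=0)  # max-projection over scale -> local invariance
```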

DeOccNet: Learning to See Through Foreground Occlusions in Light Fields

Title DeOccNet: Learning to See Through Foreground Occlusions in Light Fields
Authors Yingqian Wang, Tianhao Wu, Jungang Yang, Longguang Wang, Wei An, Yulan Guo
Abstract Background objects occluded in some views of a light field (LF) camera can be seen by other views. Consequently, occluded surfaces can be reconstructed from LF images. In this paper, we handle the LF de-occlusion (LF-DeOcc) problem using a deep encoder-decoder network (namely, DeOccNet). In our method, sub-aperture images (SAIs) are first given to the encoder to incorporate both spatial and angular information. The encoded representations are then used by the decoder to render an occlusion-free center-view SAI. To the best of our knowledge, DeOccNet is the first deep learning-based LF-DeOcc method. To handle the insufficiency of training data, we propose an LF synthesis approach to embed selected occlusion masks into existing LF images. In addition, several synthetic and real-world LFs are developed for performance evaluation. Experimental results show that, after training on the generated data, our DeOccNet can effectively remove foreground occlusions and achieves superior performance compared to other state-of-the-art methods. Source codes are available at: https://github.com/YingqianWang/DeOccNet.
Tasks
Published 2019-12-10
URL https://arxiv.org/abs/1912.04459v1
PDF https://arxiv.org/pdf/1912.04459v1.pdf
PWC https://paperswithcode.com/paper/deoccnet-learning-to-see-through-foreground
Repo https://github.com/YingqianWang/DeOccNet
Framework pytorch
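
The mask-embedding data synthesis rests on one geometric fact: an occluder placed at disparity d in front of a light field shifts across sub-aperture views in proportion to their angular offset from the center view. The numpy toy below illustrates that idea; the 3x3 angular grid, integer-pixel shifts via np.roll, and the flat occluder are simplifying assumptions, not the paper's pipeline.

```python
import numpy as np

def embed_occluder(lf, occ_rgb, occ_mask, disparity):
    """lf: (U, V, H, W, 3) sub-aperture images; occ_mask: (H, W) bool."""
    U, V, H, W, _ = lf.shape
    u0, v0 = U // 2, V // 2
    out = lf.copy()
    for u in range(U):
        for v in range(V):
            # shift grows linearly with angular distance from the center view
            du = int(round(disparity * (u - u0)))
            dv = int(round(disparity * (v - v0)))
            mask = np.roll(np.roll(occ_mask, du, axis=0), dv, axis=1)
            view = out[u, v]
            view[mask] = np.roll(np.roll(occ_rgb, du, axis=0),
                                 dv, axis=1)[mask]
    return out

lf = np.random.rand(3, 3, 64, 64, 3)
occ = np.zeros((64, 64), bool); occ[20:30, 20:30] = True
aug = embed_occluder(lf, np.random.rand(64, 64, 3), occ, disparity=2.0)
```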

Image-Based Size Analysis of Agglomerated and Partially Sintered Particles via Convolutional Neural Networks

Title Image-Based Size Analysis of Agglomerated and Partially Sintered Particles via Convolutional Neural Networks
Authors Max Frei, Frank Einar Kruis
Abstract There is a high demand for fully automated methods for the analysis of primary particle size distributions of agglomerated, sintered or occluded primary particles, due to their impact on material properties. Therefore, a novel, deep learning-based method for the detection of such primary particles was proposed and tested, which renders manual tuning of analysis parameters unnecessary. As a specialty, the training of the utilized convolutional neural networks was carried out using only synthetic images, thereby avoiding the laborious task of manual annotation and increasing the ground truth quality. Nevertheless, the proposed method performs excellently on real-world samples of sintered silica nanoparticles with various sintering degrees and varying image conditions. In a direct comparison, the proposed method clearly outperforms two state-of-the-art methods for automated image-based particle size analysis (Hough transformation and the ImageJ ParticleSizer plug-in), thereby attaining human-like performance.
Tasks
Published 2019-07-11
URL https://arxiv.org/abs/1907.05112v3
PDF https://arxiv.org/pdf/1907.05112v3.pdf
PWC https://paperswithcode.com/paper/image-based-size-analysis-of-agglomerated-and
Repo https://github.com/maxfrei750/DeepParticleNet
Framework tf
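
The appeal of synthetic training data here is that ground truth comes for free. A minimal sketch of the idea: render agglomerates as chains of overlapping circles with random radii, keeping the exact per-particle masks as labels. Grey values, blur, and noise from the paper's rendering pipeline are omitted, and all parameters below are assumptions.

```python
import numpy as np

def synth_agglomerate(size=128, n_particles=6, r_range=(8, 16), seed=0):
    """Binary image of a particle chain plus exact per-particle masks."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:size, 0:size]
    image = np.zeros((size, size))
    masks = []
    cx, cy = size / 2, size / 2           # start the chain in the center
    for _ in range(n_particles):
        r = rng.uniform(*r_range)
        mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
        masks.append(mask)
        image = np.maximum(image, mask.astype(float))
        angle = rng.uniform(0, 2 * np.pi)  # next particle touches this one
        step = rng.uniform(0.8, 1.2) * r   # partial overlap mimics sintering
        cx += step * np.cos(angle); cy += step * np.sin(angle)
    return image, np.stack(masks)

image, masks = synth_agglomerate()  # masks: (n_particles, H, W) ground truth
```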

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

Title GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing
Authors Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu
Abstract We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. The Apache 2.0 license has been adopted by GluonCV and GluonNLP to allow for software distribution, modification, and usage.
Tasks
Published 2019-07-09
URL https://arxiv.org/abs/1907.04433v2
PDF https://arxiv.org/pdf/1907.04433v2.pdf
PWC https://paperswithcode.com/paper/gluoncv-and-gluonnlp-deep-learning-in
Repo https://github.com/xcgoner/AISTATS2020-AdaAlter-GluonNLP
Framework mxnet
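
The toolkits' central convenience is the one-call model zoo. The snippet below follows the pattern from the GluonCV documentation; the exact model-name string and weight availability depend on the installed version, so treat it as a sketch rather than a pinned API.

```python
import mxnet as mx
from gluoncv import model_zoo

# One call fetches an architecture together with pre-trained weights.
net = model_zoo.get_model('resnet50_v1', pretrained=True)

x = mx.nd.random.uniform(shape=(1, 3, 224, 224))
logits = net(x)        # (1, 1000) ImageNet class scores
print(logits.shape)
```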

Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML

Title Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML
Authors Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Mingyi Hong, Una-May O’Reilly
Abstract In this paper, we study the problem of constrained robust (min-max) optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values. We present a principled optimization framework, integrating a zeroth-order (ZO) gradient estimator with an alternating projected stochastic gradient descent-ascent method, where the former only requires a small number of function queries and the latter needs just a one-step descent/ascent update. We show that the proposed framework, referred to as ZO-Min-Max, has a sub-linear convergence rate under mild conditions and scales gracefully with problem size. From an application side, we explore a promising connection between black-box min-max optimization and black-box evasion and poisoning attacks in adversarial machine learning (ML). Our empirical evaluations on these use cases demonstrate the effectiveness of our approach and its scalability to dimensions that prohibit using recent black-box solvers.
Tasks
Published 2019-09-30
URL https://arxiv.org/abs/1909.13806v1
PDF https://arxiv.org/pdf/1909.13806v1.pdf
PWC https://paperswithcode.com/paper/min-max-optimization-without-gradients
Repo https://github.com/KaidiXu/ZO-minmax
Framework tf
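
Both ingredients, the zeroth-order gradient estimator and the alternating projected descent/ascent loop, fit in a short numpy sketch. The toy objective, query budget q, smoothing radius mu, and step sizes below are assumptions chosen so the example converges, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def zo_grad(f, z, mu=1e-3, q=20):
    """Average of q two-point random-direction gradient estimates of f at z."""
    d = z.size
    g = np.zeros(d)
    for _ in range(q):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        g += (f(z + mu * u) - f(z - mu * u)) / (2 * mu) * u
    return d * g / q  # dimension factor from the sphere-sampling estimator

def project(z, radius=2.0):
    """Euclidean projection onto a ball: the constraint set of the toy."""
    norm = np.linalg.norm(z)
    return z if norm <= radius else z * (radius / norm)

# strongly-convex-strongly-concave toy saddle: min over x, max over y
f = lambda x, y: float(x @ x - y @ y + x @ y)
x, y = rng.standard_normal(2), rng.standard_normal(2)
for _ in range(300):
    x = project(x - 0.05 * zo_grad(lambda v: f(v, y), x))  # descent in x
    y = project(y + 0.05 * zo_grad(lambda v: f(x, v), y))  # ascent in y
print(x, y)  # both approach the saddle point at the origin
```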

Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches

Title Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches
Authors Maurizio Ferrari Dacrema, Paolo Cremonesi, Dietmar Jannach
Abstract Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has, as a result, become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today’s research practice in applied machine learning, e.g., in terms of the reproducibility of the results or the choice of the baselines when proposing new models. In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today’s machine learning scholarship and calls for improved scientific practices in this area. Source code of our experiments and full results are available at: https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation.
Tasks Recommendation Systems
Published 2019-07-16
URL https://arxiv.org/abs/1907.06902v3
PDF https://arxiv.org/pdf/1907.06902v3.pdf
PWC https://paperswithcode.com/paper/are-we-really-making-much-progress-a-worrying
Repo https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation
Framework none
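
For context, one of the "comparably simple heuristic methods" the paper evaluates is item-based KNN with cosine similarity. The numpy toy below shows the basic scoring scheme; it is a hedged illustration, not the authors' tuned evaluation code (which adds shrinkage, weighting, and careful hyperparameter search).

```python
import numpy as np

def itemknn_scores(R, k=50):
    """Item-based KNN scores. R: (n_users, n_items) binary interactions."""
    norms = np.linalg.norm(R, axis=0, keepdims=True) + 1e-8
    sim = (R.T @ R) / (norms.T @ norms)   # cosine item-item similarity
    np.fill_diagonal(sim, 0.0)
    if k < sim.shape[0]:
        # zero out everything except the k most similar items per item
        idx = np.argpartition(sim, -k, axis=1)[:, :-k]
        np.put_along_axis(sim, idx, 0.0, axis=1)
    scores = R @ sim.T                    # aggregate similarity of seen items
    return np.where(R > 0, -np.inf, scores)  # never re-recommend seen items

R = (np.random.rand(100, 40) < 0.1).astype(float)
top10 = np.argsort(-itemknn_scores(R), axis=1)[:, :10]  # top-n per user
```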

HAHE: Hierarchical Attentive Heterogeneous Information Network Embedding

Title HAHE: Hierarchical Attentive Heterogeneous Information Network Embedding
Authors Sheng Zhou, Jiajun Bu, Xin Wang, Jiawei Chen, Can Wang
Abstract Heterogeneous information network (HIN) embedding has recently attracted much attention due to its effectiveness in dealing with complex heterogeneous data. Meta paths, which connect different object types with various semantic meanings, are widely used by existing HIN embedding works. However, several challenges have not been addressed so far. First, different meta paths convey different semantic meanings, while existing works assume that all nodes share the same weights for meta paths and ignore the personalized preferences of different nodes on different meta paths. Second, given a meta path, nodes in an HIN are connected by path instances, while existing works fail to fully explore the differences between path instances that reflect nodes’ preferences in the semantic space. To tackle the above challenges, we propose a Hierarchical Attentive Heterogeneous information network Embedding (HAHE) model to capture the personalized preferences on meta paths and path instances in each semantic space. As path instances are based on a particular meta path, a hierarchical attention mechanism is naturally utilized to model the personalized preference on meta paths and path instances. Extensive experiments on several real-world datasets show that our proposed HAHE model significantly outperforms the state-of-the-art methods in terms of various data mining tasks.
Tasks Network Embedding
Published 2019-01-31
URL https://arxiv.org/abs/1902.01475v2
PDF https://arxiv.org/pdf/1902.01475v2.pdf
PWC https://paperswithcode.com/paper/hahe-hierarchical-attentive-heterogeneous
Repo https://github.com/Jhy1993/HAN
Framework tf
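
The two attention levels can be sketched compactly: for one node, first attend over path-instance neighbours within each meta path, then attend over the resulting per-meta-path summaries. Scoring by a dot product with the node's own embedding is a simplifying assumption standing in for the paper's learned attention parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hahe_embed(node, neighbours_per_metapath):
    """node: (d,); neighbours_per_metapath: list of (n_i, d) arrays."""
    summaries = []
    for nbrs in neighbours_per_metapath:   # level 1: path-instance attention
        alpha = softmax(nbrs @ node)       # personalized per node
        summaries.append(alpha @ nbrs)
    S = np.stack(summaries)                # (n_metapaths, d)
    beta = softmax(S @ node)               # level 2: meta-path attention
    return beta @ S

d = 16
z = hahe_embed(np.random.randn(d),
               [np.random.randn(5, d), np.random.randn(8, d)])
```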

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

Title Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss
Authors Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu
Abstract We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead of learning a direct mapping from audio to video frames, we propose first to transfer audio to a high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned on the landmarks. Compared to a direct audio-to-image approach, our cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. We humans are sensitive to temporal discontinuities and subtle artifacts in video. To avoid those pixel jittering problems and to force the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. Furthermore, to generate a sharper image with well-synchronized facial movements, we propose a novel regression-based discriminator structure, which considers sequence-level information along with frame-level information. Thorough experiments on several datasets and real-world samples demonstrate significantly better results obtained by our method than the state-of-the-art methods in both quantitative and qualitative comparisons.
Tasks
Published 2019-05-09
URL https://arxiv.org/abs/1905.03820v1
PDF https://arxiv.org/pdf/1905.03820v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-cross-modal-talking-face
Repo https://github.com/lelechen63/ATVGnet
Framework pytorch
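
One hedged reading of the dynamically adjustable pixel-wise loss: an L1 reconstruction term reweighted by a per-pixel attention map that emphasizes regions that move between frames (e.g. the mouth). Deriving the attention from frame differences, and the weighting scheme below, are illustrative assumptions, not the paper's learned attention mechanism.

```python
import numpy as np

def dynamic_pixelwise_loss(gen, target, prev_target, lam=1.0):
    """gen, target, prev_target: (H, W, 3) float images in [0, 1]."""
    motion = np.abs(target - prev_target).mean(axis=-1)  # (H, W) motion proxy
    att = motion / (motion.max() + 1e-8)                 # attention in [0, 1]
    weight = 1.0 + lam * att                             # never below uniform
    return float((weight[..., None] * np.abs(gen - target)).mean())

H = W = 64
loss = dynamic_pixelwise_loss(np.random.rand(H, W, 3),
                              np.random.rand(H, W, 3),
                              np.random.rand(H, W, 3))
```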

Motion Guided Attention for Video Salient Object Detection

Title Motion Guided Attention for Video Salient Object Detection
Authors Haofeng Li, Guanqi Chen, Guanbin Li, Yizhou Yu
Abstract Video salient object detection aims at discovering the most visually distinctive objects in a video. How to effectively take object motion into consideration during video salient object detection is a critical issue. Existing state-of-the-art methods either do not explicitly model and harvest motion cues or ignore spatial contexts within optical flow images. In this paper, we develop a multi-task motion guided video salient object detection network, which learns to accomplish two sub-tasks using two sub-networks, one sub-network for salient object detection in still images and the other for motion saliency detection in optical flow images. We further introduce a series of novel motion guided attention modules, which utilize the motion saliency sub-network to attend and enhance the sub-network for still images. These two sub-networks learn to adapt to each other by end-to-end training. Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on a wide range of benchmarks. We hope our simple and effective approach will serve as a solid baseline and help ease future research in video salient object detection. Code and models will be made available.
Tasks Object Detection, Optical Flow Estimation, Saliency Detection, Salient Object Detection, Video Salient Object Detection
Published 2019-09-16
URL https://arxiv.org/abs/1909.07061v2
PDF https://arxiv.org/pdf/1909.07061v2.pdf
PWC https://paperswithcode.com/paper/motion-guided-attention-for-video-salient
Repo https://github.com/lhaof/Motion-Guided-Attention
Framework pytorch
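
The core pattern of a motion-guided attention module is small: a motion-saliency map from the optical-flow branch gates the appearance features multiplicatively, with a residual term so un-attended evidence is not destroyed. The PyTorch sketch below shows that pattern only; the paper proposes several richer module variants, so treat this as an assumption-laden simplification.

```python
import torch

def motion_guided_attention(appearance_feat, motion_saliency):
    """appearance_feat: (B, C, H, W); motion_saliency: (B, 1, H, W) logits."""
    att = torch.sigmoid(motion_saliency)             # gate in (0, 1)
    return appearance_feat * att + appearance_feat   # attend + residual

feat = torch.randn(2, 64, 28, 28)
sal = torch.randn(2, 1, 28, 28)
out = motion_guided_attention(feat, sal)             # same shape as feat
```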

A Hierarchical Model for Data-to-Text Generation

Title A Hierarchical Model for Data-to-Text Generation
Authors Clément Rebuffel, Laure Soulier, Geoffrey Scoutheeten, Patrick Gallinari
Abstract Transcribing structured data into natural language descriptions has emerged as a challenging task, referred to as “data-to-text”. These structures generally group multiple elements together with their attributes. Most attempts rely on translation encoder-decoder methods which linearize elements into a sequence. This, however, loses most of the structure contained in the data. In this work, we propose to overcome this limitation with a hierarchical model that encodes the data structure at both the element level and the structure level. Evaluations on RotoWire show the effectiveness of our model w.r.t. qualitative and quantitative metrics.
Tasks Data-to-Text Generation, Text Generation
Published 2019-12-20
URL https://arxiv.org/abs/1912.10011v1
PDF https://arxiv.org/pdf/1912.10011v1.pdf
PWC https://paperswithcode.com/paper/a-hierarchical-model-for-data-to-text
Repo https://github.com/KaijuML/data-to-text-hierarchical
Framework pytorch
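
The two-level encoding can be sketched as follows: records (key-value pairs) are embedded and pooled into one vector per entity, and a second encoder contextualizes the entities. Mean pooling and a GRU at the upper level are simplifying assumptions standing in for the paper's hierarchical attention variants.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Element-level record encoding, then structure-level entity encoding."""
    def __init__(self, n_keys, n_values, d=64):
        super().__init__()
        self.key_emb = nn.Embedding(n_keys, d)
        self.val_emb = nn.Embedding(n_values, d)
        self.record_proj = nn.Linear(2 * d, d)
        self.entity_enc = nn.GRU(d, d, batch_first=True)

    def forward(self, keys, values):
        """keys, values: (n_entities, n_records) int tensors."""
        rec = torch.cat([self.key_emb(keys), self.val_emb(values)], dim=-1)
        rec = torch.tanh(self.record_proj(rec))     # (E, R, d) element level
        ent = rec.mean(dim=1)                       # (E, d) pool per entity
        out, _ = self.entity_enc(ent.unsqueeze(0))  # (1, E, d) structure level
        return out.squeeze(0)

enc = HierarchicalEncoder(n_keys=20, n_values=500)
z = enc(torch.randint(0, 20, (6, 10)), torch.randint(0, 500, (6, 10)))
```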