Paper Group AWR 429
Online Multi-Object Tracking with Dual Matching Attention Networks. Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features. daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices. A Higher-Order Swiss Army Infinitesimal Jackknife. VarGFaceNet: An Efficient Variable Group Convolutional Neural Network for Lightweight Face Recognition …
Online Multi-Object Tracking with Dual Matching Attention Networks
Title | Online Multi-Object Tracking with Dual Matching Attention Networks |
Authors | Ji Zhu, Hua Yang, Nian Liu, Minyoung Kim, Wenjun Zhang, Ming-Hsuan Yang |
Abstract | In this paper, we propose an online Multi-Object Tracking (MOT) approach which integrates the merits of single object tracking and data association methods in a unified framework to handle noisy detections and frequent interactions between targets. Specifically, for applying single object tracking in MOT, we introduce a cost-sensitive tracking loss based on the state-of-the-art visual tracker, which encourages the model to focus on hard negative distractors during online learning. For data association, we propose Dual Matching Attention Networks (DMAN) with both spatial and temporal attention mechanisms. The spatial attention module generates dual attention maps which enable the network to focus on the matching patterns of the input image pair, while the temporal attention module adaptively allocates different levels of attention to different samples in the tracklet to suppress noisy observations. Experimental results on the MOT benchmark datasets show that the proposed algorithm performs favorably against both online and offline trackers in terms of identity-preserving metrics. |
Tasks | Multi-Object Tracking, Object Tracking, Online Multi-Object Tracking |
Published | 2019-02-02 |
URL | http://arxiv.org/abs/1902.00749v1 |
http://arxiv.org/pdf/1902.00749v1.pdf | |
PWC | https://paperswithcode.com/paper/online-multi-object-tracking-with-dual |
Repo | https://github.com/jizhu1023/DMAN_MOT |
Framework | tf |
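The core of the DMAN idea — dual spatial attention derived from the affinity of an image pair, plus temporal attention over tracklet samples — can be illustrated in a few lines. The NumPy sketch below is not the authors' code; the feature shapes and the max-pooled scoring are simplifying assumptions.

```python
# A minimal sketch of dual spatial attention and temporal attention.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dual_spatial_attention(feat_a, feat_b):
    """feat_a: (C, Na), feat_b: (C, Nb) L2-normalised local features."""
    affinity = feat_a.T @ feat_b              # (Na, Nb) cosine similarities
    attn_a = softmax(affinity.max(axis=1))    # where image A matches B
    attn_b = softmax(affinity.max(axis=0))    # where image B matches A
    return attn_a, attn_b

def temporal_attention(scores):
    """scores: per-sample matching scores along a tracklet."""
    return softmax(scores)                    # down-weights noisy observations

feat_a = np.random.randn(64, 49); feat_a /= np.linalg.norm(feat_a, axis=0)
feat_b = np.random.randn(64, 49); feat_b /= np.linalg.norm(feat_b, axis=0)
attn_a, attn_b = dual_spatial_attention(feat_a, feat_b)
```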
Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features
Title | Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features |
Authors | Arno Solin, Manon Kok |
Abstract | Gaussian processes (GPs) provide a powerful framework for extrapolation, interpolation, and noise removal in regression and classification. This paper considers constraining GPs to arbitrarily-shaped domains with boundary conditions. We solve a Fourier-like generalised harmonic feature representation of the GP prior in the domain of interest, which both constrains the GP and attains a low-rank representation that is used for speeding up inference. The method scales as $\mathcal{O}(nm^2)$ in prediction and $\mathcal{O}(m^3)$ in hyperparameter learning for regression, where $n$ is the number of data points and $m$ the number of features. Furthermore, we make use of the variational approach to allow the method to deal with non-Gaussian likelihoods. The experiments cover both simulated and empirical data in which the boundary conditions allow for inclusion of additional physical information. |
Tasks | Gaussian Processes |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05207v1 |
http://arxiv.org/pdf/1904.05207v1.pdf | |
PWC | https://paperswithcode.com/paper/know-your-boundaries-constraining-gaussian |
Repo | https://github.com/AaltoML/boundary-gp |
Framework | none |
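The harmonic-feature construction is easiest to see in one dimension. Below is a minimal NumPy sketch in the spirit of the paper's reduced-rank representation: Dirichlet eigenfunctions of the Laplacian on [0, L] vanish at the boundary, so the approximate posterior satisfies the boundary condition by construction, and regression costs $\mathcal{O}(nm^2)$. The RBF kernel, lengthscale, and domain below are illustrative choices, not the paper's setup.

```python
# 1-D reduced-rank GP with a hard boundary condition f(0) = f(L) = 0.
import numpy as np

L, m, sigma2, ell, s2 = 1.0, 32, 0.01, 0.2, 1.0

def phi(x):            # Dirichlet eigenfunctions of the Laplacian on [0, L]
    j = np.arange(1, m + 1)
    return np.sqrt(2.0 / L) * np.sin(np.pi * np.outer(x, j) / L)

def spectral_density(w):  # RBF spectral density, evaluated at sqrt(eigenvalues)
    return s2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * w) ** 2)

x = np.random.rand(100)
y = np.sin(3 * x) * x * (1 - x) + 0.1 * np.random.randn(100)
j = np.arange(1, m + 1)
lam = spectral_density(np.pi * j / L)          # prior variances of the weights
Phi = phi(x)                                   # (n, m) feature matrix
A = Phi.T @ Phi + sigma2 * np.diag(1.0 / lam)  # (m, m) system: O(nm^2) to form
w = np.linalg.solve(A, Phi.T @ y)
x_star = np.linspace(0, L, 200)
f_star = phi(x_star) @ w                       # obeys the boundary condition
```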
daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices
Title | daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices |
Authors | Jianhao Zhang, Yingwei Pan, Ting Yao, He Zhao, Tao Mei |
Abstract | It is widely believed that Binary Neural Networks (BNNs) can drastically accelerate inference by replacing the arithmetic operations in float-valued Deep Neural Networks (DNNs) with bit-wise operations. Nevertheless, there has been no open-source implementation supporting this idea on low-end ARM devices (e.g., mobile phones and embedded devices). In this work, we propose daBNN — a super fast inference framework that implements BNNs on ARM devices. Several speed-up and memory refinement strategies for bit-packing, binarized convolution, and memory layout are uniquely devised to enhance inference efficiency. Compared to the recent open-source BNN inference framework, BMXNet, our daBNN is $7\times$$\sim$$23\times$ faster on a single binary convolution, and about $6\times$ faster on Bi-Real Net 18 (a BNN variant of ResNet-18). The daBNN is a BSD-licensed inference framework, and its source code, sample projects and pre-trained models are available on-line: https://github.com/JDAI-CV/dabnn. |
Tasks | |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05858v1 |
https://arxiv.org/pdf/1908.05858v1.pdf | |
PWC | https://paperswithcode.com/paper/dabnn-a-super-fast-inference-framework-for |
Repo | https://github.com/JDAI-CV/dabnn |
Framework | none |
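The two tricks that make BNN inference fast — bit-packing ±1 values into machine words and replacing multiply-accumulate with XNOR plus popcount — can be demonstrated in portable NumPy. This is only a reference sketch of the arithmetic; daBNN itself implements these as hand-tuned ARM kernels.

```python
# Bit-packing and an XNOR-popcount binary dot product.
import numpy as np

def bitpack(signs):
    """signs: array of +/-1 values; +1 maps to bit 1, packed 8 per byte."""
    bits = (signs > 0).astype(np.uint8)
    return np.packbits(bits)

def binary_dot(pa, pb, n):
    """XNOR-popcount dot product of two packed +/-1 vectors of length n."""
    xnor = ~(pa ^ pb)                 # bit set wherever the signs agree
    matches = np.unpackbits(xnor)[:n].sum()
    return 2 * int(matches) - n       # agreements minus disagreements

a = np.sign(np.random.randn(256)); a[a == 0] = 1
b = np.sign(np.random.randn(256)); b[b == 0] = 1
assert binary_dot(bitpack(a), bitpack(b), 256) == int(a @ b)
```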
A Higher-Order Swiss Army Infinitesimal Jackknife
Title | A Higher-Order Swiss Army Infinitesimal Jackknife |
Authors | Ryan Giordano, Michael I. Jordan, Tamara Broderick |
Abstract | Cross validation (CV) and the bootstrap are ubiquitous model-agnostic tools for assessing the error or variability of machine learning and statistical estimators. However, these methods require repeatedly re-fitting the model with different weighted versions of the original dataset, which can be prohibitively time-consuming. For sufficiently regular optimization problems the optimum depends smoothly on the data weights, and so the process of repeatedly re-fitting can be approximated with a Taylor series that can often be evaluated relatively quickly. The first-order approximation is known as the “infinitesimal jackknife” in the statistics literature and has been the subject of recent interest in machine learning for approximate CV. In this work, we consider higher-order approximations, which we call the “higher-order infinitesimal jackknife” (HOIJ). Under mild regularity conditions, we provide a simple recursive procedure to compute approximations of all orders with finite-sample accuracy bounds. Additionally, we show that the HOIJ can be efficiently computed even in high dimensions using forward-mode automatic differentiation. We show that a linear approximation with bootstrap weights is equivalent to the approximations provided by asymptotic normal approximations. Consequently, the HOIJ opens up the possibility of enjoying higher-order accuracy properties of the bootstrap using local approximations. Consistency of the HOIJ for leave-one-out CV under different asymptotic regimes follows as corollaries from our finite-sample bounds under additional regularity assumptions. The generality of the computation and bounds motivates the name “higher-order Swiss Army infinitesimal jackknife.” |
Tasks | |
Published | 2019-07-28 |
URL | https://arxiv.org/abs/1907.12116v1 |
https://arxiv.org/pdf/1907.12116v1.pdf | |
PWC | https://paperswithcode.com/paper/a-higher-order-swiss-army-infinitesimal |
Repo | https://github.com/rgiordan/vittles |
Framework | none |
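To make the Taylor-series idea concrete, here is a first-order infinitesimal-jackknife sketch for approximate leave-one-out CV on ridge regression; the paper's HOIJ extends exactly this expansion to higher orders via forward-mode automatic differentiation. The problem sizes and loss are illustrative.

```python
# First-order infinitesimal jackknife: approximate LOO-CV with one solve.
import numpy as np

np.random.seed(0)
n, d, lam = 200, 5, 1.0
X = np.random.randn(n, d)
y = X @ np.random.randn(d) + 0.1 * np.random.randn(n)

H = X.T @ X + lam * np.eye(d)                  # Hessian of the weighted loss
theta = np.linalg.solve(H, X.T @ y)            # full-data fit
resid = X @ theta - y
grads = X * resid[:, None]                     # per-datum gradients g_i: (n, d)

# Dropping datum i perturbs its weight w_i: 1 -> 0, so to first order
# theta_{-i} ≈ theta + H^{-1} g_i  -- no re-fitting required.
theta_loo = theta + np.linalg.solve(H, grads.T).T     # (n, d)
loo_err = np.mean((np.sum(X * theta_loo, axis=1) - y) ** 2)
print("approx LOO-CV error:", loo_err)
```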
VarGFaceNet: An Efficient Variable Group Convolutional Neural Network for Lightweight Face Recognition
Title | VarGFaceNet: An Efficient Variable Group Convolutional Neural Network for Lightweight Face Recognition |
Authors | Mengjia Yan, Mengao Zhao, Zining Xu, Qian Zhang, Guoli Wang, Zhizhong Su |
Abstract | To improve the discriminative and generalization ability of lightweight networks for face recognition, we propose an efficient variable group convolutional network called VarGFaceNet. Variable group convolution, introduced by VarGNet, resolves the conflict between small computational cost and imbalanced computational intensity inside a block. We employ variable group convolution to design a network that supports large-scale face identification while reducing computational cost and parameters. Specifically, we use a head setting to preserve essential information at the start of the network and propose a particular embedding setting to reduce the parameters of the fully-connected embedding layer. To enhance interpretation ability, we employ an equivalence of angular distillation loss to guide our lightweight network, and we apply recursive knowledge distillation to relieve the discrepancy between the teacher model and the student model. Winning the deepglint-light track of the LFR (2019) challenge demonstrates the effectiveness of our model and approach. An implementation of VarGFaceNet will be released at https://github.com/zma-c-137/VarGFaceNet. |
Tasks | Face Detection, Face Identification, Face Recognition |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.04985v4 |
https://arxiv.org/pdf/1910.04985v4.pdf | |
PWC | https://paperswithcode.com/paper/vargfacenet-an-efficient-variable-group |
Repo | https://github.com/zma-c-137/VarGFaceNet |
Framework | mxnet |
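Variable group convolution fixes the number of channels per group rather than the number of groups, so the group count varies with the layer width and the computational intensity inside a block stays balanced. A minimal PyTorch sketch follows; channels_per_group=8 is an illustrative setting, not necessarily the paper's.

```python
# Variable group convolution: constant channels per group, variable groups.
import torch.nn as nn

def var_group_conv(in_ch, out_ch, channels_per_group=8, stride=1):
    groups = in_ch // channels_per_group   # group count scales with width
    return nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride,
                     padding=1, groups=groups, bias=False)

block = nn.Sequential(
    var_group_conv(64, 128),               # 64 / 8 = 8 groups
    nn.BatchNorm2d(128),
    nn.PReLU(128),
)
```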
Scale-Equivariant Steerable Networks
Title | Scale-Equivariant Steerable Networks |
Authors | Ivan Sosnovik, Michał Szmaja, Arnold Smeulders |
Abstract | The effectiveness of Convolutional Neural Networks (CNNs) has been substantially attributed to their built-in property of translation equivariance. However, CNNs do not have embedded mechanisms to handle other types of transformations. In this work, we focus on scale changes, which regularly appear in various tasks due to the changing distances between objects and the camera. First, we introduce the general theory for building scale-equivariant convolutional networks with steerable filters. We develop scale-convolution and generalize other common blocks to be scale-equivariant. We demonstrate the computational efficiency and numerical stability of the proposed method. We compare the proposed models to previously developed methods for scale equivariance and local scale invariance, and demonstrate state-of-the-art results on the MNIST-scale and STL-10 datasets in the supervised learning setting. |
Tasks | |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.11093v2 |
https://arxiv.org/pdf/1910.11093v2.pdf | |
PWC | https://paperswithcode.com/paper/scale-equivariant-steerable-networks-1 |
Repo | https://github.com/ISosnovik/sesn |
Framework | pytorch |
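A toy way to see the scale-convolution structure: apply one shared filter over a discrete set of scales and stack the responses along a new scale axis. The sketch below approximates rescaling with dilation purely for illustration; the actual method builds filters from a steerable basis.

```python
# Toy scale-convolution: shared filter, stacked responses per scale.
import torch
import torch.nn.functional as F

def scale_conv(x, weight, dilations=(1, 2, 4)):
    """x: (B, C, H, W); weight: (C_out, C, k, k) -> (B, S, C_out, H, W)."""
    outs = [F.conv2d(x, weight, padding=d * (weight.shape[-1] // 2), dilation=d)
            for d in dilations]
    return torch.stack(outs, dim=1)        # new scale dimension

x = torch.randn(1, 3, 32, 32)
w = torch.randn(16, 3, 3, 3)
y = scale_conv(x, w)                       # (1, 3 scales, 16, 32, 32)
```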
DeOccNet: Learning to See Through Foreground Occlusions in Light Fields
Title | DeOccNet: Learning to See Through Foreground Occlusions in Light Fields |
Authors | Yingqian Wang, Tianhao Wu, Jungang Yang, Longguang Wang, Wei An, Yulan Guo |
Abstract | Background objects occluded in some views of a light field (LF) camera can be seen by other views. Consequently, occluded surfaces can be reconstructed from LF images. In this paper, we handle the LF de-occlusion (LF-DeOcc) problem using a deep encoder-decoder network (namely, DeOccNet). In our method, sub-aperture images (SAIs) are first given to the encoder to incorporate both spatial and angular information. The encoded representations are then used by the decoder to render an occlusion-free center-view SAI. To the best of our knowledge, DeOccNet is the first deep learning-based LF-DeOcc method. To handle the insufficiency of training data, we propose an LF synthesis approach to embed selected occlusion masks into existing LF images. Besides, several synthetic and real-world LFs are developed for performance evaluation. Experimental results show that, after training on the generated data, our DeOccNet can effectively remove foreground occlusions and achieves superior performance compared to other state-of-the-art methods. Source code is available at: https://github.com/YingqianWang/DeOccNet. |
Tasks | |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.04459v1 |
https://arxiv.org/pdf/1912.04459v1.pdf | |
PWC | https://paperswithcode.com/paper/deoccnet-learning-to-see-through-foreground |
Repo | https://github.com/YingqianWang/DeOccNet |
Framework | pytorch |
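The input scheme is the easy part to sketch: concatenate all sub-aperture images along the channel axis and map them through an encoder-decoder to a single occlusion-free centre view. The toy PyTorch network below only mirrors that interface; the layer sizes are invented for illustration and bear no relation to DeOccNet's actual architecture.

```python
# Toy encoder-decoder over channel-stacked sub-aperture images.
import torch
import torch.nn as nn

num_sais = 9 * 9                           # e.g. a 9x9 angular grid
net = nn.Sequential(
    nn.Conv2d(3 * num_sais, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
)
sais = torch.randn(1, num_sais, 3, 64, 64)   # (B, views, C, H, W)
center = net(sais.flatten(1, 2))             # occlusion-free view: (1, 3, 64, 64)
```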
Image-Based Size Analysis of Agglomerated and Partially Sintered Particles via Convolutional Neural Networks
Title | Image-Based Size Analysis of Agglomerated and Partially Sintered Particles via Convolutional Neural Networks |
Authors | Max Frei, Frank Einar Kruis |
Abstract | There is a high demand for fully automated methods for the analysis of primary particle size distributions of agglomerated, sintered or occluded primary particles, due to their impact on material properties. We therefore propose and test a novel, deep learning-based method for the detection of such primary particles, which renders manual tuning of analysis parameters unnecessary. Notably, the training of the utilized convolutional neural networks was carried out using only synthetic images, thereby avoiding the laborious task of manual annotation and increasing the ground-truth quality. Nevertheless, the proposed method performs excellently on real-world samples of sintered silica nanoparticles with various sintering degrees and varying image conditions. In a direct comparison, the proposed method clearly outperforms two state-of-the-art methods for automated image-based particle size analysis (Hough transformation and the ImageJ ParticleSizer plug-in), thereby attaining human-like performance. |
Tasks | |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05112v3 |
https://arxiv.org/pdf/1907.05112v3.pdf | |
PWC | https://paperswithcode.com/paper/image-based-size-analysis-of-agglomerated-and |
Repo | https://github.com/maxfrei750/DeepParticleNet |
Framework | tf |
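The synthetic-training idea — render agglomerates of overlapping spheres so per-particle ground-truth masks come for free — can be sketched in NumPy. All parameters below are illustrative, not the paper's rendering pipeline.

```python
# Synthetic agglomerates of partially overlapping circular particles.
import numpy as np

def synth_agglomerate(size=128, n_particles=6, r_range=(6, 14), seed=None):
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[:size, :size]
    img = np.zeros((size, size), dtype=np.float32)
    masks = []
    cx = cy = size / 2
    for _ in range(n_particles):
        r = rng.uniform(*r_range)
        cx += rng.uniform(-1.2 * r, 1.2 * r)   # step near the last particle
        cy += rng.uniform(-1.2 * r, 1.2 * r)   # so spheres overlap ("sinter")
        mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
        img = np.maximum(img, mask.astype(np.float32))
        masks.append(mask)                     # per-particle ground truth
    img += 0.05 * rng.standard_normal(img.shape).astype(np.float32)
    return img, masks

image, gt_masks = synth_agglomerate(seed=0)
```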
GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing
Title | GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing |
Authors | Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu |
Abstract | We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. The Apache 2.0 license has been adopted by GluonCV and GluonNLP to allow for software distribution, modification, and usage. |
Tasks | |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04433v2 |
https://arxiv.org/pdf/1907.04433v2.pdf | |
PWC | https://paperswithcode.com/paper/gluoncv-and-gluonnlp-deep-learning-in |
Repo | https://github.com/xcgoner/AISTATS2020-AdaAlter-GluonNLP |
Framework | mxnet |
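Typical usage looks like the snippet below; both calls follow the documented model-zoo and embedding APIs, though the specific model and embedding names are only examples.

```python
# One-line access to pre-trained models in GluonCV and GluonNLP.
import gluoncv
import gluonnlp

# computer vision: a pre-trained image classifier from the model zoo
net = gluoncv.model_zoo.get_model('resnet50_v1b', pretrained=True)

# NLP: pre-trained word embeddings
embedding = gluonnlp.embedding.create('glove', source='glove.6B.50d')
```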
Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML
Title | Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML |
Authors | Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Mingyi Hong, Una-May O'Reilly |
Abstract | In this paper, we study the problem of constrained robust (min-max) optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values. We present a principled optimization framework, integrating a zeroth-order (ZO) gradient estimator with an alternating projected stochastic gradient descent-ascent method, where the former requires only a small number of function queries and the latter needs just a one-step descent/ascent update. We show that the proposed framework, referred to as ZO-Min-Max, has a sub-linear convergence rate under mild conditions and scales gracefully with problem size. On the application side, we explore a promising connection between black-box min-max optimization and black-box evasion and poisoning attacks in adversarial machine learning (ML). Our empirical evaluations on these use cases demonstrate the effectiveness of our approach and its scalability to dimensions that prohibit using recent black-box solvers. |
Tasks | |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1909.13806v1 |
https://arxiv.org/pdf/1909.13806v1.pdf | |
PWC | https://paperswithcode.com/paper/min-max-optimization-without-gradients |
Repo | https://github.com/KaidiXu/ZO-minmax |
Framework | tf |
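The scheme is easy to sketch: estimate the gradient in x from function values only (two-point random directions) and take a one-step projected ascent in y. The toy convex-concave objective and step sizes below are illustrative assumptions, not the paper's experimental setup.

```python
# Zeroth-order min-max: ZO descent on x, projected first-order ascent on y.
import numpy as np

def zo_grad(f, x, mu=1e-3, n_dirs=20, rng=np.random.default_rng(0)):
    """Two-point random-direction zeroth-order gradient estimate of f at x."""
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_dirs

f = lambda x, y: x @ y + 0.5 * x @ x - 0.5 * y @ y   # convex-concave toy
x, y = np.ones(5), np.ones(5)
for _ in range(300):
    x -= 0.05 * zo_grad(lambda x_: f(x_, y), x)      # ZO descent step on x
    y = np.clip(y + 0.05 * (x - y), -1.0, 1.0)       # projected ascent on y
print(np.linalg.norm(x), np.linalg.norm(y))          # both shrink toward 0
```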
Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches
Title | Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches |
Authors | Maurizio Ferrari Dacrema, Paolo Cremonesi, Dietmar Jannach |
Abstract | Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has, as a result, become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today’s research practice in applied machine learning, e.g., in terms of the reproducibility of the results or the choice of the baselines when proposing new models. In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today’s machine learning scholarship and calls for improved scientific practices in this area. Source code of our experiments and full results are available at: https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation. |
Tasks | Recommendation Systems |
Published | 2019-07-16 |
URL | https://arxiv.org/abs/1907.06902v3 |
https://arxiv.org/pdf/1907.06902v3.pdf | |
PWC | https://paperswithcode.com/paper/are-we-really-making-much-progress-a-worrying |
Repo | https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation |
Framework | none |
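For context, here is a sketch of the kind of nearest-neighbour heuristic the study found hard to beat: item-item cosine similarity on the interaction matrix, scoring unseen items against a user's history. The neighbourhood size and binarization are illustrative choices.

```python
# ItemKNN baseline: item-item cosine similarity, top-k neighbourhoods.
import numpy as np

def itemknn_scores(R, k=20, eps=1e-9):
    """R: (users, items) binary interaction matrix; returns score matrix."""
    norms = np.linalg.norm(R, axis=0) + eps
    S = (R.T @ R) / np.outer(norms, norms)      # item-item cosine similarity
    np.fill_diagonal(S, 0.0)
    idx = np.argsort(S, axis=1)[:, :-k]         # all but the k largest per row
    np.put_along_axis(S, idx, 0.0, axis=1)      # keep only top-k neighbours
    return R @ S                                # user scores for every item

R = (np.random.rand(100, 50) < 0.05).astype(float)
recs = np.argsort(-itemknn_scores(R), axis=1)[:, :10]   # top-10 per user
```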
HAHE: Hierarchical Attentive Heterogeneous Information Network Embedding
Title | HAHE: Hierarchical Attentive Heterogeneous Information Network Embedding |
Authors | Sheng Zhou, Jiajun Bu, Xin Wang, Jiawei Chen, Can Wang |
Abstract | Heterogeneous information network (HIN) embedding has recently attracted much attention due to its effectiveness in dealing with complex heterogeneous data. Meta paths, which connect different object types with various semantic meanings, are widely used by existing HIN embedding works. However, several challenges have not been addressed so far. First, different meta paths convey different semantic meanings, while existing works assume that all nodes share the same weights for meta paths and ignore the personalized preferences of different nodes on different meta paths. Second, given a meta path, nodes in an HIN are connected by path instances, while existing works fail to fully explore the differences between path instances that reflect nodes' preferences in the semantic space. To tackle the above challenges, we propose a Hierarchical Attentive Heterogeneous information network Embedding (HAHE) model to capture the personalized preferences on meta paths and path instances in each semantic space. As path instances are based on a particular meta path, a hierarchical attention mechanism is naturally utilized to model the personalized preference on meta paths and path instances. Extensive experiments on several real-world datasets show that our proposed HAHE model significantly outperforms state-of-the-art methods on various data mining tasks. |
Tasks | Network Embedding |
Published | 2019-01-31 |
URL | https://arxiv.org/abs/1902.01475v2 |
https://arxiv.org/pdf/1902.01475v2.pdf | |
PWC | https://paperswithcode.com/paper/hahe-hierarchical-attentive-heterogeneous |
Repo | https://github.com/Jhy1993/HAN |
Framework | tf |
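The hierarchical attention is straightforward to sketch: one softmax attention over path instances within each meta path, then a second over the resulting meta-path embeddings, both conditioned on a node-specific query. The dimensions and dot-product scorer below are simplifying assumptions.

```python
# Two-level (instance-level, then meta-path-level) attention aggregation.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attend(embeddings, query):
    """Weighted sum of rows of `embeddings`, scored against `query`."""
    w = softmax(embeddings @ query)
    return w @ embeddings

d = 16
node_query = np.random.randn(d)                 # personalised per node
meta_path_instances = [np.random.randn(5, d),   # instances of meta path 1
                       np.random.randn(3, d)]   # instances of meta path 2

# level 1: aggregate path instances within each meta path
per_path = np.stack([attend(inst, node_query) for inst in meta_path_instances])
# level 2: aggregate across meta paths with node-specific weights
node_embedding = attend(per_path, node_query)   # final (d,) embedding
```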
Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss
Title | Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss |
Authors | Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu |
Abstract | We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead of learning a direct mapping from audio to video frames, we propose first to transfer audio to a high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned on the landmarks. Compared to a direct audio-to-image approach, our cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. We, humans, are sensitive to temporal discontinuities and subtle artifacts in video. To avoid those pixel jittering problems and to force the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. Furthermore, to generate a sharper image with well-synchronized facial movements, we propose a novel regression-based discriminator structure, which considers sequence-level information along with frame-level information. Thorough experiments on several datasets and real-world samples demonstrate significantly better results obtained by our method than the state-of-the-art methods in both quantitative and qualitative comparisons. |
Tasks | |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.03820v1 |
https://arxiv.org/pdf/1905.03820v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-cross-modal-talking-face |
Repo | https://github.com/lelechen63/ATVGnet |
Framework | pytorch |
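The dynamic pixel-wise loss can be sketched as an attention-weighted reconstruction error. Where the attention map comes from, and the 0.5 floor below, are illustrative choices rather than the paper's exact formulation.

```python
# Attention-weighted pixel-wise reconstruction loss.
import torch

def dynamic_pixel_loss(generated, target, attention):
    """generated, target: (B, C, H, W); attention: (B, 1, H, W) in [0, 1]."""
    pixel_err = (generated - target).abs()
    # weight errors by attention, with a small floor so no region is ignored
    return (pixel_err * (0.5 + 0.5 * attention)).mean()

gen = torch.rand(2, 3, 64, 64)
tgt = torch.rand(2, 3, 64, 64)
attn = torch.rand(2, 1, 64, 64)
loss = dynamic_pixel_loss(gen, tgt, attn)
```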
Motion Guided Attention for Video Salient Object Detection
Title | Motion Guided Attention for Video Salient Object Detection |
Authors | Haofeng Li, Guanqi Chen, Guanbin Li, Yizhou Yu |
Abstract | Video salient object detection aims at discovering the most visually distinctive objects in a video. How to effectively take object motion into consideration during video salient object detection is a critical issue. Existing state-of-the-art methods either do not explicitly model and harvest motion cues or ignore spatial contexts within optical flow images. In this paper, we develop a multi-task motion guided video salient object detection network, which learns to accomplish two sub-tasks using two sub-networks, one sub-network for salient object detection in still images and the other for motion saliency detection in optical flow images. We further introduce a series of novel motion guided attention modules, which utilize the motion saliency sub-network to attend and enhance the sub-network for still images. These two sub-networks learn to adapt to each other by end-to-end training. Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on a wide range of benchmarks. We hope our simple and effective approach will serve as a solid baseline and help ease future research in video salient object detection. Code and models will be made available. |
Tasks | Object Detection, Optical Flow Estimation, Saliency Detection, Salient Object Detection, Video Salient Object Detection |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07061v2 |
https://arxiv.org/pdf/1909.07061v2.pdf | |
PWC | https://paperswithcode.com/paper/motion-guided-attention-for-video-salient |
Repo | https://github.com/lhaof/Motion-Guided-Attention |
Framework | pytorch |
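One motion-guided attention module can be sketched as the motion-saliency branch gating the appearance features, with a residual connection so appearance information is never fully suppressed. The 1×1 projection below is a simplifying assumption, not the paper's exact module.

```python
# Motion saliency gates appearance features, plus a residual connection.
import torch
import torch.nn as nn

class MotionGuidedAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.to_attn = nn.Conv2d(1, channels, kernel_size=1)

    def forward(self, appearance_feat, motion_saliency):
        """appearance_feat: (B, C, H, W); motion_saliency: (B, 1, H, W)."""
        attn = torch.sigmoid(self.to_attn(motion_saliency))
        return appearance_feat * attn + appearance_feat   # gated + residual

mga = MotionGuidedAttention(64)
out = mga(torch.randn(1, 64, 32, 32), torch.randn(1, 1, 32, 32))
```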
A Hierarchical Model for Data-to-Text Generation
Title | A Hierarchical Model for Data-to-Text Generation |
Authors | Clément Rebuffel, Laure Soulier, Geoffrey Scoutheeten, Patrick Gallinari |
Abstract | Transcribing structured data into natural language descriptions has emerged as a challenging task, referred to as “data-to-text”. These structures generally regroup multiple elements, as well as their attributes. Most attempts rely on translation encoder-decoder methods which linearize elements into a sequence; this, however, loses most of the structure contained in the data. In this work, we propose to overcome this limitation with a hierarchical model that encodes the data structure at both the element level and the structure level. Evaluations on RotoWire show the effectiveness of our model with respect to qualitative and quantitative metrics. |
Tasks | Data-to-Text Generation, Text Generation |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.10011v1 |
https://arxiv.org/pdf/1912.10011v1.pdf | |
PWC | https://paperswithcode.com/paper/a-hierarchical-model-for-data-to-text |
Repo | https://github.com/KaijuML/data-to-text-hierarchical |
Framework | pytorch |
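The two-level encoder is simple to sketch: one encoder summarizes the key-value pairs inside each record (element level), and a second runs over the record summaries (structure level). The GRU encoders and sizes below are stand-ins for the paper's actual architecture, chosen only to show the hierarchy.

```python
# Two-level hierarchical encoding of structured records.
import torch
import torch.nn as nn

d = 32
element_enc = nn.GRU(d, d, batch_first=True)     # over pairs within a record
structure_enc = nn.GRU(d, d, batch_first=True)   # over record representations

records = [torch.randn(1, 4, d), torch.randn(1, 7, d)]  # 2 records of d-dim pairs
record_reprs = torch.cat([element_enc(r)[1][0] for r in records]).unsqueeze(0)
table_repr, _ = structure_enc(record_reprs)      # (1, n_records, d) for a decoder
```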