Paper Group ANR 379
Combining 3D Model Contour Energy and Keypoints for Object Tracking. Pose-Assisted Multi-Camera Collaboration for Active Object Tracking. GAN Compression: Efficient Architectures for Interactive Conditional GANs. Safety Concerns and Mitigation Approaches Regarding the Use of Deep Learning in Safety-Critical Perception Tasks. Conditional Variational …
Combining 3D Model Contour Energy and Keypoints for Object Tracking
Title | Combining 3D Model Contour Energy and Keypoints for Object Tracking |
Authors | Bogdan Bugaev, Anton Kryshchenko, Roman Belov |
Abstract | We present a new combined approach for monocular model-based 3D tracking. A preliminary object pose is estimated by using a keypoint-based technique. The pose is then refined by optimizing the contour energy function. The energy determines the degree of correspondence between the contour of the model projection and the image edges. It is calculated based on both the intensity and orientation of the raw image gradient. For optimization, we propose a technique and search-area constraints that allow overcoming local optima and taking into account the information obtained through keypoint-based pose estimation. Owing to its combined nature, our method eliminates numerous issues of keypoint-based and edge-based approaches. We demonstrate the efficiency of our method by comparing it with state-of-the-art methods on a public benchmark dataset that includes videos with various lighting conditions, movement patterns, and speeds. |
Tasks | Object Tracking, Pose Estimation |
Published | 2020-02-04 |
URL | https://arxiv.org/abs/2002.01379v1 |
https://arxiv.org/pdf/2002.01379v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-3d-model-contour-energy-and-1 |
Repo | |
Framework | |
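The contour energy described in the abstract above can be illustrated with a minimal sketch: sample the raw image gradient along the projected model contour and reward strong gradients aligned with the contour normals. The function below is an assumption-laden illustration (OpenCV Sobel gradients, nearest-pixel sampling), not the authors' implementation.

```python
# Hypothetical sketch, not the paper's code: score how well a projected model
# contour aligns with image edges using gradient intensity and orientation.
import numpy as np
import cv2

def contour_energy(gray, contour_pts, contour_normals):
    """gray: HxW float image; contour_pts: Nx2 (x, y) pixel coordinates of the
    projected model contour; contour_normals: Nx2 unit normals at those points."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    xs = np.clip(contour_pts[:, 0].round().astype(int), 0, gray.shape[1] - 1)
    ys = np.clip(contour_pts[:, 1].round().astype(int), 0, gray.shape[0] - 1)
    g = np.stack([gx[ys, xs], gy[ys, xs]], axis=1)              # gradient at contour
    mag = np.linalg.norm(g, axis=1) + 1e-8                      # gradient intensity
    align = np.abs(np.sum(g * contour_normals, axis=1)) / mag   # orientation match
    return float(np.mean(mag * align))  # high when the contour sits on strong, aligned edges
```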
Pose-Assisted Multi-Camera Collaboration for Active Object Tracking
Title | Pose-Assisted Multi-Camera Collaboration for Active Object Tracking |
Authors | Jing Li, Jing Xu, Fangwei Zhong, Xiangyu Kong, Yu Qiao, Yizhou Wang |
Abstract | Active Object Tracking (AOT) is crucial to many vision-based applications, e.g., mobile robots and intelligent surveillance. However, there are a number of challenges when deploying active tracking in complex scenarios, e.g., the target is frequently occluded by obstacles. In this paper, we extend single-camera AOT to a multi-camera setting, where cameras track a target in a collaborative fashion. To achieve effective collaboration among cameras, we propose a novel Pose-Assisted Multi-Camera Collaboration System, which enables a camera to cooperate with the others by sharing camera poses for active object tracking. In the system, each camera is equipped with two controllers and a switcher: the vision-based controller tracks targets based on observed images, and the pose-based controller moves the camera in accordance with the poses of the other cameras. At each step, the switcher decides which of the two controllers' actions to take according to the visibility of the target. The experimental results demonstrate that our system outperforms all the baselines and is capable of generalizing to unseen environments. The code and demo videos are available on our website https://sites.google.com/view/pose-assistedcollaboration. |
Tasks | Object Tracking |
Published | 2020-01-15 |
URL | https://arxiv.org/abs/2001.05161v1 |
https://arxiv.org/pdf/2001.05161v1.pdf | |
PWC | https://paperswithcode.com/paper/pose-assisted-multi-camera-collaboration-for |
Repo | |
Framework | |
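A minimal sketch of the switcher idea from the abstract above: each camera chooses between its vision-based and pose-based controllers depending on whether the target is currently visible. The names, fields and visibility criterion are illustrative assumptions, not the paper's interface.

```python
# Hypothetical sketch of the per-camera switcher; controllers are passed in as callables.
from dataclasses import dataclass

@dataclass
class CameraState:
    observation: object   # current image from this camera
    peer_poses: list      # poses shared by the other cameras
    target_visible: bool  # e.g., derived from the detector's confidence (assumption)

def select_action(state, vision_controller, pose_controller):
    """Switcher: trust the vision-based controller while the target is visible,
    otherwise follow the poses shared by cameras that still see it."""
    if state.target_visible:
        return vision_controller(state.observation)
    return pose_controller(state.peer_poses)
```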
GAN Compression: Efficient Architectures for Interactive Conditional GANs
Title | GAN Compression: Efficient Architectures for Interactive Conditional GANs |
Authors | Muyang Li, Ji Lin, Yaoyao Ding, Zhijian Liu, Jun-Yan Zhu, Song Han |
Abstract | Conditional Generative Adversarial Networks (cGANs) have enabled controllable image synthesis for many computer vision and graphics applications. However, recent cGANs are 1-2 orders of magnitude more computationally intensive than modern recognition CNNs. For example, GauGAN consumes 281G MACs per image, compared to 0.44G MACs for MobileNet-v3, making it difficult to deploy interactively. In this work, we propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs. Directly applying existing CNN compression methods yields poor performance due to the difficulty of GAN training and the differences in generator architectures. We address these challenges in two ways. First, to stabilize GAN training, we transfer knowledge of multiple intermediate representations of the original model to its compressed model, and unify unpaired and paired learning. Second, instead of reusing existing CNN designs, our method automatically finds efficient architectures via neural architecture search (NAS). To accelerate the search process, we decouple model training and architecture search via weight sharing. Experiments demonstrate the effectiveness of our method across different supervision settings (paired and unpaired), model architectures, and learning methods (e.g., pix2pix, GauGAN, CycleGAN). Without losing image quality, we reduce the computation of CycleGAN by more than 20X and GauGAN by 9X, paving the way for interactive image synthesis. The code and demo are publicly available. |
Tasks | Image Generation, Neural Architecture Search |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.08936v1 |
https://arxiv.org/pdf/2003.08936v1.pdf | |
PWC | https://paperswithcode.com/paper/gan-compression-efficient-architectures-for |
Repo | |
Framework | |
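The multi-layer knowledge transfer mentioned in the abstract above can be sketched as matching intermediate feature maps of the original (teacher) generator and the compressed (student) generator. The snippet below is a hedged illustration; the projection layers and loss choice are assumptions rather than the paper's exact objective.

```python
# Hypothetical sketch of intermediate-feature distillation from a teacher generator
# to a compressed student generator; not the paper's implementation.
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats, projections):
    """student_feats/teacher_feats: lists of matched intermediate activations;
    projections: 1x1 convs mapping student channel counts to teacher channel counts."""
    loss = 0.0
    for s, t, proj in zip(student_feats, teacher_feats, projections):
        loss = loss + F.mse_loss(proj(s), t.detach())  # teacher features are frozen targets
    return loss
```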
Safety Concerns and Mitigation Approaches Regarding the Use of Deep Learning in Safety-Critical Perception Tasks
Title | Safety Concerns and Mitigation Approaches Regarding the Use of Deep Learning in Safety-Critical Perception Tasks |
Authors | Oliver Willers, Sebastian Sudholt, Shervin Raafatnia, Stephanie Abrecht |
Abstract | Deep learning methods are widely regarded as indispensable when it comes to designing perception pipelines for autonomous agents such as robots, drones or automated vehicles. However, the main reason why deep learning is not yet being used for autonomous agents at large scale is safety concerns. Deep learning approaches typically exhibit black-box behavior, which makes it hard to evaluate them with respect to safety-critical aspects. While there has been some work on safety in deep learning, most papers typically focus on high-level safety concerns. In this work, we seek to dive into the safety concerns of deep learning methods and present a concise enumeration on a deeply technical level. Additionally, we present extensive discussions of possible mitigation methods and give an outlook on the mitigation methods that are still missing in order to facilitate a safety argument for a deep learning method. |
Tasks | |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.08001v1 |
https://arxiv.org/pdf/2001.08001v1.pdf | |
PWC | https://paperswithcode.com/paper/safety-concerns-and-mitigation-approaches |
Repo | |
Framework | |
Conditional Variational Inference with Adaptive Truncation for Bayesian Nonparametric Models
Title | Conditional Variational Inference with Adaptive Truncation for Bayesian Nonparametric Models |
Authors | Jones Yirui Liu, Xinghao Qiao |
Abstract | Scalable inference for Bayesian nonparametric models with big data is still challenging. Current variational inference methods fail to characterise the correlation structure among latent variables due to the mean-field setting, and cannot infer the true posterior dimension because of the universal truncation. To overcome these limitations, we build a general framework to infer Bayesian nonparametric models by maximising the proposed nonparametric evidence lower bound, and then develop a novel approach by combining Monte Carlo sampling with the stochastic variational inference framework. Our method has several advantages over the traditional online variational inference method. First, it achieves a smaller divergence between variational distributions and the true posterior by factorising variational distributions under the conditional setting instead of the mean-field setting to capture the correlation pattern. Second, it reduces the risk of underfitting or overfitting by truncating the dimension adaptively rather than using a prespecified truncated dimension for all latent variables. Third, it reduces the computational complexity by approximating the posterior functionally instead of updating the stick-breaking parameters individually. We apply the proposed method to hierarchical Dirichlet process and gamma–Dirichlet process models, two essential Bayesian nonparametric models in topic analysis. The empirical study on three large datasets including arXiv, New York Times and Wikipedia reveals that our proposed method substantially outperforms its competitor in terms of lower perplexity and much clearer topic-word clustering. |
Tasks | |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04508v1 |
https://arxiv.org/pdf/2001.04508v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-variational-inference-with |
Repo | |
Framework | |
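For context, the "universal truncation" criticized in the abstract above refers, in standard stick-breaking notation, to fixing the number of variational components in advance. The following is background notation only, not the paper's formulation:

```latex
% Standard stick-breaking construction of Dirichlet-process mixture weights:
v_k \sim \mathrm{Beta}(1, \alpha), \qquad
\pi_k = v_k \prod_{j=1}^{k-1} (1 - v_j), \qquad k = 1, 2, \dots
% A truncated variational family imposes v_K = 1 for a prespecified level K,
% so \pi_k = 0 for all k > K regardless of how many components the data support;
% the adaptive truncation proposed above chooses this dimension from the data instead.
```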
AdaEnsemble Learning Approach for Metro Passenger Flow Forecasting
Title | AdaEnsemble Learning Approach for Metro Passenger Flow Forecasting |
Authors | Shaolong Sun, Dongchuan Yang, Ju-e Guo, Shouyang Wang |
Abstract | Accurate and timely metro passenger flow forecasting is critical for the successful deployment of intelligent transportation systems. However, it is quite challenging to propose an efficient and robust forecasting approach due to the inherent randomness and variations of metro passenger flow. In this study, we present a novel adaptive ensemble (AdaEnsemble) learning approach to accurately forecast the volume of metro passenger flows; it combines the complementary advantages of variational mode decomposition (VMD), the seasonal autoregressive integrated moving average (SARIMA) model, the multilayer perceptron (MLP) network and the long short-term memory (LSTM) network. The AdaEnsemble learning approach consists of three stages. The first stage applies VMD to decompose the metro passenger flow data into a periodic component, a deterministic component and a volatility component. Then we employ a SARIMA model to forecast the periodic component, an LSTM network to learn and forecast the deterministic component, and an MLP network to forecast the volatility component. In the last stage, the forecasted components are reconstructed by another MLP network. The empirical results show that our proposed AdaEnsemble learning approach not only has the best forecasting performance compared with the state-of-the-art models but also appears to be the most promising and robust, based on the historical passenger flow data of the Shenzhen subway system and several standard evaluation measures. |
Tasks | |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07575v2 |
https://arxiv.org/pdf/2002.07575v2.pdf | |
PWC | https://paperswithcode.com/paper/adaensemble-learning-approach-for-metro |
Repo | |
Framework | |
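A rough sketch of the decompose-forecast-recombine pipeline described above, written for a single one-step-ahead forecast. The `vmd_decompose` and `lstm_forecast` callables are stand-ins (assumptions), the SARIMAX order and MLP size are illustrative, and the learned reconstruction MLP is replaced here by a plain sum.

```python
# Hypothetical pipeline sketch, not the paper's code.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.neural_network import MLPRegressor

def forecast_next(series, vmd_decompose, lstm_forecast, season=24):
    """series: 1D array of passenger counts; returns a one-step-ahead forecast."""
    periodic, deterministic, volatility = vmd_decompose(series)   # stage 1: VMD

    # Stage 2: one dedicated model per component.
    sarima = SARIMAX(periodic, order=(1, 0, 1),
                     seasonal_order=(1, 0, 1, season)).fit(disp=False)
    f_periodic = float(np.asarray(sarima.forecast(1))[0])
    f_deterministic = float(lstm_forecast(deterministic))         # assumed helper

    lagged = np.lib.stride_tricks.sliding_window_view(volatility[:-1], season)
    mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500)
    mlp.fit(lagged, volatility[season:])                          # one-step supervision
    f_volatility = float(mlp.predict(volatility[-season:].reshape(1, -1))[0])

    # Stage 3: recombine (a plain sum instead of the learned reconstruction MLP).
    return f_periodic + f_deterministic + f_volatility
```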
Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking
Title | Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking |
Authors | Samuel Broscheit |
Abstract | A typical architecture for end-to-end entity linking systems consists of three steps: mention detection, candidate generation and entity disambiguation. In this study we investigate the following questions: (a) Can all those steps be learned jointly with a model for contextualized text representations, i.e. BERT (Devlin et al., 2019)? (b) How much entity knowledge is already contained in pretrained BERT? (c) Does additional entity knowledge improve BERT’s performance in downstream tasks? To this end, we propose an extreme simplification of the entity linking setup that works surprisingly well: simply cast it as per-token classification over the entire entity vocabulary (over 700K classes in our case). We show on an entity linking benchmark that (i) this model improves the entity representations over plain BERT, (ii) it outperforms entity linking architectures that optimize the tasks separately, and (iii) it only comes second to the current state of the art, which does mention detection and entity disambiguation jointly. Additionally, we investigate the usefulness of entity-aware token representations in the text-understanding benchmark GLUE, as well as the question answering benchmarks SQuAD V2 and SWAG, and the EN-DE WMT14 machine translation benchmark. To our surprise, we find that most of those benchmarks do not benefit from additional entity knowledge, except for a task with very small training data, the RTE task in GLUE, which improves by 2%. |
Tasks | Entity Disambiguation, Entity Linking, Machine Translation, Question Answering |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05473v1 |
https://arxiv.org/pdf/2003.05473v1.pdf | |
PWC | https://paperswithcode.com/paper/investigating-entity-knowledge-in-bert-with-1 |
Repo | |
Framework | |
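The "entity linking as per-token classification" simplification described above amounts to a single linear head over BERT token representations that scores every entity in a large vocabulary. The sketch below uses the Hugging Face `transformers` API; the vocabulary size and model name are illustrative, and training details (candidate sampling, thresholds) are omitted.

```python
# Hypothetical sketch of per-token entity classification on top of BERT.
import torch.nn as nn
from transformers import BertModel

class PerTokenEntityLinker(nn.Module):
    def __init__(self, entity_vocab_size=700_000, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, entity_vocab_size)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        return self.classifier(hidden)  # (batch, seq_len, entity_vocab_size) logits
```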
The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources
Title | The STEM-ECR Dataset: Grounding Scientific Entity References in STEM Scholarly Content to Authoritative Encyclopedic and Lexicographic Sources |
Authors | Jennifer D’Souza, Anett Hoppe, Arthur Brack, Mohamad Yaser Jaradeh, Sören Auer, Ralph Ewerth |
Abstract | We introduce the STEM (Science, Technology, Engineering, and Medicine) Dataset for Scientific Entity Extraction, Classification, and Resolution, version 1.0 (STEM-ECR v1.0). The STEM-ECR v1.0 dataset has been developed to provide a benchmark for the evaluation of scientific entity extraction, classification, and resolution tasks in a domain-independent fashion. It comprises abstracts in 10 STEM disciplines that were found to be the most prolific ones on a major publishing platform. We describe the creation of this multidisciplinary corpus and highlight the obtained findings in terms of the following features: 1) a generic conceptual formalism for scientific entities in a multidisciplinary scientific context; 2) the feasibility of domain-independent human annotation of scientific entities under such a generic formalism; 3) a performance benchmark obtainable for automatic extraction of multidisciplinary scientific entities using BERT-based neural models; 4) a delineated 3-step entity resolution procedure for human annotation of the scientific entities via encyclopedic entity linking and lexicographic word sense disambiguation; and 5) human evaluations of the encyclopedic links and lexicographic senses returned by Babelfy for our entities. Our findings cumulatively indicate that human annotation and automatic learning of multidisciplinary scientific concepts, as well as their semantic disambiguation, in a setting as wide-ranging as STEM are feasible. |
Tasks | Entity Extraction, Entity Linking, Entity Resolution, Word Sense Disambiguation |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.01006v3 |
https://arxiv.org/pdf/2003.01006v3.pdf | |
PWC | https://paperswithcode.com/paper/the-stem-ecr-dataset-grounding-scientific |
Repo | |
Framework | |
Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior
Title | Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior |
Authors | Hu Zhang, Linchao Zhu, Yi Zhu, Yi Yang |
Abstract | Deep neural networks are known to be susceptible to adversarial noise, i.e., tiny and imperceptible perturbations. Most previous work on adversarial attacks focuses on image models, while the vulnerability of video models is less explored. In this paper, we aim to attack video models by utilizing intrinsic movement patterns and regional relative motion among video frames. We propose an effective motion-excited sampler to obtain a motion-aware noise prior, which we term the sparked prior. Our sparked prior underlines frame correlations and utilizes video dynamics via relative motion. By using the sparked prior in gradient estimation, we can successfully attack a variety of video classification models with a smaller number of queries. Extensive experimental results on four benchmark datasets validate the efficacy of our proposed method. |
Tasks | Adversarial Attack, Video Classification |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07637v1 |
https://arxiv.org/pdf/2003.07637v1.pdf | |
PWC | https://paperswithcode.com/paper/motion-excited-sampler-video-adversarial |
Repo | |
Framework | |
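The role of the sparked prior in gradient estimation can be pictured with a generic query-based estimator whose random search directions are shaped by a motion-derived tensor. This is a hedged sketch; the loss oracle, the prior construction and the antithetic finite-difference scheme below are stand-ins, not the paper's exact sampler.

```python
# Hypothetical sketch of black-box gradient estimation with a motion-shaped prior.
import torch

def estimate_gradient(loss_fn, video, motion_prior, n_queries=20, sigma=1e-3):
    """video: (T, C, H, W) tensor; motion_prior: same shape, larger where frames move.
    Returns an antithetic finite-difference estimate of the gradient of loss_fn."""
    grad = torch.zeros_like(video)
    for _ in range(n_queries):
        u = torch.randn_like(video) * motion_prior          # motion-aware direction
        delta = loss_fn(video + sigma * u) - loss_fn(video - sigma * u)
        grad += delta / (2 * sigma) * u                     # accumulate directional estimate
    return grad / n_queries
```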
End-to-End Entity Linking and Disambiguation leveraging Word and Knowledge Graph Embeddings
Title | End-to-End Entity Linking and Disambiguation leveraging Word and Knowledge Graph Embeddings |
Authors | Rostislav Nedelchev, Debanjan Chaudhuri, Jens Lehmann, Asja Fischer |
Abstract | Entity linking - connecting entity mentions in a natural language utterance to knowledge graph (KG) entities - is a crucial step for question answering over KGs. It is often based on measuring the string similarity between the entity label and its mention in the question. The relation referred to in the question can help to disambiguate between entities with the same label. This can be misleading if an incorrect relation has been identified in the relation linking step. However, an incorrect relation may still be semantically similar to the relation with which the correct entity forms a triple within the KG, and this similarity could be captured by their KG embeddings. Based on this idea, we propose the first end-to-end neural network approach that employs KG as well as word embeddings to perform joint relation and entity classification of simple questions, while implicitly performing entity disambiguation with the help of a novel gating mechanism. An empirical evaluation shows that the proposed approach achieves performance comparable to state-of-the-art entity linking while requiring less post-processing. |
Tasks | Entity Disambiguation, Entity Linking, Knowledge Graph Embeddings, Question Answering, Word Embeddings |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.11143v1 |
https://arxiv.org/pdf/2002.11143v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-entity-linking-and-disambiguation |
Repo | |
Framework | |
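The gating mechanism mentioned above can be sketched as a learned interpolation between a word-embedding-based mention encoding and a KG-embedding-based relation signal before scoring candidate entities. Layer names and sizes below are assumptions for illustration, not the architecture from the paper.

```python
# Hypothetical sketch of a gated fusion of text and KG signals for entity scoring.
import torch
import torch.nn as nn

class GatedEntityScorer(nn.Module):
    def __init__(self, text_dim, kg_dim, hidden_dim):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.kg_proj = nn.Linear(kg_dim, hidden_dim)
        self.gate = nn.Linear(text_dim + kg_dim, hidden_dim)

    def forward(self, mention_vec, relation_kg_vec, entity_embeddings):
        """entity_embeddings: (num_candidates, hidden_dim) matrix of candidate entities."""
        g = torch.sigmoid(self.gate(torch.cat([mention_vec, relation_kg_vec], dim=-1)))
        fused = g * self.text_proj(mention_vec) + (1 - g) * self.kg_proj(relation_kg_vec)
        return fused @ entity_embeddings.t()   # similarity score per candidate entity
```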
Improving Entity Linking by Modeling Latent Entity Type Information
Title | Improving Entity Linking by Modeling Latent Entity Type Information |
Authors | Shuang Chen, Jinpeng Wang, Feng Jiang, Chin-Yew Lin |
Abstract | Existing state-of-the-art neural entity linking models employ an attention-based bag-of-words context model and pre-trained entity embeddings bootstrapped from word embeddings to assess topic-level context compatibility. However, the latent entity type information in the immediate context of the mention is neglected, which often causes the models to link mentions to incorrect entities of the wrong type. To tackle this problem, we propose to inject latent entity type information into the entity embeddings based on pre-trained BERT. In addition, we integrate a BERT-based entity similarity score into the local context model of a state-of-the-art model to better capture latent entity type information. Our model significantly outperforms the state-of-the-art entity linking models on the standard benchmark (AIDA-CoNLL). Detailed experimental analysis demonstrates that our model corrects most of the type errors produced by the direct baseline. |
Tasks | Entity Embeddings, Entity Linking, Word Embeddings |
Published | 2020-01-06 |
URL | https://arxiv.org/abs/2001.01447v1 |
https://arxiv.org/pdf/2001.01447v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-entity-linking-by-modeling-latent-2 |
Repo | |
Framework | |
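A minimal sketch of a BERT-based entity similarity score as mentioned above: compare the contextual embedding of the mention with an embedding of the candidate entity obtained by encoding its name or description. This is an illustrative stand-in; the paper's way of building type-aware entity embeddings differs in detail.

```python
# Hypothetical sketch of a BERT-based mention-entity similarity feature.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def embed(text):
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        return enc(**ids).last_hidden_state[:, 0]   # [CLS] vector

def entity_similarity(mention_context, entity_description):
    m, e = embed(mention_context), embed(entity_description)
    return torch.cosine_similarity(m, e).item()     # one feature among several in ranking
```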
Efficient algorithm for calculating transposed PSF matrices for 3D light field deconvolution
Title | Efficient algorithm for calculating transposed PSF matrices for 3D light field deconvolution |
Authors | Martin Eberhart |
Abstract | Volume reconstruction by 3D light field deconvolution is a technique that has been successfully demonstrated for microscopic images recorded by a plenoptic camera. This method requires computing a transposed version of the 5D matrix that holds the point spread function (PSF) of the optical system. For high-resolution cameras with hexagonal microlens arrays, this is a very time-consuming step. This paper illustrates the significance and the construction of this special matrix and presents an efficient algorithm for its computation, which is based on the distinct relation between the corresponding indices of the original and the transposed matrix. The required computation time is significantly shorter than that of previously published algorithms. |
Tasks | |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.09133v1 |
https://arxiv.org/pdf/2003.09133v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-algorithm-for-calculating |
Repo | |
Framework | |
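Independently of how the transposed PSF matrix is constructed, it should act as the adjoint of the forward light-field projection, which gives a cheap correctness check: the inner product of A x with y should equal that of x with A^T y for random test vectors. The sketch below assumes generic `forward`/`transpose` callables and is not tied to the paper's index-relation algorithm.

```python
# Hypothetical adjoint sanity check for a computed transposed PSF operator.
import numpy as np

def check_adjoint(forward, transpose, vol_shape, img_shape, rtol=1e-6):
    """forward: maps a volume to a light-field image; transpose: the candidate adjoint."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(vol_shape)
    y = rng.standard_normal(img_shape)
    lhs = np.vdot(forward(x), y)        # <A x, y>
    rhs = np.vdot(x, transpose(y))      # <x, A^T y>
    return np.isclose(lhs, rhs, rtol=rtol)
```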
Gradient-Adjusted Neuron Activation Profiles for Comprehensive Introspection of Convolutional Speech Recognition Models
Title | Gradient-Adjusted Neuron Activation Profiles for Comprehensive Introspection of Convolutional Speech Recognition Models |
Authors | Andreas Krug, Sebastian Stober |
Abstract | Deep-learning-based Automatic Speech Recognition (ASR) models are very successful, but hard to interpret. To gain a better understanding of how Artificial Neural Networks (ANNs) accomplish their tasks, introspection methods have been proposed. Adapting such techniques from computer vision to speech recognition is not straightforward, because speech data is more complex and less interpretable than image data. In this work, we introduce Gradient-adjusted Neuron Activation Profiles (GradNAPs) as a means to interpret features and representations in Deep Neural Networks. GradNAPs are characteristic responses of ANNs to particular groups of inputs, which incorporate the relevance of neurons for prediction. We show how to utilize GradNAPs to gain insight into how data is processed in ANNs. This includes different ways of visualizing features and clustering of GradNAPs to compare embeddings of different groups of inputs in any layer of a given network. We demonstrate our proposed techniques using a fully convolutional ASR model. |
Tasks | Speech Recognition |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.08125v1 |
https://arxiv.org/pdf/2002.08125v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-adjusted-neuron-activation-profiles |
Repo | |
Framework | |
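A simplified gradient-adjusted profile in the spirit of the abstract above: for one layer and one group of inputs, weight the activations by the gradient of the prediction score and average over the group. The hook-based implementation and the choice of prediction score are assumptions; the paper's GradNAPs include further normalization steps.

```python
# Hypothetical sketch of a gradient-adjusted activation profile for one layer.
import torch

def grad_adjusted_profile(model, layer, inputs):
    """Returns a relevance profile shaped like the layer's output (minus the batch dim)."""
    store = {}
    def save(module, inp, out):
        store["act"] = out                        # keep the activations in the graph
    handle = layer.register_forward_hook(save)
    logits = model(inputs)
    handle.remove()
    act = store["act"]
    score = logits.max(dim=-1).values.sum()       # focus on the predicted classes (assumption)
    grad = torch.autograd.grad(score, act)[0]
    return (act * grad).mean(dim=0)               # average over the input group
```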
Denoising IMU Gyroscopes with Deep Learning for Open-Loop Attitude Estimation
Title | Denoising IMU Gyroscopes with Deep Learning for Open-Loop Attitude Estimation |
Authors | Martin Brossard, Silvere Bonnabel, Axel Barrau |
Abstract | This paper proposes a learning method for denoising the gyroscopes of Inertial Measurement Units (IMUs) using ground-truth data, in order to estimate in real time the orientation (attitude) of a robot in dead reckoning. The obtained algorithm outperforms the state of the art on the (unseen) test sequences. This performance is achieved thanks to a well-chosen model, a proper loss function for orientation increments, and the identification of key points when training with high-frequency inertial data. Our approach builds upon a neural network based on dilated convolutions, without requiring any recurrent neural network. We demonstrate how efficient our strategy is for 3D attitude estimation on the EuRoC and TUM-VI datasets. Interestingly, we observe that our dead-reckoning algorithm manages to beat top-ranked visual-inertial odometry systems in terms of attitude estimation although it does not use vision sensors. We believe this paper offers new perspectives for visual-inertial localization and constitutes a step toward more efficient learning methods involving IMUs. Our open-source implementation is available at https://github.com/mbrossar/denoise-imu-gyro. |
Tasks | Denoising |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10718v1 |
https://arxiv.org/pdf/2002.10718v1.pdf | |
PWC | https://paperswithcode.com/paper/denoising-imu-gyroscopes-with-deep-learning |
Repo | |
Framework | |
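The dilated-convolution idea above can be sketched as a small temporal CNN that consumes raw IMU channels and outputs an additive correction to the gyro signal. Channel counts, dilations and the assumed input layout below are illustrative, not the architecture published in the repository.

```python
# Hypothetical sketch of a dilated temporal CNN that corrects raw gyro measurements.
import torch.nn as nn

class GyroDenoiser(nn.Module):
    def __init__(self, channels=32, dilations=(1, 4, 16, 64)):
        super().__init__()
        layers, in_ch = [], 6                        # assumed input: 3-axis gyro + 3-axis accel
        for d in dilations:
            layers += [nn.Conv1d(in_ch, channels, kernel_size=7,
                                 padding=3 * d, dilation=d), nn.GELU()]
            in_ch = channels
        layers += [nn.Conv1d(channels, 3, kernel_size=1)]   # 3-axis gyro correction
        self.net = nn.Sequential(*layers)

    def forward(self, imu):                          # imu: (batch, 6, time)
        gyro = imu[:, :3]
        return gyro + self.net(imu)                  # corrected angular rates
```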
Probabilistic Performance-Pattern Decomposition (PPPD): analysis framework and applications to stochastic mechanical systems
Title | Probabilistic Performance-Pattern Decomposition (PPPD): analysis framework and applications to stochastic mechanical systems |
Authors | Ziqi Wang, Marco Broccardo, Junho Song |
Abstract | Since the early 1900s, numerous research efforts have been devoted to developing quantitative solutions to stochastic mechanical systems. In general, the problem is perceived as solved when a complete or partial probabilistic description of the quantity of interest (QoI) is determined. However, in the presence of complex system behavior, there is a critical need to go beyond mere probabilistic descriptions. In fact, to gain a full understanding of the system, it is crucial to extract physical characterizations from the probabilistic structure of the QoI, especially when the QoI solution is obtained in a data-driven fashion. Motivated by this perspective, the paper proposes a framework to obtain structuralized characterizations of the behaviors of stochastic systems. The framework is named Probabilistic Performance-Pattern Decomposition (PPPD). PPPD analysis aims to decompose complex response behaviors, conditioned on a prescribed performance state, into meaningful patterns in the space of system responses, and to investigate how the patterns are triggered in the space of basic random variables. To illustrate the application of PPPD, the paper studies three numerical examples: 1) an illustrative example with hypothetical stochastic process inputs and outputs; 2) a stochastic Lorenz system with periodic as well as chaotic behaviors; and 3) a simplified shear-building model subjected to a stochastic ground motion excitation. |
Tasks | |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.02205v1 |
https://arxiv.org/pdf/2003.02205v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-performance-pattern |
Repo | |
Framework | |
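One way to picture the decomposition described above: draw Monte Carlo samples of the basic random variables, keep the realizations that reach the prescribed performance state, cluster their response histories into patterns, and collect the inputs that trigger each pattern. K-means and the fixed pattern count below are illustrative stand-ins for the paper's pattern-identification procedure.

```python
# Hypothetical sketch of conditional pattern extraction, not the paper's algorithm.
import numpy as np
from sklearn.cluster import KMeans

def performance_patterns(sample_inputs, simulate, performance, n_patterns=3):
    """sample_inputs: (n, d) basic random variables; simulate: maps one input row to a
    fixed-length response history; performance: boolean predicate on a response history."""
    responses = np.array([simulate(x) for x in sample_inputs])
    mask = np.array([performance(r) for r in responses])        # prescribed performance state
    labels = KMeans(n_clusters=n_patterns, n_init=10).fit_predict(responses[mask])
    # Patterns in response space, paired with the input samples that trigger each pattern.
    return [(responses[mask][labels == k], sample_inputs[mask][labels == k])
            for k in range(n_patterns)]
```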