January 31, 2020


Paper Group ANR 3


High-Performance Deep Learning via a Single Building Block


Title	High-Performance Deep Learning via a Single Building Block
Authors	Evangelos Georganas, Kunal Banerjee, Dhiraj Kalamkar, Sasikanth Avancha, Anand Venkat, Michael Anderson, Greg Henry, Hans Pabst, Alexander Heinecke
Abstract	Deep learning (DL) is one of the most prominent branches of machine learning. Due to the immense computational cost of DL workloads, industry and academia have developed DL libraries with highly-specialized kernels for each workload/architecture, leading to numerous, complex code-bases that strive for performance, yet are hard to maintain and do not generalize. In this work, we introduce the batch-reduce GEMM kernel and show how the most popular DL algorithms can be formulated with this kernel as the basic building-block. Consequently, DL library development degenerates to mere (potentially automatic) tuning of loops around this sole optimized kernel. By exploiting our new kernel we implement Recurrent Neural Networks, Convolutional Neural Networks and Multilayer Perceptron training and inference primitives in just 3K lines of high-level code. Our primitives outperform vendor-optimized libraries on multi-node CPU clusters, and we also provide proof-of-concept CNN kernels targeting GPUs. Finally, we demonstrate that the batch-reduce GEMM kernel within a tensor compiler yields high-performance CNN primitives, further amplifying the viability of our approach.
Tasks
Published	2019-06-15
URL	https://arxiv.org/abs/1906.06440v2
PDF	https://arxiv.org/pdf/1906.06440v2.pdf
PWC	https://paperswithcode.com/paper/high-performance-deep-learning-via-a-single
Repo
Framework
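
The abstract above names a batch-reduce GEMM kernel as the single building block. As a rough illustration only, the sketch below assumes the usual reading of that term: a batch of matrix pairs (A_i, B_i) is multiplied and all products are accumulated into one output block, C += sum_i A_i B_i. The function name, the list-of-lists layout and the loop order are assumptions made for readability; this is not the authors' optimized kernel.

```python
def batch_reduce_gemm(a_blocks, b_blocks, c):
    """Accumulate sum_i A_i @ B_i into c (in place) and return c.

    a_blocks: list of m x k matrices (lists of lists)
    b_blocks: list of k x n matrices
    c:        m x n accumulator, updated in place
    All names and the data layout are illustrative assumptions.
    """
    if len(a_blocks) != len(b_blocks):
        raise ValueError("a_blocks and b_blocks must have the same length")
    for a, b in zip(a_blocks, b_blocks):
        # Plain triple loop for one A_i @ B_i, added onto the shared accumulator.
        for i in range(len(a)):
            for p in range(len(b)):
                a_ip = a[i][p]
                for j in range(len(b[0])):
                    c[i][j] += a_ip * b[p][j]
    return c


# Two 2x2 pairs reduced into a single 2x2 output block.
a_blocks = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
b_blocks = [[[5, 6], [7, 8]], [[1, 1], [1, 1]]]
result = batch_reduce_gemm(a_blocks, b_blocks, [[0, 0], [0, 0]])
print(result)  # [[20, 23], [44, 51]]
```

The point of the sketch is that the reduction over the batch happens inside the kernel, so the caller only chooses which blocks to pass in; that choice is the loop tuning the abstract refers to.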

Realistic Ultrasonic Environment Simulation Using Conditional Generative Adversarial Networks


Title	Realistic Ultrasonic Environment Simulation Using Conditional Generative Adversarial Networks
Authors	Maximilian Pöpperl, Raghavendra Gulagundi, Senthil Yogamani, Stefan Milz
Abstract	Recently, realistic data augmentation using neural networks, especially generative adversarial networks (GANs), has achieved outstanding results. The community's main research focus is visual image processing. However, cars and robots are equipped with a large suite of sensors to achieve high redundancy. Among others, ultrasonic sensors are often used due to their low cost and reliable near-field distance measuring capabilities. Hence, pattern recognition needs to be applied to ultrasonic signals as well. Machine learning requires extensive data sets, and those measurements are time-consuming, expensive and not flexible to hardware and environmental changes. On the other hand, there exists no method to simulate those signals deterministically. We present a novel approach for synthetic ultrasonic signal simulation using conditional GANs (cGANs). To the best of our knowledge, we present the first realistic data augmentation for automotive ultrasonics. The performance of cGANs allows us to bring realistic environment simulation to a new level. By using setup and environmental parameters as the condition, the proposed approach is flexible to external influences. Due to the low complexity and time effort for data generation, we outperform other simulation algorithms, such as the finite element method. We verify the outstanding accuracy and realism of our method by applying a detailed statistical analysis and comparing the generated data to an extensive amount of measured signals.
Tasks	Data Augmentation
Published	2019-02-26
URL	http://arxiv.org/abs/1902.09842v1
PDF	http://arxiv.org/pdf/1902.09842v1.pdf
PWC	https://paperswithcode.com/paper/realistic-ultrasonic-environment-simulation
Repo
Framework

A Survey of Deep Learning-based Object Detection


Title	A Survey of Deep Learning-based Object Detection
Authors	Licheng Jiao, Fan Zhang, Fang Liu, Shuyuan Yang, Lingling Li, Zhixi Feng, Rong Qu
Abstract	Object detection is one of the most important and challenging branches of computer vision. It has been widely applied in people's lives, for example in security monitoring and autonomous driving, with the purpose of locating instances of semantic objects of a certain class. With the rapid development of deep learning networks for detection tasks, the performance of object detectors has been greatly improved. To give a thorough understanding of the main development status of the object detection pipeline, in this survey we first analyze the methods of existing typical detection models and describe the benchmark datasets. Afterwards, and primarily, we provide a comprehensive overview of a variety of object detection methods in a systematic manner, covering the one-stage and two-stage detectors. Moreover, we list the traditional and new applications. Some representative branches of object detection are analyzed as well. Finally, we discuss the architecture of exploiting these object detection methods to build an effective and efficient system and point out a set of development trends to better follow the state-of-the-art algorithms and further research.
Tasks	Autonomous Driving, Object Detection
Published	2019-07-11
URL	https://arxiv.org/abs/1907.09408v2
PDF	https://arxiv.org/pdf/1907.09408v2.pdf
PWC	https://paperswithcode.com/paper/a-survey-of-deep-learning-based-object
Repo
Framework

Towards Explainable Deep Neural Networks (xDNN)


Title	Towards Explainable Deep Neural Networks (xDNN)
Authors	Plamen Angelov, Eduardo Soares
Abstract	In this paper, we propose an elegant solution that directly addresses the bottlenecks of the traditional deep learning approaches and offers a clearly explainable internal architecture that can outperform the existing methods, requires very few computational resources (no need for GPUs) and short training times (in the order of seconds). The proposed approach, xDNN, uses prototypes. Prototypes are actual training data samples (images), which are local peaks of the empirical data distribution called typicality as well as of the data density. This generative model is identified in a closed form and equates to the pdf but is derived automatically and entirely from the training data with no user- or problem-specific thresholds, parameters or intervention. The proposed xDNN offers a new deep learning architecture that combines reasoning and learning in a synergy. It is non-iterative and non-parametric, which explains its efficiency in terms of time and computational resources. From the user perspective, the proposed approach is clearly understandable to human users. We tested it on some well-known benchmark data sets such as iRoads and Caltech-256. xDNN outperforms the other methods, including deep learning, in terms of accuracy and time to train, and offers a clearly explainable classifier. In fact, the result on the very hard Caltech-256 problem (which has 257 classes) represents a world record.
Tasks
Published	2019-12-05
URL	https://arxiv.org/abs/1912.02523v1
PDF	https://arxiv.org/pdf/1912.02523v1.pdf
PWC	https://paperswithcode.com/paper/191202523
Repo
Framework

Building Effective Large-Scale Traffic State Prediction System: Traffic4cast Challenge Solution


Title	Building Effective Large-Scale Traffic State Prediction System: Traffic4cast Challenge Solution
Authors	Yang Liu, Fanyou Wu, Baosheng Yu, Zhiyuan Liu, Jieping Ye
Abstract	How to build an effective large-scale traffic state prediction system is a challenging but highly valuable problem. This study focuses on the construction of an effective solution designed for spatio-temporal data to predict large-scale traffic state. Considering the large data size in the Traffic4cast Challenge and our limited computational resources, we emphasize model design to achieve a relatively high prediction performance within acceptable running time. We adopt a structure similar to U-net and use a mask instead of spatial attention to address the data sparsity. Then, drawing on experience with time series prediction problems, we design a number of features, which are input into the model as different channels. Region cropping is used to decrease the difference between the size of the receptive field and the study area, and the models can be specially optimized for each sub-region. The fusion of interdisciplinary knowledge and experience is an emerging demand in classical traffic research. Several interdisciplinary studies we have been pursuing are also discussed in the Complementary Challenges. The source code is available at https://github.com/wufanyou/traffic4cast-TLab.
Tasks	Time Series, Time Series Prediction
Published	2019-11-11
URL	https://arxiv.org/abs/1911.05699v1
PDF	https://arxiv.org/pdf/1911.05699v1.pdf
PWC	https://paperswithcode.com/paper/building-effective-large-scale-traffic-state
Repo
Framework

Forecasting Human Object Interaction: Joint Prediction of Motor Attention and Egocentric Activity


Title	Forecasting Human Object Interaction: Joint Prediction of Motor Attention and Egocentric Activity
Authors	Miao Liu, Siyu Tang, Yin Li, James Rehg
Abstract	We address the challenging task of anticipating human-object interaction in first person videos. Most existing methods ignore how the camera wearer interacts with the objects, or simply consider body motion as a separate modality. In contrast, we observe that intentional hand movement reveals critical information about the future activity. Motivated by this, we adopt intentional hand movement as a future representation and propose a novel deep network that jointly models and predicts the egocentric hand motion, interaction hotspots and future action. Specifically, we consider the future hand motion as the motor attention, and model this attention using latent variables in our deep model. The predicted motor attention is further used to characterise the discriminative spatial-temporal visual features for predicting actions and interaction hotspots. We present extensive experiments demonstrating the benefit of the proposed joint model. Importantly, our model produces new state-of-the-art results for action anticipation on both EGTEA Gaze+ and the EPIC-Kitchens datasets. At the time of submission, our method is ranked first on the unseen test set during EPIC-Kitchens Action Anticipation Challenge Phase 2.
Tasks	Human-Object Interaction Detection
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10967v1
PDF	https://arxiv.org/pdf/1911.10967v1.pdf
PWC	https://paperswithcode.com/paper/forecasting-human-object-interaction-joint
Repo
Framework

Deep Contextual Attention for Human-Object Interaction Detection


Title	Deep Contextual Attention for Human-Object Interaction Detection
Authors	Tiancai Wang, Rao Muhammad Anwer, Muhammad Haris Khan, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Jorma Laaksonen
Abstract	Human-object interaction detection is an important and relatively new class of visual relationship detection tasks, essential for deeper scene understanding. Most existing approaches decompose the problem into object localization and interaction recognition. Despite showing progress, these approaches only rely on the appearances of humans and objects and overlook the available context information, crucial for capturing subtle interactions between them. We propose a contextual attention framework for human-object interaction detection. Our approach leverages context by learning contextually-aware appearance features for human and object instances. The proposed attention module then adaptively selects relevant instance-centric context information to highlight image regions likely to contain human-object interactions. Experiments are performed on three benchmarks: V-COCO, HICO-DET and HCVRD. Our approach outperforms the state-of-the-art on all datasets. On the V-COCO dataset, our method achieves a relative gain of 4.4% in terms of role mean average precision ($mAP_{role}$), compared to the existing best approach.
Tasks	Human-Object Interaction Detection, Object Localization, Scene Understanding
Published	2019-10-17
URL	https://arxiv.org/abs/1910.07721v1
PDF	https://arxiv.org/pdf/1910.07721v1.pdf
PWC	https://paperswithcode.com/paper/deep-contextual-attention-for-human-object
Repo
Framework

Multiscale Gaussian Process Level Set Estimation


Title	Multiscale Gaussian Process Level Set Estimation
Authors	Shubhanshu Shekhar, Tara Javidi
Abstract	In this paper, the problem of estimating the level set of a black-box function from noisy and expensive evaluation queries is considered. A new algorithm for this problem in the Bayesian framework with a Gaussian Process (GP) prior is proposed. The proposed algorithm employs a hierarchical sequence of partitions to explore different regions of the search space at varying levels of detail depending upon their proximity to the level set boundary. It is shown that this approach results in the algorithm having a low complexity implementation whose computational cost is significantly smaller than the existing algorithms for higher dimensional search space $\mathcal{X}$. Furthermore, high probability bounds on a measure of discrepancy between the estimated level set and the true level set for the proposed algorithm are obtained, which are shown to be strictly better than the existing guarantees for a large class of GPs. In the process, a tighter characterization of the information gain of the proposed algorithm is obtained which takes into account the structured nature of the evaluation points. This approach improves upon the existing technique of bounding the information gain with maximum information gain.
Tasks
Published	2019-02-26
URL	http://arxiv.org/abs/1902.09682v1
PDF	http://arxiv.org/pdf/1902.09682v1.pdf
PWC	https://paperswithcode.com/paper/multiscale-gaussian-process-level-set
Repo
Framework

Fairness without Regret


Title	Fairness without Regret
Authors	Marcus Hutter
Abstract	A popular approach to achieving fairness in optimization problems is to constrain the solution space to “fair” solutions, which unfortunately typically reduces solution quality. In practice, the ultimate goal is often an aggregate of sub-goals without a unique or best way of combining them, or which is otherwise only partially known. I turn this problem into a feature and suggest using a parametrized objective and varying the parameters within reasonable ranges to get a “set” of optimal solutions, which can then be optimized using secondary criteria such as fairness without compromising the primary objective, i.e. without regret (societal cost).
Tasks
Published	2019-07-11
URL	https://arxiv.org/abs/1907.05159v1
PDF	https://arxiv.org/pdf/1907.05159v1.pdf
PWC	https://paperswithcode.com/paper/fairness-without-regret
Repo
Framework
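
The idea in the abstract above can be sketched on a toy problem: vary the parameter of the objective over a plausible range, collect every solution that is optimal for at least one parameter value, and only then pick the fairest member of that set. Everything concrete here is a hypothetical illustration and not taken from the paper: the candidate solutions, the weighted-sum objective, the weight range and the unfairness measure are all assumptions.

```python
def optimal_set(candidates, objective, weights, tol=1e-9):
    """Collect every candidate that is optimal for at least one weight."""
    selected = []
    for w in weights:
        best = max(objective(x, w) for x in candidates)
        for x in candidates:
            if objective(x, w) >= best - tol and x not in selected:
                selected.append(x)
    return selected


# Hypothetical candidates: (benefit to group A, benefit to group B).
candidates = [(10, 0), (6, 6), (0, 10), (3, 3)]


def objective(x, w):
    # Assumed parametrized primary objective: a weighted sum of the two sub-goals.
    return w * x[0] + (1 - w) * x[1]


def unfairness(x):
    # Assumed secondary criterion: the gap between the two groups.
    return abs(x[0] - x[1])


weights = [0.4, 0.5, 0.6]  # the "reasonable range" of the uncertain parameter
pool = optimal_set(candidates, objective, weights)
fairest = min(pool, key=unfairness)
print(pool)     # [(6, 6), (0, 10), (10, 0)]
print(fairest)  # (6, 6)
```

In this toy setup the perfectly balanced but low-value candidate (3, 3) never enters the pool, so choosing the fairest member of the pool does not sacrifice the primary objective for any weight in the range.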

PPO Dash: Improving Generalization in Deep Reinforcement Learning


Title	PPO Dash: Improving Generalization in Deep Reinforcement Learning
Authors	Joe Booth
Abstract	Deep reinforcement learning is prone to overfitting, and traditional benchmarks such as the Atari 2600 benchmark can exacerbate this problem. The Obstacle Tower Challenge addresses this by using randomized environments and separate seeds for training, validation, and test runs. This paper examines various improvements and best practices to the PPO algorithm using the Obstacle Tower Challenge to empirically study their impact with regard to generalization. Our experiments show that the combination provides state-of-the-art performance on the Obstacle Tower Challenge.
Tasks
Published	2019-07-15
URL	https://arxiv.org/abs/1907.06704v3
PDF	https://arxiv.org/pdf/1907.06704v3.pdf
PWC	https://paperswithcode.com/paper/ppo-dash-improving-generalization-in-deep
Repo
Framework

Let’s Push Things Forward: A Survey on Robot Pushing


Title	Let’s Push Things Forward: A Survey on Robot Pushing
Authors	Jochen Stüber, Claudio Zito, Rustam Stolkin
Abstract	As robots make their way out of factories into human environments, outer space, and beyond, they require the skill to manipulate their environment in multifarious, unforeseeable circumstances. In this regard, pushing is an essential motion primitive that dramatically extends a robot’s manipulation repertoire. In this work, we review the robotic pushing literature. While focusing on work concerned with predicting the motion of pushed objects, we also cover relevant applications of pushing for planning and control. Beginning with analytical approaches, under which we also subsume physics engines, we then proceed to discuss work on learning models from data. In doing so, we dedicate a separate section to deep learning approaches which have seen a recent upsurge in the literature. Concluding remarks and further research perspectives are given at the end of the paper.
Tasks
Published	2019-05-13
URL	https://arxiv.org/abs/1905.05138v1
PDF	https://arxiv.org/pdf/1905.05138v1.pdf
PWC	https://paperswithcode.com/paper/lets-push-things-forward-a-survey-on-robot
Repo
Framework

The Application of Machine Learning Techniques for Predicting Results in Team Sport: A Review


Title	The Application of Machine Learning Techniques for Predicting Results in Team Sport: A Review
Authors	Rory Bunker, Teo Susnjak
Abstract	Over the past two decades, Machine Learning (ML) techniques have been increasingly utilized for the purpose of predicting outcomes in sport. In this paper, we provide a review of studies that have used ML for predicting results in team sport, covering studies from 1996 to 2019. We sought to answer five key research questions while extensively surveying papers in this field. This paper offers insights into which ML algorithms have tended to be used in this field, as well as those that are beginning to emerge with successful outcomes. Our research highlights defining characteristics of successful studies and identifies robust strategies for evaluating accuracy results in this application domain. Our study considers accuracies that have been achieved across different sports and explores the notion that outcomes of some team sports could be inherently more difficult to predict than others. Finally, our study uncovers common themes of future research directions across all surveyed papers, looking for gaps and opportunities, while proposing recommendations for future researchers in this domain.
Tasks
Published	2019-12-26
URL	https://arxiv.org/abs/1912.11762v1
PDF	https://arxiv.org/pdf/1912.11762v1.pdf
PWC	https://paperswithcode.com/paper/the-application-of-machine-learning
Repo
Framework

Provably scale-covariant networks from oriented quasi quadrature measures in cascade


Title	Provably scale-covariant networks from oriented quasi quadrature measures in cascade
Authors	Tony Lindeberg
Abstract	This article presents a continuous model for hierarchical networks based on a combination of mathematically derived models of receptive fields and biologically inspired computations. Based on a functional model of complex cells in terms of an oriented quasi quadrature combination of first- and second-order directional Gaussian derivatives, we couple such primitive computations in cascade over combinatorial expansions over image orientations. Scale-space properties of the computational primitives are analysed and it is shown that the resulting representation allows for provable scale and rotation covariance. A prototype application to texture analysis is developed and it is demonstrated that a simplified mean-reduced representation of the resulting QuasiQuadNet leads to promising experimental results on three texture datasets.
Tasks	Texture Classification
Published	2019-03-01
URL	https://arxiv.org/abs/1903.00289v2
PDF	https://arxiv.org/pdf/1903.00289v2.pdf
PWC	https://paperswithcode.com/paper/provably-scale-covariant-networks-from
Repo
Framework

Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation


Title	Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation
Authors	Shuyang Dai, Yu Cheng, Yizhe Zhang, Zhe Gan, Jingjing Liu, Lawrence Carin
Abstract	Recent unsupervised approaches to domain adaptation primarily focus on minimizing the gap between the source and the target domains through refining the feature generator, in order to learn a better alignment between the two domains. This minimization can be achieved via a domain classifier to detect target-domain features that are divergent from source-domain features. However, by optimizing via such domain classification discrepancy, ambiguous target samples that are not smoothly distributed on the low-dimensional data manifold are often missed. To solve this issue, we propose a novel Contrastively Smoothed Class Alignment (CoSCA) model, that explicitly incorporates both intra- and inter-class domain discrepancy to better align ambiguous target samples with the source domain. CoSCA estimates the underlying label hypothesis of target samples, and simultaneously adapts their feature representations by optimizing a proposed contrastive loss. In addition, Maximum Mean Discrepancy (MMD) is utilized to directly match features between source and target samples for better global alignment. Experiments on several benchmark datasets demonstrate that CoSCA can outperform state-of-the-art approaches for unsupervised domain adaptation by producing more discriminative features.
Tasks	Domain Adaptation, Unsupervised Domain Adaptation
Published	2019-09-11
URL	https://arxiv.org/abs/1909.05288v3
PDF	https://arxiv.org/pdf/1909.05288v3.pdf
PWC	https://paperswithcode.com/paper/contrastively-smoothed-class-alignment-for
Repo
Framework
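
The abstract above uses Maximum Mean Discrepancy (MMD) to match source and target features. As a minimal sketch, the code below computes the standard biased estimate of squared MMD between two samples, i.e. mean k(x, x') + mean k(y, y') - 2 mean k(x, y). The Gaussian (RBF) kernel, its bandwidth parameter `gamma` and the toy feature vectors are assumptions for illustration; the abstract does not state which kernel or estimator CoSCA uses.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - y||^2); gamma is an assumed bandwidth."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs, ys, kernel=rbf):
    """Biased estimate of squared MMD between samples xs and ys."""
    def mean_k(us, vs):
        return sum(kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)


# Toy 2-D "features": one target sample close to the source, one far away.
source = [(0.0, 0.0), (1.0, 0.0)]
near = [(0.1, 0.0), (0.9, 0.0)]
far = [(5.0, 5.0), (6.0, 5.0)]

print(mmd2(source, near) < mmd2(source, far))  # True
```

A smaller value means the two feature distributions look more alike under the chosen kernel, which is why such a term can serve as an alignment loss between domains.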

Automatic Long-Term Deception Detection in Group Interaction Videos


Title	Automatic Long-Term Deception Detection in Group Interaction Videos
Authors	Chongyang Bai, Maksim Bolonkin, Judee Burgoon, Chao Chen, Norah Dunbar, Bharat Singh, V. S. Subrahmanian, Zhe Wu
Abstract	Most work on automated deception detection (ADD) in video has two restrictions: (i) it focuses on a video of one person, and (ii) it focuses on a single act of deception in a one- or two-minute video. In this paper, we propose a new ADD framework which captures long-term deception in a group setting. We study deception in the well-known Resistance game (like Mafia and Werewolf) which consists of 5-8 players of whom 2-3 are spies. Spies are deceptive throughout the game (typically 30-65 minutes) to keep their identity hidden. We develop an ensemble predictive model to identify spies in Resistance videos. We show that features from low-level and high-level video analysis are insufficient, but when combined with a new class of features that we call LiarRank, produce the best results. We achieve AUCs of over 0.70 in a fully automated setting. Our demo can be found at http://home.cs.dartmouth.edu/~mbolonkin/scan/demo/
Tasks	Deception Detection
Published	2019-05-15
URL	https://arxiv.org/abs/1905.08617v2
PDF	https://arxiv.org/pdf/1905.08617v2.pdf
PWC	https://paperswithcode.com/paper/190508617
Repo
Framework