January 31, 2020

Paper Group ANR 3

High-Performance Deep Learning via a Single Building Block. Realistic Ultrasonic Environment Simulation Using Conditional Generative Adversarial Networks. A Survey of Deep Learning-based Object Detection. Towards Explainable Deep Neural Networks (xDNN). Building Effective Large-Scale Traffic State Prediction System: Traffic4cast Challenge Solution. …

High-Performance Deep Learning via a Single Building Block

Title High-Performance Deep Learning via a Single Building Block
Authors Evangelos Georganas, Kunal Banerjee, Dhiraj Kalamkar, Sasikanth Avancha, Anand Venkat, Michael Anderson, Greg Henry, Hans Pabst, Alexander Heinecke
Abstract Deep learning (DL) is one of the most prominent branches of machine learning. Due to the immense computational cost of DL workloads, industry and academia have developed DL libraries with highly-specialized kernels for each workload/architecture, leading to numerous, complex code-bases that strive for performance, yet they are hard to maintain and do not generalize. In this work, we introduce the batch-reduce GEMM kernel and show how the most popular DL algorithms can be formulated with this kernel as the basic building-block. Consequently, the DL library-development degenerates to mere (potentially automatic) tuning of loops around this sole optimized kernel. By exploiting our new kernel we implement Recurrent Neural Networks, Convolution Neural Networks and Multilayer Perceptron training and inference primitives in just 3K lines of high-level code. Our primitives outperform vendor-optimized libraries on multi-node CPU clusters, and we also provide proof-of-concept CNN kernels targeting GPUs. Finally, we demonstrate that the batch-reduce GEMM kernel within a tensor compiler yields high-performance CNN primitives, further amplifying the viability of our approach.
Tasks
Published 2019-06-15
URL https://arxiv.org/abs/1906.06440v2
PDF https://arxiv.org/pdf/1906.06440v2.pdf
PWC https://paperswithcode.com/paper/high-performance-deep-learning-via-a-single
Repo
Framework
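
The batch-reduce GEMM named in the abstract above can be illustrated with a small sketch. It assumes the usual reading of the operation, $C \leftarrow \beta C + \alpha \sum_i A_i B_i$, in which a batch of matrix pairs is multiplied and accumulated into a single output block. The function name `batch_reduce_gemm`, its parameters, and the plain Python loops are illustrative assumptions, not the authors' optimized kernel.

```python
from typing import List

Matrix = List[List[float]]


def batch_reduce_gemm(
    a_blocks: List[Matrix],
    b_blocks: List[Matrix],
    c: Matrix,
    alpha: float = 1.0,
    beta: float = 1.0,
) -> Matrix:
    """Return beta * C + alpha * sum_i (A_i @ B_i).

    Each A_i is m x k and each B_i is k x n; C is m x n.
    Hypothetical reference implementation for illustration only.
    """
    m, n = len(c), len(c[0])
    # Start from the scaled accumulator so every product reduces into it.
    out = [[beta * c[r][col] for col in range(n)] for r in range(m)]
    for a, b in zip(a_blocks, b_blocks):
        k = len(b)
        for r in range(m):
            for col in range(n):
                acc = 0.0
                for p in range(k):
                    acc += a[r][p] * b[p][col]
                out[r][col] += alpha * acc
    return out
```

Under this reading, a layer primitive would amount to outer loops that decide which blocks `A_i` and `B_i` are handed to the kernel, which seems to be the loop tuning the abstract refers to.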

Realistic Ultrasonic Environment Simulation Using Conditional Generative Adversarial Networks

Title Realistic Ultrasonic Environment Simulation Using Conditional Generative Adversarial Networks
Authors Maximilian Pöpperl, Raghavendra Gulagundi, Senthil Yogamani, Stefan Milz
Abstract Recently, realistic data augmentation using neural networks, especially generative adversarial networks (GANs), has achieved outstanding results. The community's main research focus is visual image processing. However, automotive cars and robots are equipped with a large suite of sensors to achieve a high redundancy. In addition to others, ultrasonic sensors are often used due to their low cost and reliable near-field distance measuring capabilities. Hence, pattern recognition needs to be applied to ultrasonic signals as well. Machine learning requires extensive data sets, and those measurements are time-consuming, expensive and not flexible to hardware and environmental changes. On the other hand, there exists no method to simulate those signals deterministically. We present a novel approach for synthetic ultrasonic signal simulation using conditional GANs (cGANs). To the best of our knowledge, we present the first realistic data augmentation for automotive ultrasonics. The performance of cGANs allows us to bring the realistic environment simulation to a new level. By using setup and environmental parameters as conditions, the proposed approach is flexible to external influences. Due to the low complexity and time effort for data generation, we outperform other simulation algorithms, such as the finite element method. We verify the outstanding accuracy and realism of our method by applying a detailed statistical analysis and comparing the generated data to an extensive amount of measured signals.
Tasks Data Augmentation
Published 2019-02-26
URL http://arxiv.org/abs/1902.09842v1
PDF http://arxiv.org/pdf/1902.09842v1.pdf
PWC https://paperswithcode.com/paper/realistic-ultrasonic-environment-simulation
Repo
Framework

A Survey of Deep Learning-based Object Detection

Title A Survey of Deep Learning-based Object Detection
Authors Licheng Jiao, Fan Zhang, Fang Liu, Shuyuan Yang, Lingling Li, Zhixi Feng, Rong Qu
Abstract Object detection is one of the most important and challenging branches of computer vision, which has been widely applied in people's lives, such as security monitoring, autonomous driving and so on, with the purpose of locating instances of semantic objects of a certain class. With the rapid development of deep learning networks for detection tasks, the performance of object detectors has been greatly improved. In order to understand the main development status of the object detection pipeline thoroughly and deeply, in this survey we first analyze the methods of existing typical detection models and describe the benchmark datasets. Afterwards and primarily, we provide a comprehensive overview of a variety of object detection methods in a systematic manner, covering the one-stage and two-stage detectors. Moreover, we list the traditional and new applications. Some representative branches of object detection are analyzed as well. Finally, we discuss the architecture of exploiting these object detection methods to build an effective and efficient system and point out a set of development trends to better follow the state-of-the-art algorithms and further research.
Tasks Autonomous Driving, Object Detection
Published 2019-07-11
URL https://arxiv.org/abs/1907.09408v2
PDF https://arxiv.org/pdf/1907.09408v2.pdf
PWC https://paperswithcode.com/paper/a-survey-of-deep-learning-based-object
Repo
Framework

Towards Explainable Deep Neural Networks (xDNN)

Title Towards Explainable Deep Neural Networks (xDNN)
Authors Plamen Angelov, Eduardo Soares
Abstract In this paper, we propose an elegant solution that directly addresses the bottlenecks of the traditional deep learning approaches and offers a clearly explainable internal architecture that can outperform the existing methods, requires very few computational resources (no need for GPUs) and short training times (in the order of seconds). The proposed approach, xDNN, uses prototypes. Prototypes are actual training data samples (images), which are local peaks of the empirical data distribution called typicality as well as of the data density. This generative model is identified in a closed form and equates to the pdf but is derived automatically and entirely from the training data with no user- or problem-specific thresholds, parameters or intervention. The proposed xDNN offers a new deep learning architecture that combines reasoning and learning in a synergy. It is non-iterative and non-parametric, which explains its efficiency in terms of time and computational resources. From the user perspective, the proposed approach is clearly understandable to human users. We tested it on some well-known benchmark data sets such as iRoads and Caltech-256. xDNN outperforms the other methods, including deep learning, in terms of accuracy and time to train, and offers a clearly explainable classifier. In fact, the result on the very hard Caltech-256 problem (which has 257 classes) represents a world record.
Tasks
Published 2019-12-05
URL https://arxiv.org/abs/1912.02523v1
PDF https://arxiv.org/pdf/1912.02523v1.pdf
PWC https://paperswithcode.com/paper/191202523
Repo
Framework

Building Effective Large-Scale Traffic State Prediction System: Traffic4cast Challenge Solution

Title Building Effective Large-Scale Traffic State Prediction System: Traffic4cast Challenge Solution
Authors Yang Liu, Fanyou Wu, Baosheng Yu, Zhiyuan Liu, Jieping Ye
Abstract How to build an effective large-scale traffic state prediction system is a challenging but highly valuable problem. This study focuses on the construction of an effective solution designed for spatio-temporal data to predict large-scale traffic state. Considering the large data size in the Traffic4cast Challenge and our limited computational resources, we emphasize model design to achieve a relatively high prediction performance within acceptable running time. We adopt a structure similar to U-net and use a mask instead of spatial attention to address the data sparsity. Then, drawing on experience with time series prediction problems, we design a number of features, which are input into the model as different channels. Region cropping is used to decrease the difference between the size of the receptive field and the study area, and the models can be specially optimized for each sub-region. The fusion of interdisciplinary knowledge and experience is an emerging demand in classical traffic research. Several interdisciplinary studies we have been working on are also discussed in the Complementary Challenges. The source code is available at https://github.com/wufanyou/traffic4cast-TLab.
Tasks Time Series, Time Series Prediction
Published 2019-11-11
URL https://arxiv.org/abs/1911.05699v1
PDF https://arxiv.org/pdf/1911.05699v1.pdf
PWC https://paperswithcode.com/paper/building-effective-large-scale-traffic-state
Repo
Framework

Forecasting Human Object Interaction: Joint Prediction of Motor Attention and Egocentric Activity

Title Forecasting Human Object Interaction: Joint Prediction of Motor Attention and Egocentric Activity
Authors Miao Liu, Siyu Tang, Yin Li, James Rehg
Abstract We address the challenging task of anticipating human-object interaction in first person videos. Most existing methods ignore how the camera wearer interacts with the objects, or simply consider body motion as a separate modality. In contrast, we observe that the intentional hand movement reveals critical information about the future activity. Motivated by this, we adopt intentional hand movement as a future representation and propose a novel deep network that jointly models and predicts the egocentric hand motion, interaction hotspots and future action. Specifically, we consider the future hand motion as the motor attention, and model this attention using latent variables in our deep model. The predicted motor attention is further used to characterise the discriminative spatial-temporal visual features for predicting actions and interaction hotspots. We present extensive experiments demonstrating the benefit of the proposed joint model. Importantly, our model produces new state-of-the-art results for action anticipation on both EGTEA Gaze+ and the EPIC-Kitchens datasets. At the time of submission, our method is ranked first on the unseen test set during EPIC-Kitchens Action Anticipation Challenge Phase 2.
Tasks Human-Object Interaction Detection
Published 2019-11-25
URL https://arxiv.org/abs/1911.10967v1
PDF https://arxiv.org/pdf/1911.10967v1.pdf
PWC https://paperswithcode.com/paper/forecasting-human-object-interaction-joint
Repo
Framework

Deep Contextual Attention for Human-Object Interaction Detection

Title Deep Contextual Attention for Human-Object Interaction Detection
Authors Tiancai Wang, Rao Muhammad Anwer, Muhammad Haris Khan, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Jorma Laaksonen
Abstract Human-object interaction detection is an important and relatively new class of visual relationship detection tasks, essential for deeper scene understanding. Most existing approaches decompose the problem into object localization and interaction recognition. Despite showing progress, these approaches only rely on the appearances of humans and objects and overlook the available context information, crucial for capturing subtle interactions between them. We propose a contextual attention framework for human-object interaction detection. Our approach leverages context by learning contextually-aware appearance features for human and object instances. The proposed attention module then adaptively selects relevant instance-centric context information to highlight image regions likely to contain human-object interactions. Experiments are performed on three benchmarks: V-COCO, HICO-DET and HCVRD. Our approach outperforms the state-of-the-art on all datasets. On the V-COCO dataset, our method achieves a relative gain of 4.4% in terms of role mean average precision ($mAP_{role}$), compared to the existing best approach.
Tasks Human-Object Interaction Detection, Object Localization, Scene Understanding
Published 2019-10-17
URL https://arxiv.org/abs/1910.07721v1
PDF https://arxiv.org/pdf/1910.07721v1.pdf
PWC https://paperswithcode.com/paper/deep-contextual-attention-for-human-object
Repo
Framework

Multiscale Gaussian Process Level Set Estimation

Title Multiscale Gaussian Process Level Set Estimation
Authors Shubhanshu Shekhar, Tara Javidi
Abstract In this paper, the problem of estimating the level set of a black-box function from noisy and expensive evaluation queries is considered. A new algorithm for this problem in the Bayesian framework with a Gaussian Process (GP) prior is proposed. The proposed algorithm employs a hierarchical sequence of partitions to explore different regions of the search space at varying levels of detail depending upon their proximity to the level set boundary. It is shown that this approach results in the algorithm having a low complexity implementation whose computational cost is significantly smaller than the existing algorithms for higher dimensional search space $\mathcal{X}$. Furthermore, high probability bounds on a measure of discrepancy between the estimated level set and the true level set for the proposed algorithm are obtained, which are shown to be strictly better than the existing guarantees for a large class of GPs. In the process, a tighter characterization of the information gain of the proposed algorithm is obtained which takes into account the structured nature of the evaluation points. This approach improves upon the existing technique of bounding the information gain with maximum information gain.
Tasks
Published 2019-02-26
URL http://arxiv.org/abs/1902.09682v1
PDF http://arxiv.org/pdf/1902.09682v1.pdf
PWC https://paperswithcode.com/paper/multiscale-gaussian-process-level-set
Repo
Framework

Fairness without Regret

Title Fairness without Regret
Authors Marcus Hutter
Abstract A popular approach to achieving fairness in optimization problems is by constraining the solution space to “fair” solutions, which unfortunately typically reduces solution quality. In practice, the ultimate goal is often an aggregate of sub-goals without a unique or best way of combining them or which is otherwise only partially known. I turn this problem into a feature and suggest using a parametrized objective and varying the parameters within reasonable ranges to get a “set” of optimal solutions, which can then be optimized using secondary criteria such as fairness without compromising the primary objective, i.e. without regret (societal cost).
Tasks
Published 2019-07-11
URL https://arxiv.org/abs/1907.05159v1
PDF https://arxiv.org/pdf/1907.05159v1.pdf
PWC https://paperswithcode.com/paper/fairness-without-regret
Repo
Framework
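
The idea in the abstract above, varying the parameters of an aggregate objective and then picking among the resulting optima by a secondary criterion, can be sketched as follows. Everything here is a hypothetical toy setup: the two sub-goals, the linear weighting, the candidate scores, and the names `Candidate`, `optimal_set`, and `fairest_optimal` are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    goal_a: float    # score on the first sub-goal (hypothetical)
    goal_b: float    # score on the second sub-goal (hypothetical)
    fairness: float  # secondary criterion, higher is fairer (hypothetical)


def primary_objective(c: Candidate, weight: float) -> float:
    """Parametrized aggregate of the two sub-goals (assumed linear)."""
    return weight * c.goal_a + (1.0 - weight) * c.goal_b


def optimal_set(cands: Dict[str, Candidate], weights: List[float]) -> List[str]:
    """Collect every candidate that is optimal for at least one weight."""
    winners: List[str] = []
    for w in weights:
        best = max(cands, key=lambda name: primary_objective(cands[name], w))
        if best not in winners:
            winners.append(best)
    return winners


def fairest_optimal(cands: Dict[str, Candidate], weights: List[float]) -> str:
    """Among the optimal candidates only, return the fairest one."""
    return max(optimal_set(cands, weights), key=lambda name: cands[name].fairness)


# Made-up example data: "D" is the fairest overall but never optimal.
CANDIDATES = {
    "A": Candidate(goal_a=10.0, goal_b=2.0, fairness=0.3),
    "B": Candidate(goal_a=8.0, goal_b=6.0, fairness=0.9),
    "C": Candidate(goal_a=3.0, goal_b=9.0, fairness=0.5),
    "D": Candidate(goal_a=4.0, goal_b=4.0, fairness=1.0),
}
WEIGHTS = [0.3, 0.5, 0.7]  # the "reasonable range" of the parameter
```

In this toy data, `fairest_optimal(CANDIDATES, WEIGHTS)` selects among candidates that are each optimal for some admissible weight, so the fairness step never trades away the primary objective; candidate `D` is excluded even though its fairness score is the highest.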

PPO Dash: Improving Generalization in Deep Reinforcement Learning

Title PPO Dash: Improving Generalization in Deep Reinforcement Learning
Authors Joe Booth
Abstract Deep reinforcement learning is prone to overfitting, and traditional benchmarks such as the Atari 2600 benchmark can exacerbate this problem. The Obstacle Tower Challenge addresses this by using randomized environments and separate seeds for training, validation, and test runs. This paper examines various improvements and best practices to the PPO algorithm using the Obstacle Tower Challenge to empirically study their impact with regard to generalization. Our experiments show that the combination provides state-of-the-art performance on the Obstacle Tower Challenge.
Tasks
Published 2019-07-15
URL https://arxiv.org/abs/1907.06704v3
PDF https://arxiv.org/pdf/1907.06704v3.pdf
PWC https://paperswithcode.com/paper/ppo-dash-improving-generalization-in-deep
Repo
Framework

Let’s Push Things Forward: A Survey on Robot Pushing

Title Let’s Push Things Forward: A Survey on Robot Pushing
Authors Jochen Stüber, Claudio Zito, Rustam Stolkin
Abstract As robots make their way out of factories into human environments, outer space, and beyond, they require the skill to manipulate their environment in multifarious, unforeseeable circumstances. In this regard, pushing is an essential motion primitive that dramatically extends a robot’s manipulation repertoire. In this work, we review the robotic pushing literature. While focusing on work concerned with predicting the motion of pushed objects, we also cover relevant applications of pushing for planning and control. Beginning with analytical approaches, under which we also subsume physics engines, we then proceed to discuss work on learning models from data. In doing so, we dedicate a separate section to deep learning approaches which have seen a recent upsurge in the literature. Concluding remarks and further research perspectives are given at the end of the paper.
Tasks
Published 2019-05-13
URL https://arxiv.org/abs/1905.05138v1
PDF https://arxiv.org/pdf/1905.05138v1.pdf
PWC https://paperswithcode.com/paper/lets-push-things-forward-a-survey-on-robot
Repo
Framework

The Application of Machine Learning Techniques for Predicting Results in Team Sport: A Review

Title The Application of Machine Learning Techniques for Predicting Results in Team Sport: A Review
Authors Rory Bunker, Teo Susnjak
Abstract Over the past two decades, Machine Learning (ML) techniques have been increasingly utilized for the purpose of predicting outcomes in sport. In this paper, we provide a review of studies that have used ML for predicting results in team sport, covering studies from 1996 to 2019. We sought to answer five key research questions while extensively surveying papers in this field. This paper offers insights into which ML algorithms have tended to be used in this field, as well as those that are beginning to emerge with successful outcomes. Our research highlights defining characteristics of successful studies and identifies robust strategies for evaluating accuracy results in this application domain. Our study considers accuracies that have been achieved across different sports and explores the notion that outcomes of some team sports could be inherently more difficult to predict than others. Finally, our study uncovers common themes of future research directions across all surveyed papers, looking for gaps and opportunities, while proposing recommendations for future researchers in this domain.
Tasks
Published 2019-12-26
URL https://arxiv.org/abs/1912.11762v1
PDF https://arxiv.org/pdf/1912.11762v1.pdf
PWC https://paperswithcode.com/paper/the-application-of-machine-learning
Repo
Framework

Provably scale-covariant networks from oriented quasi quadrature measures in cascade

Title Provably scale-covariant networks from oriented quasi quadrature measures in cascade
Authors Tony Lindeberg
Abstract This article presents a continuous model for hierarchical networks based on a combination of mathematically derived models of receptive fields and biologically inspired computations. Based on a functional model of complex cells in terms of an oriented quasi quadrature combination of first- and second-order directional Gaussian derivatives, we couple such primitive computations in cascade over combinatorial expansions over image orientations. Scale-space properties of the computational primitives are analysed and it is shown that the resulting representation allows for provable scale and rotation covariance. A prototype application to texture analysis is developed and it is demonstrated that a simplified mean-reduced representation of the resulting QuasiQuadNet leads to promising experimental results on three texture datasets.
Tasks Texture Classification
Published 2019-03-01
URL https://arxiv.org/abs/1903.00289v2
PDF https://arxiv.org/pdf/1903.00289v2.pdf
PWC https://paperswithcode.com/paper/provably-scale-covariant-networks-from
Repo
Framework

Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation

Title Contrastively Smoothed Class Alignment for Unsupervised Domain Adaptation
Authors Shuyang Dai, Yu Cheng, Yizhe Zhang, Zhe Gan, Jingjing Liu, Lawrence Carin
Abstract Recent unsupervised approaches to domain adaptation primarily focus on minimizing the gap between the source and the target domains through refining the feature generator, in order to learn a better alignment between the two domains. This minimization can be achieved via a domain classifier to detect target-domain features that are divergent from source-domain features. However, by optimizing via such domain classification discrepancy, ambiguous target samples that are not smoothly distributed on the low-dimensional data manifold are often missed. To solve this issue, we propose a novel Contrastively Smoothed Class Alignment (CoSCA) model that explicitly incorporates both intra- and inter-class domain discrepancy to better align ambiguous target samples with the source domain. CoSCA estimates the underlying label hypothesis of target samples, and simultaneously adapts their feature representations by optimizing a proposed contrastive loss. In addition, Maximum Mean Discrepancy (MMD) is utilized to directly match features between source and target samples for better global alignment. Experiments on several benchmark datasets demonstrate that CoSCA can outperform state-of-the-art approaches for unsupervised domain adaptation by producing more discriminative features.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2019-09-11
URL https://arxiv.org/abs/1909.05288v3
PDF https://arxiv.org/pdf/1909.05288v3.pdf
PWC https://paperswithcode.com/paper/contrastively-smoothed-class-alignment-for
Repo
Framework
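
The Maximum Mean Discrepancy mentioned in the abstract above can be illustrated with a short sketch. It computes the standard biased estimate of squared MMD, $\overline{k(x, x')} + \overline{k(y, y')} - 2\,\overline{k(x, y)}$, between a set of source features and a set of target features. The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions chosen for illustration; the abstract does not say which kernel or estimator CoSCA uses.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel; the kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(
    source: Sequence[Vector],
    target: Sequence[Vector],
    bandwidth: float = 1.0,
) -> float:
    """Biased estimate of squared MMD between two feature sets."""

    def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector]) -> float:
        # Average kernel value over all pairs drawn from xs and ys.
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )
```

The estimate is zero when both feature sets are identical and grows as they drift apart, which is why minimizing it with respect to the feature extractor pulls the two domains together.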

Automatic Long-Term Deception Detection in Group Interaction Videos

Title Automatic Long-Term Deception Detection in Group Interaction Videos
Authors Chongyang Bai, Maksim Bolonkin, Judee Burgoon, Chao Chen, Norah Dunbar, Bharat Singh, V. S. Subrahmanian, Zhe Wu
Abstract Most work on automated deception detection (ADD) in video has two restrictions: (i) it focuses on a video of one person, and (ii) it focuses on a single act of deception in a one- or two-minute video. In this paper, we propose a new ADD framework which captures long-term deception in a group setting. We study deception in the well-known Resistance game (like Mafia and Werewolf) which consists of 5-8 players of whom 2-3 are spies. Spies are deceptive throughout the game (typically 30-65 minutes) to keep their identity hidden. We develop an ensemble predictive model to identify spies in Resistance videos. We show that features from low-level and high-level video analysis are insufficient, but when combined with a new class of features that we call LiarRank, produce the best results. We achieve AUCs of over 0.70 in a fully automated setting. Our demo can be found at http://home.cs.dartmouth.edu/~mbolonkin/scan/demo/
Tasks Deception Detection
Published 2019-05-15
URL https://arxiv.org/abs/1905.08617v2
PDF https://arxiv.org/pdf/1905.08617v2.pdf
PWC https://paperswithcode.com/paper/190508617
Repo
Framework