January 27, 2020

3221 words 16 mins read

Paper Group ANR 1286

Normal Assisted Stereo Depth Estimation. Detecting Reflections by Combining Semantic and Instance Segmentation. An End-to-End Network for Panoptic Segmentation. Using DP Towards A Shortest Path Problem-Related Application. A Micro-Objective Perspective of Reinforcement Learning. Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Pa …

Normal Assisted Stereo Depth Estimation

Title Normal Assisted Stereo Depth Estimation
Authors Uday Kusupati, Shuo Cheng, Rui Chen, Hao Su
Abstract Accurate stereo depth estimation plays a critical role in various 3D tasks in both indoor and outdoor environments. Recently, learning-based multi-view stereo methods have demonstrated competitive performance with a limited number of views. However, in challenging scenarios, especially when building cross-view correspondences is hard, these methods still cannot produce satisfactory results. In this paper, we study how to enforce the consistency between surface normal and depth at training time to improve performance. We couple the learning of a multi-view normal estimation module and a multi-view depth estimation module. In addition, we propose a novel consistency loss to train an independent consistency module that refines the depths from depth/normal pairs. We find that joint learning improves both normal and depth prediction, and that accuracy and smoothness can be further improved by enforcing the consistency. Experiments on MVS, SUN3D, RGBD and Scenes11 demonstrate the effectiveness of our method and state-of-the-art performance.
Tasks Depth Estimation, Stereo Depth Estimation
Published 2019-11-24
URL https://arxiv.org/abs/1911.10444v2
PDF https://arxiv.org/pdf/1911.10444v2.pdf
PWC https://paperswithcode.com/paper/normal-assisted-stereo-depth-estimation
Repo
Framework
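
To make the depth/normal consistency idea above concrete, here is a minimal sketch of one common formulation (not the authors' code): back-project the predicted depth to 3D points, derive a normal map from finite-difference tangents, and penalise angular disagreement with the predicted normals. The function names, toy intrinsics, and the cosine-based loss are illustrative assumptions.

```python
# Sketch of a depth/normal consistency loss; shapes and loss form are assumptions.
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """depth: (B, 1, H, W) -> 3D points (B, 3, H, W) in camera coordinates."""
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()   # (3, H, W)
    rays = (K_inv @ pix.reshape(3, -1)).reshape(1, 3, H, W)           # per-pixel rays
    return rays * depth                                               # scale by depth

def normals_from_depth(depth, K_inv):
    """Estimate a unit normal map from depth via finite-difference tangents."""
    pts = backproject(depth, K_inv)
    dx = pts[:, :, :, 1:] - pts[:, :, :, :-1]   # tangent along x
    dy = pts[:, :, 1:, :] - pts[:, :, :-1, :]   # tangent along y
    dx = F.pad(dx, (0, 1, 0, 0))
    dy = F.pad(dy, (0, 0, 0, 1))
    n = torch.cross(dx, dy, dim=1)
    return F.normalize(n, dim=1)

def depth_normal_consistency_loss(pred_depth, pred_normal, K_inv):
    """1 - cosine similarity between normals implied by depth and predicted normals."""
    n_from_depth = normals_from_depth(pred_depth, K_inv)
    cos = (n_from_depth * F.normalize(pred_normal, dim=1)).sum(dim=1)
    return (1.0 - cos).mean()

if __name__ == "__main__":
    K_inv = torch.eye(3)                       # toy intrinsics
    depth = torch.rand(2, 1, 32, 32) + 1.0     # dummy predicted depth
    normal = F.normalize(torch.rand(2, 3, 32, 32), dim=1)
    print(depth_normal_consistency_loss(depth, normal, K_inv))
```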

Detecting Reflections by Combining Semantic and Instance Segmentation

Title Detecting Reflections by Combining Semantic and Instance Segmentation
Authors David Owen, Ping-Lin Chang
Abstract Reflections in natural images commonly cause false positives in automated detection systems. These false positives can significantly impair accuracy in detection, counting and segmentation tasks. Here, inspired by the recent panoptic approach to segmentation, we show how fusing instance and semantic segmentation can automatically identify reflection false positives, without explicitly needing the reflective regions to be labelled. We explore in detail how state-of-the-art two-stage detectors suffer a loss of broader contextual features, and hence are unable to learn to ignore these reflections. We then present an approach to fuse instance and semantic segmentations for this application, and show how it reduces false positive detections in real-world surveillance data with a large number of reflective surfaces. This demonstrates how panoptic segmentation and related work, despite being in its infancy, can already be useful in real-world computer vision problems.
Tasks Instance Segmentation, Panoptic Segmentation, Semantic Segmentation
Published 2019-04-30
URL http://arxiv.org/abs/1904.13273v1
PDF http://arxiv.org/pdf/1904.13273v1.pdf
PWC https://paperswithcode.com/paper/detecting-reflections-by-combining-semantic
Repo
Framework
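
As a rough illustration of the fusion idea above (my own sketch, not the paper's implementation), an instance mask whose pixels mostly fall on semantic classes corresponding to reflective surfaces can be flagged as a likely reflection false positive. The class ids and the overlap threshold below are assumptions.

```python
# Flag instance detections that sit mostly on "reflective" semantic pixels.
import numpy as np

REFLECTIVE_CLASSES = {7, 8}      # hypothetical semantic ids for glass / mirror
OVERLAP_THRESHOLD = 0.6          # fraction of the instance lying on reflective pixels

def flag_reflection_false_positives(instance_masks, semantic_map):
    """instance_masks: list of boolean (H, W) arrays; semantic_map: int (H, W) array."""
    reflective = np.isin(semantic_map, list(REFLECTIVE_CLASSES))
    flags = []
    for mask in instance_masks:
        area = mask.sum()
        overlap = np.logical_and(mask, reflective).sum()
        flags.append(bool(area > 0 and overlap / area >= OVERLAP_THRESHOLD))
    return flags

if __name__ == "__main__":
    semantic = np.zeros((64, 64), dtype=int)
    semantic[:, 32:] = 7                       # right half is "glass"
    person_on_glass = np.zeros((64, 64), bool); person_on_glass[10:30, 40:60] = True
    person_on_floor = np.zeros((64, 64), bool); person_on_floor[10:30, 5:25] = True
    print(flag_reflection_false_positives([person_on_glass, person_on_floor], semantic))
    # -> [True, False]
```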

An End-to-End Network for Panoptic Segmentation

Title An End-to-End Network for Panoptic Segmentation
Authors Huanyu Liu, Chao Peng, Changqian Yu, Jingbo Wang, Xu Liu, Gang Yu, Wei Jiang
Abstract Panoptic segmentation, which needs to assign a category label to each pixel and segment each object instance simultaneously, is a challenging topic. Traditionally, the existing approaches utilize two independent models without sharing features, which makes the pipeline inefficient to implement. In addition, a heuristic method is usually employed to merge the results. However, the overlapping relationship between object instances is difficult to determine without sufficient context information during the merging process. To address the problems, we propose a novel end-to-end network for panoptic segmentation, which can efficiently and effectively predict both the instance and stuff segmentation in a single network. Moreover, we introduce a novel spatial ranking module to deal with the occlusion problem between the predicted instances. Extensive experiments have been done to validate the performance of our proposed method and promising results have been achieved on the COCO Panoptic benchmark.
Tasks Panoptic Segmentation
Published 2019-03-12
URL http://arxiv.org/abs/1903.05027v2
PDF http://arxiv.org/pdf/1903.05027v2.pdf
PWC https://paperswithcode.com/paper/an-end-to-end-network-for-panoptic
Repo
Framework
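
The occlusion problem that the spatial ranking module addresses can be illustrated with a toy merge step (an assumed mechanism for illustration, not the paper's learned module): when instance masks overlap, a per-instance ranking score decides which instance keeps the contested pixels in the final panoptic output.

```python
# Resolve overlapping instance masks by a per-instance ranking score (toy sketch).
import numpy as np

def merge_by_rank(instance_masks, ranking_scores):
    """Assign each pixel to the highest-ranked instance covering it (0 = no instance)."""
    h, w = instance_masks[0].shape
    panoptic = np.zeros((h, w), dtype=int)
    best_score = np.full((h, w), -np.inf)
    for inst_id, (mask, score) in enumerate(zip(instance_masks, ranking_scores), start=1):
        take = mask & (score > best_score)
        panoptic[take] = inst_id
        best_score[take] = score
    return panoptic

if __name__ == "__main__":
    a = np.zeros((8, 8), bool); a[2:6, 2:6] = True   # e.g. a person
    b = np.zeros((8, 8), bool); b[4:8, 4:8] = True   # e.g. a tie overlapping the person
    print(merge_by_rank([a, b], ranking_scores=[0.4, 0.9]))
```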

Using DP Towards A Shortest Path Problem-Related Application

Title Using DP Towards A Shortest Path Problem-Related Application
Authors Jianhao Jiao, Rui Fan, Han Ma, Ming Liu
Abstract The detection of curved lanes is still challenging for autonomous driving systems. Although current cutting-edge approaches have performed well in real applications, most of them are based on strict model assumptions. Similar to other visual recognition tasks, lane detection can be formulated as a two-dimensional graph searching problem, which can be solved by finding several optimal paths along line segments and boundaries. In this paper, we present a directed graph model in which dynamic programming is used to solve a specific shortest path problem. This model is particularly suitable for representing objects with a long continuous shape structure, e.g., lanes and roads. We apply the designed model and propose an algorithm for detecting lanes by formulating detection as a shortest path problem. To evaluate the performance of the proposed algorithm, we tested five sequences (including 1573 frames) from the KITTI database. The results show that our method achieves an average successful detection precision of 97.5%.
Tasks Autonomous Driving, Lane Detection
Published 2019-03-07
URL http://arxiv.org/abs/1903.02765v1
PDF http://arxiv.org/pdf/1903.02765v1.pdf
PWC https://paperswithcode.com/paper/using-dp-towards-a-shortest-path-problem
Repo
Framework
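
A simplified sketch of the dynamic-programming formulation (my own reconstruction under assumed costs, not the authors' exact graph model): treat each image row as a DP stage and each column as a node, restrict transitions to small horizontal shifts to enforce continuity, and return the minimum-cost path as the lane.

```python
# Minimum-cost path down an image via dynamic programming (simplified sketch).
import numpy as np

def dp_lane_path(cost, max_shift=2):
    """cost: (H, W) per-pixel cost (low where a lane marking is likely).
    Returns one column index per row describing the minimum-cost path."""
    H, W = cost.shape
    acc = cost.copy()
    parent = np.zeros((H, W), dtype=int)
    for r in range(1, H):
        for c in range(W):
            lo, hi = max(0, c - max_shift), min(W, c + max_shift + 1)
            prev = np.arange(lo, hi)
            best = prev[np.argmin(acc[r - 1, lo:hi])]
            parent[r, c] = best
            acc[r, c] += acc[r - 1, best]
    # Backtrack from the cheapest node in the last row.
    path = [int(np.argmin(acc[-1]))]
    for r in range(H - 1, 0, -1):
        path.append(int(parent[r, path[-1]]))
    return path[::-1]

if __name__ == "__main__":
    H, W = 20, 30
    cost = np.ones((H, W))
    for r in range(H):                       # synthetic curved lane of low cost
        cost[r, 10 + r // 3] = 0.0
    print(dp_lane_path(cost))
```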

A Micro-Objective Perspective of Reinforcement Learning

Title A Micro-Objective Perspective of Reinforcement Learning
Authors Changjian Li, Krzysztof Czarnecki
Abstract The standard reinforcement learning (RL) formulation considers the expectation of the (discounted) cumulative reward. This is limiting in applications where we are concerned not only with the expected performance, but also with the distribution of the performance. In this paper, we introduce micro-objective reinforcement learning — an alternative RL formalism that overcomes this issue. In this new formulation, an RL task is specified by a set of micro-objectives, which are constructs that specify the desirability or undesirability of events. In addition, micro-objectives allow prior knowledge in the form of temporal abstraction to be incorporated into the global RL objective. The generality of this formalism, and its relations to single/multi-objective RL and hierarchical RL, are discussed.
Tasks
Published 2019-05-24
URL https://arxiv.org/abs/1905.10016v2
PDF https://arxiv.org/pdf/1905.10016v2.pdf
PWC https://paperswithcode.com/paper/rethinking-expected-cumulative-reward
Repo
Framework

Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

Title Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation
Authors Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, Liang-Chieh Chen
Abstract In this work, we introduce Panoptic-DeepLab, a simple, strong, and fast system for panoptic segmentation, aiming to establish a solid baseline for bottom-up methods that can achieve performance comparable to two-stage methods while yielding fast inference speed. In particular, Panoptic-DeepLab adopts dual-ASPP and dual-decoder structures specific to semantic and instance segmentation, respectively. The semantic segmentation branch follows the typical design of a semantic segmentation model (e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. As a result, our single Panoptic-DeepLab simultaneously ranks first on all three Cityscapes benchmarks, setting a new state of the art of 84.2% mIoU, 39.0% AP, and 65.5% PQ on the test set. Additionally, equipped with MobileNetV3, Panoptic-DeepLab runs nearly in real time on a single 1025x2049 image (15.8 frames per second), while achieving competitive performance on Cityscapes (54.1% PQ on the test set). On the Mapillary Vistas test set, our ensemble of six models attains 42.7% PQ, outperforming the 2018 challenge winner by a healthy margin of 1.5%. Finally, Panoptic-DeepLab also performs on par with several top-down approaches on the challenging COCO dataset. For the first time, we demonstrate that a bottom-up approach can deliver state-of-the-art results on panoptic segmentation.
Tasks Instance Segmentation, Panoptic Segmentation, Semantic Segmentation
Published 2019-11-22
URL https://arxiv.org/abs/1911.10194v3
PDF https://arxiv.org/pdf/1911.10194v3.pdf
PWC https://paperswithcode.com/paper/panoptic-deeplab-a-simple-strong-and-fast
Repo
Framework
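
The class-agnostic instance branch can be illustrated with a toy grouping step (simplified and assumed, not the released implementation): the network predicts instance centres and a per-pixel offset toward its centre, and each "thing" pixel is assigned to the nearest predicted centre after applying its offset.

```python
# Group pixels into instances via centre regression (illustrative sketch).
import numpy as np

def group_pixels(centers, offsets, thing_mask):
    """centers: (K, 2) predicted centre coordinates (y, x).
    offsets: (2, H, W) predicted offset from each pixel to its instance centre.
    thing_mask: (H, W) bool mask of pixels belonging to countable objects.
    Returns an (H, W) instance-id map (0 = stuff / background)."""
    H, W = thing_mask.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pred_centre = np.stack([ys + offsets[0], xs + offsets[1]], axis=-1)   # (H, W, 2)
    dists = np.linalg.norm(pred_centre[..., None, :] - centers[None, None], axis=-1)
    ids = np.argmin(dists, axis=-1) + 1                                   # (H, W)
    return np.where(thing_mask, ids, 0)

if __name__ == "__main__":
    H, W = 16, 16
    centers = np.array([[4.0, 4.0], [11.0, 12.0]])
    offsets = np.zeros((2, H, W))          # toy case: every pixel already "points" at itself
    mask = np.zeros((H, W), bool); mask[2:7, 2:7] = True; mask[9:14, 10:15] = True
    print(group_pixels(centers, offsets, mask))
```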

Defective Convolutional Layers Learn Robust CNNs

Title Defective Convolutional Layers Learn Robust CNNs
Authors Tiange Luo, Tianle Cai, Mengxiao Zhang, Siyu Chen, Di He, Liwei Wang
Abstract The robustness of convolutional neural networks has recently been highlighted by adversarial examples, i.e., inputs with well-designed perturbations that are imperceptible to humans but cause the network to give incorrect outputs. Recent research suggests that the noise in adversarial examples breaks the textural structure, which eventually leads to wrong predictions by convolutional neural networks. To help a convolutional neural network make predictions that rely less on textural information, we propose defective convolutional layers, which contain defective neurons whose activations are set to a constant function. As the defective neurons contain no information and differ greatly from the standard neurons in their spatial neighborhood, the textural features cannot be accurately extracted and the model has to seek other features for classification, such as shape. We first show that predictions made by the defective CNN depend less on textural information and more on shape information, and further find that adversarial examples generated by the defective CNN appear to have semantic shapes. Experimental results demonstrate that the defective CNN has a higher defense ability than the standard CNN against various types of attacks. In particular, it achieves state-of-the-art performance against transfer-based attacks without applying any adversarial training.
Tasks
Published 2019-11-19
URL https://arxiv.org/abs/1911.08432v1
PDF https://arxiv.org/pdf/1911.08432v1.pdf
PWC https://paperswithcode.com/paper/defective-convolutional-layers-learn-robust-1
Repo
Framework
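
One plausible way to realise a defective convolutional layer (an assumption for illustration, not the authors' code) is to fix a random subset of output activations to a constant at construction time, so those neurons never carry input-dependent textural information; unlike dropout, the defective positions do not change between forward passes.

```python
# A "defective" conv layer: a fixed subset of output activations is held constant.
import torch
import torch.nn as nn

class DefectiveConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, defect_rate=0.3, constant=0.0, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, **kw)
        self.defect_rate = defect_rate
        self.constant = constant
        self.register_buffer("defect_mask", torch.empty(0))   # created lazily per output size

    def forward(self, x):
        y = self.conv(x)
        if self.defect_mask.shape != y.shape[1:]:
            # Fix the defective positions once; unlike dropout they never change.
            self.defect_mask = (torch.rand(y.shape[1:], device=y.device)
                                < self.defect_rate)
        return torch.where(self.defect_mask, torch.full_like(y, self.constant), y)

if __name__ == "__main__":
    layer = DefectiveConv2d(3, 8, 3, padding=1, defect_rate=0.3)
    out = layer(torch.randn(2, 3, 32, 32))
    print(out.shape)                      # torch.Size([2, 8, 32, 32])
```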

Efficient Parameter Sampling for Neural Network Construction

Title Efficient Parameter Sampling for Neural Network Construction
Authors Drimik Roy Chowdhury, Muhammad Firmansyah Kasim
Abstract The customizable nature of deep learning models has allowed them to be successful predictors in various disciplines. These models are often trained on thousands or millions of instances for complicated problems, but gathering such an immense collection may be infeasible and expensive, and many of these instances contribute only redundant information to the deep learning models. This paper outlines an algorithm that dynamically selects and appends instances to a training dataset from uncertain regions of the parameter space based on differences in predictions from multiple convolutional neural networks (CNNs). These CNNs are simultaneously trained on the growing dataset to construct more accurate and knowledgeable models. The methodology presented has reduced training dataset sizes by almost 90% while maintaining predictive power in two diagnostics of high energy density physics.
Tasks
Published 2019-12-22
URL https://arxiv.org/abs/1912.10559v1
PDF https://arxiv.org/pdf/1912.10559v1.pdf
PWC https://paperswithcode.com/paper/efficient-parameter-sampling-for-neural
Repo
Framework
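
The acquisition loop described above resembles query-by-committee active learning; here is a condensed sketch under assumed shapes and hypothetical helpers (the models are stand-in callables, and label_fn stands for the expensive simulation or labelling step).

```python
# One acquisition step: add the candidates the committee of models disagrees on most.
import numpy as np

def disagreement(predictions):
    """predictions: (n_models, n_candidates, ...) -> per-candidate std across models."""
    return np.std(predictions, axis=0).reshape(predictions.shape[1], -1).mean(axis=1)

def grow_training_set(train_x, train_y, candidates, models, label_fn, n_add=32):
    """Pick the n_add candidates with the highest committee disagreement and label them."""
    preds = np.stack([m(candidates) for m in models])          # each model is a callable
    pick = np.argsort(disagreement(preds))[-n_add:]
    new_x = candidates[pick]
    new_y = label_fn(new_x)                                    # expensive simulation / labelling
    return np.concatenate([train_x, new_x]), np.concatenate([train_y, new_y])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    models = [lambda x, w=w: x @ w for w in rng.normal(size=(3, 5, 1))]   # stand-ins for CNNs
    cands = rng.normal(size=(100, 5))
    tx, ty = rng.normal(size=(10, 5)), rng.normal(size=(10, 1))
    tx, ty = grow_training_set(tx, ty, cands, models, label_fn=lambda x: x.sum(1, keepdims=True))
    print(tx.shape, ty.shape)          # (42, 5) (42, 1)
```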

A Survey on Multi-output Learning

Title A Survey on Multi-output Learning
Authors Donna Xu, Yaxin Shi, Ivor W. Tsang, Yew-Soon Ong, Chen Gong, Xiaobo Shen
Abstract Multi-output learning aims to simultaneously predict multiple outputs given an input. It is an important learning problem due to the pressing need for sophisticated decision making in real-world applications. In the era of big data, the 4Vs characteristics of the outputs impose a set of challenges on multi-output learning, in terms of the volume, velocity, variety and veracity of the outputs. An increasing number of works in the literature have been devoted to the study of multi-output learning and the development of novel approaches for addressing these challenges. However, the literature lacks a comprehensive overview of the different types of challenges that the characteristics of the multiple outputs bring to multi-output learning, and of the techniques proposed to overcome them. This paper thus attempts to fill this gap and provide a comprehensive review of the area. We first introduce the different stages of the life cycle of the output labels. Then we present the paradigm of multi-output learning, including its myriad output structures, the definitions of its different sub-problems, model evaluation metrics and popular data repositories used in its study. Subsequently, we review a number of state-of-the-art multi-output learning methods, categorized according to the challenges they address.
Tasks Decision Making
Published 2019-01-02
URL https://arxiv.org/abs/1901.00248v2
PDF https://arxiv.org/pdf/1901.00248v2.pdf
PWC https://paperswithcode.com/paper/a-survey-on-multi-output-learning
Repo
Framework

Towards Emotion Retrieval in Egocentric PhotoStream

Title Towards Emotion Retrieval in Egocentric PhotoStream
Authors Estefania Talavera, Petia Radeva, Nicolai Petkov
Abstract The availability and use of egocentric data are rapidly increasing due to the growing use of wearable cameras. Our aim is to study the effect (positive, neutral or negative) of egocentric images or events on an observer. Given egocentric photostreams capturing the wearer’s days, we propose a method that assigns sentiment to events extracted from the photostreams. Such moments can be candidates for retrieval according to their likelihood of representing a positive experience for the camera’s wearer. The proposed approach obtained a classification accuracy of 75% on the test set, with a deviation of 8%. Our model takes a step forward, opening the door to sentiment recognition in egocentric photostreams.
Tasks
Published 2019-05-10
URL https://arxiv.org/abs/1905.04107v1
PDF https://arxiv.org/pdf/1905.04107v1.pdf
PWC https://paperswithcode.com/paper/towards-emotion-retrieval-in-egocentric
Repo
Framework

Semantic segmentation of trajectories with improved agent models for pedestrian behavior analysis

Title Semantic segmentation of trajectories with improved agent models for pedestrian behavior analysis
Authors Toru Tamaki, Daisuke Ogawa, Bisser Raytchev, Kazufumi Kaneda
Abstract In this paper, we propose a method for semantic segmentation of pedestrian trajectories based on pedestrian behavior models, or agents. The agents model the dynamics of pedestrian movements in two-dimensional space using a linear dynamics model and common start and goal locations of trajectories. First, agent models are estimated from the trajectories obtained from image sequences. Our method is built on top of the Mixture model of Dynamic pedestrian Agents (MDA); however, the MDA’s trajectory modeling and estimation are improved. Then, the trajectories are divided into semantically meaningful segments. The subsegments of a trajectory are modeled by applying a hidden Markov model using the estimated agent models. Experimental results with a real trajectory dataset show the effectiveness of the proposed method as compared to the well-known classical Ramer-Douglas-Peucker algorithm and also to the original MDA model.
Tasks Semantic Segmentation
Published 2019-12-12
URL https://arxiv.org/abs/1912.05727v1
PDF https://arxiv.org/pdf/1912.05727v1.pdf
PWC https://paperswithcode.com/paper/semantic-segmentation-of-trajectories-with-1
Repo
Framework
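
A toy version of the HMM-based segmentation step (my own simplification, not the MDA formulation): each hidden state is one estimated agent model, emissions score how well a trajectory point fits that agent, and a self-transition bias keeps segments contiguous; Viterbi decoding then yields one agent label per point.

```python
# Segment a trajectory by decoding agent labels with Viterbi (toy sketch).
import numpy as np

def viterbi_segment(log_likelihood, stay_prob=0.95):
    """log_likelihood: (T, K) log p(point_t | agent_k). Returns (T,) agent labels."""
    T, K = log_likelihood.shape
    log_trans = np.full((K, K), np.log((1 - stay_prob) / max(K - 1, 1)))
    np.fill_diagonal(log_trans, np.log(stay_prob))
    score = log_likelihood[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans             # (K_prev, K_next)
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(K)] + log_likelihood[t]
    labels = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        labels.append(int(back[t, labels[-1]]))
    return np.array(labels[::-1])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ll = rng.normal(size=(40, 3))
    ll[:20, 0] += 3.0                                 # first half fits agent 0 best
    ll[20:, 2] += 3.0                                 # second half fits agent 2 best
    print(viterbi_segment(ll))
```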

Train Where the Data is: A Case for Bandwidth Efficient Coded Training

Title Train Where the Data is: A Case for Bandwidth Efficient Coded Training
Authors Zhifeng Lin, Krishna Giri Narra, Mingchao Yu, Salman Avestimehr, Murali Annavaram
Abstract Training a machine learning model is both compute- and data-intensive. Most model training is performed on high-performance compute nodes, and the training data is stored near these nodes for faster training. However, there is growing interest in enabling training near the data. For instance, mobile devices are rich sources of training data, and it may not be feasible to consolidate their data into a cloud service for bandwidth and data privacy reasons. Training on mobile devices is, however, fraught with challenges. First, mobile devices may join or leave the distributed setting, either voluntarily or due to environmental uncertainties such as lack of power. Tolerating such uncertainties is critical to the success of distributed mobile training. One proactive approach to tolerating computational uncertainty is to store data in a coded format and perform training on the coded data. Encoding data is challenging because erasure codes require multiple devices to exchange their data to create a coded data partition, which places a significant bandwidth constraint. Furthermore, coded computing has traditionally relied on a central node to encode and distribute data to all worker nodes, which is not practical in a distributed mobile setting. In this paper, we tackle the uncertainty in distributed mobile training using a bandwidth-efficient encoding strategy. We use Random Linear Network Coding (RLNC), which reduces the need to exchange data partitions across all participating mobile devices while preserving the ability of coded computing to tolerate uncertainties. We implement gradient descent for logistic regression and SVM to evaluate the effectiveness of our mobile training framework, and we demonstrate a 50% reduction in total required communication bandwidth compared to MDS-coded computation, one of the popular erasure codes.
Tasks
Published 2019-10-22
URL https://arxiv.org/abs/1910.10283v1
PDF https://arxiv.org/pdf/1910.10283v1.pdf
PWC https://paperswithcode.com/paper/train-where-the-data-is-a-case-for-bandwidth
Repo
Framework
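
A small sketch of the random linear network coding idea over the reals (shapes and helper names are assumptions): a device forms a coded partition as a random linear combination of the partitions it already holds, so creating coded data does not require all-to-all exchange.

```python
# Encode one coded partition as a random linear combination of local partitions.
import numpy as np

def encode_partition(local_partitions, rng):
    """local_partitions: list of (n, d) arrays held by one device.
    Returns (coded_block, coefficients) for one coded partition."""
    coeffs = rng.normal(size=len(local_partitions))       # random mixing coefficients
    coded = sum(c * p for c, p in zip(coeffs, local_partitions))
    return coded, coeffs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two equally shaped partitions held locally by a device.
    parts = [rng.normal(size=(4, 3)), rng.normal(size=(4, 3))]
    coded, coeffs = encode_partition(parts, rng)
    print(coded.shape, coeffs)
```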

Predicting gait events from tibial acceleration in rearfoot running: a structured machine learning approach

Title Predicting gait events from tibial acceleration in rearfoot running: a structured machine learning approach
Authors Pieter Robberechts, Rud Derie, Pieter Van den Berghe, Joeri Gerlo, Dirk De Clercq, Veerle Segers, Jesse Davis
Abstract Gait event detection of the initial contact and toe off is essential for running gait analysis. Heuristic-based methods exist to estimate these key gait events from tibial accelerometry. These heuristic-based methods are unfortunately tailored to very specific acceleration profiles, which may offer complications when dealing with larger data sets and inherent biological variability. Therefore, the purpose of this study was to compare a previously utilised heuristic method of gait event detection to an original proposed method using a structured machine learning approach. Force-based event detection acted as the criterion measure in order to assess the accuracy of the predicted gait events. 3D tibial acceleration and ground reaction force data from 93 rearfoot runners were captured. A heuristic method and two machine learning methods were employed to derive initial contact, toe off and stance time from tibial acceleration signals. Both machine learning methods significantly outperformed existing heuristic approaches. Furthermore, results indicate that a structured recurrent neural network machine learning model offers the most accurate and consistent estimation of the gait events and its derived stance time during level overground running. The machine learning methods seem less affected by intra- and inter-subject variation within the data, allowing for accurate and efficient automated data output with possibilities for real-time monitoring and biofeedback during prolonged measurements.
Tasks Motion Capture, Structured Prediction
Published 2019-10-29
URL https://arxiv.org/abs/1910.13372v2
PDF https://arxiv.org/pdf/1910.13372v2.pdf
PWC https://paperswithcode.com/paper/191013372
Repo
Framework

Promoting Diversity for End-to-End Conversation Response Generation

Title Promoting Diversity for End-to-End Conversation Response Generation
Authors Yu-Ping Ruan, Zhen-Hua Ling, Quan Liu, Jia-Chen Gu, Xiaodan Zhu
Abstract We present our work on Track 2 of the Dialog System Technology Challenges 7 (DSTC7). DSTC7 Track 2 aims to evaluate the response generation of fully data-driven conversation models in knowledge-grounded settings, which provide contextually relevant factual texts. Sequence-to-sequence models have been widely used for end-to-end generative conversation modelling and have achieved impressive results. However, they have tended to output dull and repeated responses in previous studies. Our work aims to promote diversity in end-to-end conversation response generation, following a two-stage pipeline: 1) generate multiple responses, for which two different models are proposed, a variational generative (VariGen) model and a retrieval-based (Retrieval) model; 2) rank and return the most relevant response by training a topic coherence discrimination (TCD) model for the ranking process. According to the official evaluation results, our Retrieval and VariGen systems ranked first and second respectively on objective diversity metrics, i.e., Entropy, among all participant systems, and the VariGen system ranked second on the NIST and METEOR metrics.
Tasks
Published 2019-01-27
URL http://arxiv.org/abs/1901.09444v2
PDF http://arxiv.org/pdf/1901.09444v2.pdf
PWC https://paperswithcode.com/paper/promoting-diversity-for-end-to-end
Repo
Framework

An Affective Situation Labeling System from Psychological Behaviors in Emotion Recognition

Title An Affective Situation Labeling System from Psychological Behaviors in Emotion Recognition
Authors Byung Hyung Kim, Sungho Jo
Abstract This paper presents a computational framework for providing affective labels to real-life situations, called A-Situ. We first define an affective situation, as a specific arrangement of affective entities relevant to emotion elicitation in a situation. Then, the affective situation is represented as a set of labels in the valence-arousal emotion space. Based on physiological behaviors in response to a situation, the proposed framework quantifies the expected emotion evoked by the interaction with a stimulus event. The accumulated result in a spatiotemporal situation is represented as a polynomial curve called the affective curve, which bridges the semantic gap between cognitive and affective perception in real-world situations. We show the efficacy of the curve for reliable emotion labeling in real-world experiments, respectively concerning 1) a comparison between the results from our system and existing explicit assessments for measuring emotion, 2) physiological distinctiveness in emotional states, and 3) physiological characteristics correlated to continuous labels. The efficiency of affective curves to discriminate emotional states is evaluated through subject-dependent classification performance using bicoherence features to represent discrete affective states in the valence-arousal space. Furthermore, electroencephalography-based statistical analysis revealed the physiological correlates of the affective curves.
Tasks Emotion Recognition
Published 2019-11-04
URL https://arxiv.org/abs/1911.01158v2
PDF https://arxiv.org/pdf/1911.01158v2.pdf
PWC https://paperswithcode.com/paper/an-affective-situation-labeling-system-from
Repo
Framework