April 2, 2020

3521 words 17 mins read

Paper Group ANR 350

Rethinking Object Detection in Retail Stores. Who Make Drivers Stop? Towards Driver-centric Risk Assessment: Risk Object Identification via Causal Inference. Investigating Language Impact in Bilingual Approaches for Computational Language Documentation. Designing Truthful Contextual Multi-Armed Bandits based Sponsored Search Auctions. Quantifying H …

Rethinking Object Detection in Retail Stores

Title Rethinking Object Detection in Retail Stores
Authors Yuanqiang Cai, Longyin Wen, Libo Zhang, Dawei Du, Weiqiang Wang, Pengfei Zhu
Abstract The conventional standard for object detection uses a bounding box to represent each individual object instance. However, this is impractical in industry-relevant applications such as warehouses, due to severe occlusions among groups of instances of the same category. In this paper, we propose a new task, i.e., simultaneous object localization and counting, abbreviated as Locount, which requires algorithms to localize groups of objects of interest together with the number of instances. However, no dataset or benchmark exists for such a task. To this end, we collect a large-scale object localization and counting dataset with rich annotations in retail stores, which consists of 50,394 images with more than 1.9 million object instances in 140 categories. Together with this dataset, we provide a new evaluation protocol and divide the dataset into training and testing subsets to fairly evaluate the performance of algorithms for Locount, establishing a new benchmark for the Locount task. Moreover, we present a cascaded localization and counting network as a strong baseline, which gradually classifies and regresses the bounding boxes of objects with the predicted numbers of instances enclosed in the bounding boxes, trained in an end-to-end manner. Extensive experiments are conducted on the proposed dataset to demonstrate its significance, and an analysis of failure cases is provided to indicate future directions. The dataset is available at https://isrc.iscas.ac.cn/gitlab/research/locount-dataset.
Tasks Object Detection, Object Localization
Published 2020-03-18
URL https://arxiv.org/abs/2003.08230v1
PDF https://arxiv.org/pdf/2003.08230v1.pdf
PWC https://paperswithcode.com/paper/rethinking-object-detection-in-retail-stores
Repo
Framework
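
The Locount protocol scores a prediction on both localization and count. As a rough illustration of that idea (not the paper’s official evaluation protocol; the matching rule, count tolerance `tau`, and greedy strategy below are assumptions), a predicted group box could count as a true positive only when it both overlaps a ground-truth group and gets the enclosed instance count right:

```python
# A minimal sketch (not the official Locount protocol) of joint
# localization-and-count matching: a predicted group box is a true positive
# when it overlaps a ground-truth group (IoU >= iou_thr) AND predicts the
# enclosed instance count within a tolerance `tau`. Names are illustrative.
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def locount_matches(preds, gts, iou_thr=0.5, tau=0):
    """preds/gts: lists of (box, count). Greedy one-to-one matching,
    assuming preds are sorted by detection score."""
    used, tp = set(), 0
    for pbox, pcount in preds:
        for j, (gbox, gcount) in enumerate(gts):
            if j in used:
                continue
            if iou(pbox, gbox) >= iou_thr and abs(pcount - gcount) <= tau:
                used.add(j); tp += 1
                break
    return tp, len(preds) - tp, len(gts) - tp  # TP, FP, FN
```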

Who Make Drivers Stop? Towards Driver-centric Risk Assessment: Risk Object Identification via Causal Inference

Title Who Make Drivers Stop? Towards Driver-centric Risk Assessment: Risk Object Identification via Causal Inference
Authors Chengxi Li, Stanley H. Chan, Yi-Ting Chen
Abstract We propose a framework based on causal inference for risk object identification, an essential task towards driver-centric risk assessment. In this work, risk objects are defined as objects influencing a driver’s goal-oriented behavior. Existing approaches have two limitations. First, they require strong supervision, such as risk object location or human gaze location. Second, there is no explicit reasoning stage for identifying the risk object. To address these issues, the task of identifying causes of driver behavioral change is formalized in the language of functional causal models and interventions. Specifically, we iteratively simulate the causal effect of removing an object using the proposed driving model. The risk object is determined as the one causing the most substantial causal effect. We evaluate the proposed framework on the Honda Research Institute Driving Dataset (HDD). The dataset provides annotations for risk object localization, enabling systematic benchmarking against existing approaches. Our framework demonstrates a substantial average performance boost of 7.5% over a strong baseline.
Tasks Causal Inference, Object Localization
Published 2020-03-05
URL https://arxiv.org/abs/2003.02425v1
PDF https://arxiv.org/pdf/2003.02425v1.pdf
PWC https://paperswithcode.com/paper/who-make-drivers-stop-towards-driver-centric
Repo
Framework
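
The intervention loop the abstract describes is straightforward to sketch. Below, `driving_model` (returning, say, the probability of proceeding) and `remove_object` (the image-level intervention) are hypothetical stand-ins for the paper’s components:

```python
# A minimal sketch of the intervention loop from the abstract: remove each
# candidate object, re-run a driving model, and pick the object whose
# removal changes the predicted behavior the most.
# `driving_model` and `remove_object` are hypothetical stand-ins.
def identify_risk_object(frame, objects, driving_model, remove_object):
    base = driving_model(frame)                      # e.g., P(go) for the full scene
    best_obj, best_effect = None, -float("inf")
    for obj in objects:
        counterfactual = remove_object(frame, obj)   # intervention: delete obj
        effect = driving_model(counterfactual) - base  # causal effect on P(go)
        if effect > best_effect:
            best_obj, best_effect = obj, effect
    return best_obj, best_effect
```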

Investigating Language Impact in Bilingual Approaches for Computational Language Documentation

Title Investigating Language Impact in Bilingual Approaches for Computational Language Documentation
Authors Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier
Abstract For endangered languages, data collection campaigns have to accommodate the challenge that many of them are from oral traditions, and producing transcriptions is costly. Therefore, it is fundamental to translate the recordings into a widely spoken language to ensure their interpretability. In this paper we investigate how the choice of translation language affects subsequent documentation work and the potential automatic approaches that will operate on top of the produced bilingual corpus. To answer this question, we use the MaSS multilingual speech corpus (Boito et al., 2020) to create 56 bilingual pairs, which we apply to the task of low-resource unsupervised word segmentation and alignment. Our results highlight that the choice of translation language influences word segmentation performance, and that different lexicons are learned from different aligned translations. Lastly, this paper proposes a hybrid approach to bilingual word segmentation, combining boundary clues extracted from a non-parametric Bayesian model (Goldwater et al., 2009a) with the attentional word segmentation neural model of Godard et al. (2018). Our results suggest that incorporating these clues into the neural model’s input representation increases its translation and alignment quality, especially for challenging language pairs.
Tasks
Published 2020-03-30
URL https://arxiv.org/abs/2003.13325v1
PDF https://arxiv.org/pdf/2003.13325v1.pdf
PWC https://paperswithcode.com/paper/investigating-language-impact-in-bilingual
Repo
Framework
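
The hybrid approach feeds boundary clues from the Bayesian model into the neural segmenter’s input representation. A minimal sketch of one plausible realization (simple feature concatenation; the authors’ exact encoding may differ):

```python
# A minimal sketch (an assumption, not the authors' exact pipeline) of
# injecting boundary clues from a Bayesian segmenter into a neural model's
# input: each symbol embedding is concatenated with the clue that a word
# boundary follows that symbol.
import numpy as np

def augment_with_boundary_clues(symbol_embeddings, boundary_probs):
    """symbol_embeddings: (T, d) array; boundary_probs: (T,) array in [0, 1],
    e.g. from a non-parametric Bayesian segmentation model."""
    clues = boundary_probs.reshape(-1, 1)                       # (T, 1)
    return np.concatenate([symbol_embeddings, clues], axis=1)   # (T, d+1)
```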

Designing Truthful Contextual Multi-Armed Bandits based Sponsored Search Auctions

Title Designing Truthful Contextual Multi-Armed Bandits based Sponsored Search Auctions
Authors Kumar Abhishek, Shweta Jain, Sujit Gujar
Abstract For sponsored search auctions, we consider the contextual multi-armed bandit problem in the presence of strategic agents. In this setting, at each round, an advertising platform (center) runs an auction to select the best-suited ads relevant to the query posted by the user. It is in the best interest of the center to select an ad that has a high expected value (i.e., the probability of getting a click $\times$ the value the center derives from a click on the ad). The probability of getting a click (CTR) is unknown to the center and depends on the profile (context) of the user posting the query. Further, the value derived from a click is private information of the advertiser and thus needs to be elicited truthfully. The existing solution in this setting is not practical, as it suffers from very high regret ($O(T^{\frac{2}{3}})$).
Tasks Multi-Armed Bandits
Published 2020-02-26
URL https://arxiv.org/abs/2002.11349v1
PDF https://arxiv.org/pdf/2002.11349v1.pdf
PWC https://paperswithcode.com/paper/designing-truthful-contextual-multi-armed
Repo
Framework
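
The allocation rule at the heart of this setting can be sketched as follows: score each ad by an optimistic CTR estimate times its reported per-click value. This is a generic UCB-style sketch, not the authors’ mechanism, and it omits the payment rule that makes the auction truthful:

```python
# A minimal sketch of the selection rule the abstract describes: score each
# ad by (estimated CTR) x (reported value per click), with a UCB-style
# exploration bonus. The payment rule needed for truthfulness is omitted;
# this only illustrates allocation.
import numpy as np

def select_ad(ctr_estimates, pulls, bids, t, c=1.0):
    """ctr_estimates, pulls, bids: arrays over ads; t: current round."""
    bonus = c * np.sqrt(np.log(max(t, 2)) / np.maximum(pulls, 1))
    scores = (ctr_estimates + bonus) * bids   # optimistic expected value
    return int(np.argmax(scores))
```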

Quantifying Hypothesis Space Misspecification in Learning from Human-Robot Demonstrations and Physical Corrections

Title Quantifying Hypothesis Space Misspecification in Learning from Human-Robot Demonstrations and Physical Corrections
Authors Andreea Bobu, Andrea Bajcsy, Jaime F. Fisac, Sampada Deglurkar, Anca D. Dragan
Abstract Human input has enabled autonomous systems to improve their capabilities and achieve complex behaviors that are otherwise challenging to generate automatically. Recent work focuses on how robots can use such input - like demonstrations or corrections - to learn intended objectives. These techniques assume that the human’s desired objective already exists within the robot’s hypothesis space. In reality, this assumption is often inaccurate: there will always be situations where the person might care about aspects of the task that the robot does not know about. Without this knowledge, the robot cannot infer the correct objective. Hence, when the robot’s hypothesis space is misspecified, even methods that keep track of uncertainty over the objective fail because they reason about which hypothesis might be correct, and not whether any of the hypotheses are correct. In this paper, we posit that the robot should reason explicitly about how well it can explain human inputs given its hypothesis space and use that situational confidence to inform how it should incorporate human input. We demonstrate our method on a 7 degree-of-freedom robot manipulator in learning from two important types of human input: demonstrations of manipulation tasks, and physical corrections during the robot’s task execution.
Tasks
Published 2020-02-03
URL https://arxiv.org/abs/2002.00941v2
PDF https://arxiv.org/pdf/2002.00941v2.pdf
PWC https://paperswithcode.com/paper/quantifying-hypothesis-space-misspecification
Repo
Framework
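
One way to make “situational confidence” concrete (an assumption on our part, not the paper’s exact estimator) is to check how well the best available hypothesis explains the human input and temper the belief update accordingly:

```python
# A minimal sketch (assumed, not the paper's estimator) of situational
# confidence: score how well the best objective hypothesis explains a human
# correction, and down-weight the belief update when no hypothesis explains
# it well (i.e., when the hypothesis space looks misspecified).
import numpy as np

def update_with_confidence(posterior, thetas, human_input, likelihood_fn, lr=1.0):
    """posterior: (K,) belief over hypotheses;
    likelihood_fn(theta, u) -> P(u | theta)."""
    liks = np.array([likelihood_fn(th, human_input) for th in thetas])
    confidence = liks.max()        # low if no hypothesis explains the input
    new_post = posterior * liks ** (lr * confidence)   # tempered Bayesian update
    return new_post / new_post.sum(), confidence
```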

Optimising Game Tactics for Football

Title Optimising Game Tactics for Football
Authors Ryan Beal, Georgios Chalkiadakis, Timothy J. Norman, Sarvapali D. Ramchurn
Abstract In this paper we present a novel approach to optimising tactical and strategic decision making in football (soccer). We model the game of football as a multi-stage game composed of a Bayesian game modelling the pre-match decisions and a stochastic game modelling the in-match state transitions and decisions. Using this formulation, we propose a method to predict the probability of game outcomes and the payoffs of team actions. Building upon this, we develop algorithms to optimise team formation and in-game tactics with different objectives. Empirical evaluation of our approach on real-world datasets from 760 matches shows that by using optimised tactics from our Bayesian and stochastic games, we can increase a team’s chances of winning by up to 16.1% and 3.4%, respectively.
Tasks Decision Making, Game of Football
Published 2020-03-23
URL https://arxiv.org/abs/2003.10294v1
PDF https://arxiv.org/pdf/2003.10294v1.pdf
PWC https://paperswithcode.com/paper/optimising-game-tactics-for-football
Repo
Framework
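
The pre-match optimization reduces to an expected-payoff comparison over candidate formations. A minimal sketch, where `outcome_probs` is a hypothetical predictive model of win/draw/loss probabilities:

```python
# A minimal sketch of the pre-match (Bayesian-game) step as the abstract
# describes it: for each candidate formation, predict outcome probabilities
# against the opponent and pick the formation with the best expected payoff.
# `outcome_probs` is a hypothetical predictive model.
def best_formation(formations, opponent, outcome_probs,
                   payoff={"win": 3.0, "draw": 1.0, "loss": 0.0}):
    def expected_payoff(f):
        p = outcome_probs(f, opponent)   # dict: outcome -> probability
        return sum(payoff[o] * p[o] for o in payoff)
    return max(formations, key=expected_payoff)
```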

Few-shot Learning with Weakly-supervised Object Localization

Title Few-shot Learning with Weakly-supervised Object Localization
Authors Jinfu Lin, Xiaojian He
Abstract Few-shot learning (FSL) aims to learn novel visual categories from very few samples, which is a challenging problem in real-world applications. Many data generation methods have improved the performance of FSL models, but they require many annotated images to train a specialized network (e.g., a GAN) dedicated to hallucinating new samples. We argue that localization is a more efficient approach because it provides the most discriminative regions without using extra samples. In this paper, we propose a novel method that addresses the FSL task by achieving weakly-supervised object localization while performing few-shot classification. To this end, we design (i) a triplet-input module to obtain initial object seeds and (ii) an Image-To-Class-Distance (ITCD) based localizer to activate the deep descriptors of the key objects, thus obtaining more discriminative representations for few-shot classification. Extensive experiments show our method outperforms the state-of-the-art methods on benchmark datasets under various settings. Moreover, our method achieves superior performance over previous methods when the model is trained on miniImageNet and evaluated on different datasets (e.g., Stanford Dogs), demonstrating its superior generalization capacity. Additional visualizations show that the proposed method localizes the key objects accurately.
Tasks Few-Shot Learning, Object Localization, Weakly-Supervised Object Localization
Published 2020-03-02
URL https://arxiv.org/abs/2003.00874v1
PDF https://arxiv.org/pdf/2003.00874v1.pdf
PWC https://paperswithcode.com/paper/few-shot-learning-with-weakly-supervised
Repo
Framework
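
An Image-To-Class distance of the kind the ITCD localizer builds on can be sketched directly: pool local descriptors from a class’s support images and sum each query descriptor’s distance to its nearest class descriptor. This is the generic construction, not the authors’ exact module:

```python
# A minimal sketch of a generic Image-To-Class distance: sum, over a query
# image's local descriptors, the squared distance to the nearest descriptor
# pooled from a class's few support images. Not the paper's exact module.
import numpy as np

def image_to_class_distance(query_desc, class_desc):
    """query_desc: (m, d); class_desc: (n, d) pooled over the support set."""
    # pairwise squared Euclidean distances, shape (m, n)
    d2 = ((query_desc[:, None, :] - class_desc[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()   # nearest-neighbour distance per descriptor

def classify(query_desc, support_sets):
    """support_sets: dict class_name -> (n, d) descriptors."""
    return min(support_sets,
               key=lambda c: image_to_class_distance(query_desc, support_sets[c]))
```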

A learning without forgetting approach to incorporate artifact knowledge in polyp localization tasks

Title A learning without forgetting approach to incorporate artifact knowledge in polyp localization tasks
Authors Roger D. Soberanis-Mukul, Maxime Kayser, Anna-Maria Zvereva, Peter Klare, Nassir Navab, Shadi Albarqouni
Abstract Colorectal polyps are abnormalities in the colon tissue that can develop into colorectal cancer. The survival rate for patients is higher when the disease is detected at an early stage and polyps can be removed before they develop into malignant tumors. Deep learning methods have become the state of the art in automatic polyp detection. However, the performance of current models heavily relies on the size and quality of the training datasets. Endoscopic video sequences tend to be corrupted by different artifacts affecting visibility and, hence, detection rates. In this work, we analyze the effects that artifacts have on the polyp localization problem. For this, we evaluate the RetinaNet architecture, originally designed for object localization. We also define a model inspired by the learning-without-forgetting framework, which allows us to employ artifact detection knowledge in the polyp localization problem. Finally, we perform several experiments to analyze the influence of artifacts on the performance of these models. To the best of our knowledge, this is the first extensive analysis of the influence of artifacts on polyp localization and the first work incorporating learning-without-forgetting ideas for simultaneous artifact and polyp localization tasks.
Tasks Object Localization
Published 2020-02-07
URL https://arxiv.org/abs/2002.02883v2
PDF https://arxiv.org/pdf/2002.02883v2.pdf
PWC https://paperswithcode.com/paper/a-learning-without-forgetting-approach-to
Repo
Framework
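
The learning-without-forgetting ingredient can be sketched as a two-term loss: a standard loss on the new task (polyp localization, reduced here to classification for brevity) plus a distillation term that keeps the artifact-detection outputs close to those of the frozen original model. Temperatures and weights below are illustrative:

```python
# A minimal sketch of the learning-without-forgetting idea the paper adapts:
# optimize the new task while a distillation term keeps the old-task
# (artifact) outputs close to a frozen pre-trained model's. Illustrative
# cross-entropy losses; lam and T are assumed hyperparameters.
import numpy as np

def softmax(z, T=1.0):
    e = np.exp(z / T - (z / T).max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def lwf_loss(new_logits, new_labels, old_task_logits, old_task_targets,
             lam=1.0, T=2.0):
    """old_task_targets: the frozen original model's artifact logits."""
    task = -np.log(
        softmax(new_logits)[np.arange(len(new_labels)), new_labels]).mean()
    distill = -(softmax(old_task_targets, T)
                * np.log(softmax(old_task_logits, T))).sum(-1).mean()
    return task + lam * distill
```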

Asymmetric Correlation Quantization Hashing for Cross-modal Retrieval

Title Asymmetric Correlation Quantization Hashing for Cross-modal Retrieval
Authors Lu Wang, Jie Yang
Abstract Owing to their advantages in similarity computation and database storage for large-scale multi-modal data, cross-modal hashing methods have attracted extensive attention in similarity retrieval across heterogeneous modalities. However, several limitations remain to be addressed: (1) most current CMH methods transform real-valued data points into discrete compact binary codes under binary constraints, limiting the representation capability for the original data owing to substantial loss of information and producing suboptimal hash codes; (2) the discrete binary-constrained learning problem is hard to solve, and relaxing the binary constraints can greatly reduce retrieval performance due to large quantization error; (3) handling the CMH learning problem in a symmetric framework leads to a difficult and complex optimization objective. To address these challenges, in this paper a novel Asymmetric Correlation Quantization Hashing (ACQH) method is proposed. Specifically, ACQH learns projection matrices for the data points of heterogeneous modalities to transform a query into a low-dimensional real-valued vector in a latent semantic space, and constructs a stacked compositional quantization embedding in a coarse-to-fine manner to represent database points by a series of learnt real-valued codewords in the codebooks, aided by pointwise label regression. Besides, unified hash codes across modalities can be obtained directly by the discrete iterative optimization framework devised in the paper. Comprehensive experiments on three diverse benchmark datasets have shown the effectiveness and rationality of ACQH.
Tasks Cross-Modal Retrieval, Quantization
Published 2020-01-14
URL https://arxiv.org/abs/2001.04625v1
PDF https://arxiv.org/pdf/2001.04625v1.pdf
PWC https://paperswithcode.com/paper/asymmetric-correlation-quantization-hashing
Repo
Framework
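
The asymmetric scoring idea is sketched below: queries stay real-valued, database items are sums of codewords (one per codebook), and inner products are read from per-query lookup tables. ACQH’s actual learning objective and supervision are omitted; shapes and names are assumptions:

```python
# A minimal sketch of asymmetric scoring with compositional quantization:
# the query stays a real-valued vector, each database item is a sum of one
# codeword per codebook, and inner products are read from per-codebook
# lookup tables computed once per query. ACQH's learning is omitted.
import numpy as np

def asymmetric_scores(query, codebooks, db_codes):
    """query: (d,); codebooks: (M, K, d); db_codes: (N, M) codeword indices."""
    # lut[m, k] = <query, codebooks[m, k]>
    lut = np.einsum("d,mkd->mk", query, codebooks)   # (M, K)
    # score of item i = sum_m lut[m, db_codes[i, m]]
    return lut[np.arange(codebooks.shape[0]), db_codes].sum(axis=1)  # (N,)
```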

DRST: Deep Residual Shearlet Transform for Densely Sampled Light Field Reconstruction

Title DRST: Deep Residual Shearlet Transform for Densely Sampled Light Field Reconstruction
Authors Yuan Gao, Robert Bregovic, Reinhard Koch, Atanas Gotchev
Abstract The Image-Based Rendering (IBR) approach using the Shearlet Transform (ST) is one of the most effective methods for Densely-Sampled Light Field (DSLF) reconstruction. ST-based DSLF reconstruction typically relies on an iterative thresholding algorithm for Epipolar-Plane Image (EPI) sparse regularization in the shearlet domain, involving dozens of transformations between the image domain and the shearlet domain, which are in general time-consuming. To overcome this limitation, a novel learning-based ST approach, referred to as the Deep Residual Shearlet Transform (DRST), is proposed in this paper. Specifically, for an input sparsely-sampled EPI, DRST employs a deep fully Convolutional Neural Network (CNN) to predict the residuals of the shearlet coefficients in the shearlet domain in order to reconstruct a densely-sampled EPI in the image domain. The DRST network is trained on synthetic Sparsely-Sampled Light Field (SSLF) data only, by leveraging elaborately-designed masks. Experimental results on three challenging real-world light field evaluation datasets with varying moderate disparity ranges (8 - 16 pixels) demonstrate the superiority of the proposed learning-based DRST approach over the non-learning-based ST method for DSLF reconstruction. Moreover, DRST provides at least a 2.4x speedup over ST.
Tasks
Published 2020-03-19
URL https://arxiv.org/abs/2003.08865v1
PDF https://arxiv.org/pdf/2003.08865v1.pdf
PWC https://paperswithcode.com/paper/drst-deep-residual-shearlet-transform-for
Repo
Framework
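
The core pattern, predicting residuals in a transform domain rather than in the image domain, can be sketched as follows. A 2-D Fourier transform stands in for the shearlet transform purely for illustration, and `cnn` is a hypothetical residual-predicting network:

```python
# A minimal sketch of residual learning in a transform domain, the pattern
# DRST applies to shearlet coefficients. A 2-D Fourier transform stands in
# for the shearlet transform here, and `cnn` is a hypothetical network that
# predicts coefficient residuals.
import numpy as np

def reconstruct(sparse_epi, cnn):
    coeffs = np.fft.fft2(sparse_epi)    # analysis transform (stand-in)
    refined = coeffs + cnn(coeffs)      # the network predicts the residual
    return np.fft.ifft2(refined).real   # synthesis back to the image domain
```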

Accelerating Deep Reinforcement Learning With the Aid of a Partial Model: Power-Efficient Predictive Video Streaming

Title Accelerating Deep Reinforcement Learning With the Aid of a Partial Model: Power-Efficient Predictive Video Streaming
Authors Dong Liu, Jianyu Zhao, Chenyang Yang, Lajos Hanzo
Abstract Predictive power allocation is conceived for power-efficient video streaming over mobile networks using deep reinforcement learning. The goal is to minimize the accumulated energy consumption over a complete video streaming session for a mobile user under the quality-of-service constraint that avoids video playback interruptions. To handle the continuous state and action spaces, we resort to the deep deterministic policy gradient (DDPG) algorithm for solving the formulated problem. In contrast to previous predictive resource policies that first predict future information with historical data and then optimize the policy based on the predicted information, the proposed policy operates in an on-line and end-to-end manner. By judiciously designing the action and state to depend only on slowly-varying average channel gains, the signaling overhead between the edge server and the base stations can be reduced, and the dynamics of the system can be learned effortlessly. To improve the robustness of streaming and accelerate learning, we further exploit the partially known dynamics of the system by integrating the concepts of safety layer, post-decision state, and virtual experience into the basic DDPG algorithm. Our simulation results show that the proposed policies converge to the optimal policy derived based on perfect prediction of the future large-scale channel gains, and outperform the first-predict-then-optimize policy in the presence of prediction errors. By harnessing the partially known model of the system dynamics, the convergence speed can be dramatically improved.
Tasks
Published 2020-03-21
URL https://arxiv.org/abs/2003.09708v1
PDF https://arxiv.org/pdf/2003.09708v1.pdf
PWC https://paperswithcode.com/paper/accelerating-deep-reinforcement-learning-with
Repo
Framework
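
Of the three ingredients named in the abstract, the safety layer is the easiest to sketch: project the actor’s raw action onto a linearized safety constraint in closed form. The constraint form c + g·a ≤ d and all symbols below are illustrative assumptions, not the paper’s exact formulation:

```python
# A minimal sketch of a safety-layer-style correction, one of the tricks
# named in the abstract: project the actor's raw action onto a linearized
# constraint c + g . a <= d in closed form. g, c, d would come from the
# partially known model; all symbols here are illustrative.
import numpy as np

def safe_action(a_raw, g, c, d):
    """Closest action to a_raw (in L2) satisfying c + g @ a <= d.
    a_raw and g are numpy arrays; c, d are scalars."""
    violation = c + g @ a_raw - d
    if violation <= 0:
        return a_raw                              # already safe
    return a_raw - (violation / (g @ g)) * g      # analytic projection
```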

Computer Aided Detection for Pulmonary Embolism Challenge (CAD-PE)

Title Computer Aided Detection for Pulmonary Embolism Challenge (CAD-PE)
Authors Germán González, Daniel Jimenez-Carretero, Sara Rodríguez-López, Carlos Cano-Espinosa, Miguel Cazorla, Tanya Agarwal, Vinit Agarwal, Nima Tajbakhsh, Michael B. Gotway, Jianming Liang, Mojtaba Masoudi, Noushin Eftekhari, Mahdi Saadatmand, Hamid-Reza Pourreza, Patricia Fraga-Rivas, Eduardo Fraile, Frank J. Rybicki, Ara Kassarjian, Raúl San José Estépar, Maria J. Ledesma-Carbayo
Abstract Rationale: Computer aided detection (CAD) algorithms for Pulmonary Embolism (PE) have been shown to increase radiologists’ sensitivity with a small increase in specificity. However, CAD for PE has not been adopted into clinical practice, likely because of the high number of false positives current CAD software produces. Objective: To generate a database of annotated computed tomography pulmonary angiographies, use it to compare the sensitivity and false positive rate of current algorithms, and to develop new methods that improve such metrics. Methods: 91 computed tomography pulmonary angiography (CTPA) scans were annotated by at least one radiologist by segmenting all pulmonary emboli visible on the study. 20 annotated CTPAs were released to the public in the form of a medical image analysis challenge. 20 more were kept for evaluation purposes. 51 were made available post-challenge. 8 submissions, 6 of them novel, were evaluated on the 20 evaluation CTPAs. Performance was measured as the per-embolus sensitivity vs. false positives per scan curve. Results: The best algorithms achieved a per-embolus sensitivity of 75% at 2 false positives per scan (fps) or of 70% at 1 fps, outperforming the state of the art. Deep learning approaches outperformed traditional machine learning ones, and their performance improved with the number of training cases. Significance: Through this work and challenge we have improved the state of the art of computer aided detection algorithms for pulmonary embolism. An open database and an evaluation benchmark for such algorithms have been generated, easing the development of further improvements. Implications for clinical practice will need further research.
Tasks
Published 2020-03-30
URL https://arxiv.org/abs/2003.13440v1
PDF https://arxiv.org/pdf/2003.13440v1.pdf
PWC https://paperswithcode.com/paper/computer-aided-detection-for-pulmonary
Repo
Framework
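
The headline metric, per-embolus sensitivity at a fixed number of false positives per scan, can be sketched as a threshold sweep over scored detections. Input formats below are assumptions:

```python
# A minimal sketch of the challenge's headline metric as described: sweep a
# detection-score threshold and report per-embolus sensitivity at a given
# number of false positives per scan. Assumes at most one true-positive
# detection per embolus; input formats are illustrative.
def sensitivity_at_fps(detections, n_emboli, n_scans, target_fps):
    """detections: list of (score, is_true_positive) over all scans."""
    dets = sorted(detections, key=lambda d: -d[0])   # descending by score
    tp = fp = 0
    best_sens = 0.0
    for score, is_tp in dets:
        if is_tp:
            tp += 1
        else:
            fp += 1
        if fp / n_scans <= target_fps:               # still within the FP budget
            best_sens = tp / n_emboli
    return best_sens   # e.g. target_fps=2 -> "sensitivity at 2 fps"
```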

End-to-End Evaluation of Federated Learning and Split Learning for Internet of Things

Title End-to-End Evaluation of Federated Learning and Split Learning for Internet of Things
Authors Yansong Gao, Minki Kim, Sharif Abuadbba, Yeonjae Kim, Chandra Thapa, Kyuyeon Kim, Seyit A. Camtepe, Hyoungshick Kim, Surya Nepal
Abstract This work is the first attempt to evaluate and compare federated learning (FL) and split neural networks (SplitNN) in real-world IoT settings in terms of learning performance and device implementation overhead. We consider a variety of datasets, different model architectures, multiple clients, and various performance metrics. For learning performance, which is specified by model accuracy and convergence speed, we empirically evaluate both FL and SplitNN under different types of data distributions, such as imbalanced and non-independent and identically distributed (non-IID) data. We show that the learning performance of SplitNN is better than that of FL under an imbalanced data distribution, but worse than FL under an extreme non-IID data distribution. For implementation overhead, we mount both FL and SplitNN end-to-end on Raspberry Pis and comprehensively evaluate overheads, including training time, communication overhead in a real LAN setting, power consumption, and memory usage. Our key observation is that under IoT scenarios where communication traffic is the main concern, FL appears to perform better than SplitNN because FL has significantly lower communication overhead, which empirically corroborates previous statistical analyses. In addition, we reveal several previously unrecognized limitations of SplitNN, forming the basis for future research.
Tasks
Published 2020-03-30
URL https://arxiv.org/abs/2003.13376v1
PDF https://arxiv.org/pdf/2003.13376v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-evaluation-of-federated-learning
Repo
Framework
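
A back-of-the-envelope comparison illustrates the communication observation: FL exchanges the full model per client per round, while SplitNN exchanges cut-layer activations and gradients for every sample. All the numbers below are illustrative assumptions, not the paper’s measurements:

```python
# A minimal back-of-the-envelope sketch of why FL can win on communication:
# FL exchanges the full model per client per round, while SplitNN exchanges
# cut-layer activations/gradients for every training sample. The numbers
# below are illustrative assumptions.
def fl_bytes_per_round(model_params, n_clients, bytes_per_val=4):
    return 2 * model_params * n_clients * bytes_per_val   # upload + download

def splitnn_bytes_per_epoch(cut_dim, n_samples, bytes_per_val=4):
    return 2 * cut_dim * n_samples * bytes_per_val        # activations + grads

# Example: a 1M-parameter model shared by 5 clients, vs a 256-d cut layer
# over 50,000 samples per epoch.
print(fl_bytes_per_round(1_000_000, 5))        # 40,000,000 bytes (~40 MB)
print(splitnn_bytes_per_epoch(256, 50_000))    # 102,400,000 bytes (~102 MB)
```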

3D Shape Segmentation with Geometric Deep Learning

Title 3D Shape Segmentation with Geometric Deep Learning
Authors Davide Boscaini, Fabio Poiesi
Abstract The semantic segmentation of 3D shapes with a high density of vertices can be impractical due to large memory requirements. To make this problem computationally tractable, we propose a neural-network based approach that produces 3D augmented views of the 3D shape, solving the whole segmentation as a set of sub-segmentation problems. 3D augmented views are obtained by projecting vertices and normals of a 3D shape onto 2D regular grids taken from different viewpoints around the shape. These 3D views are then processed by a Convolutional Neural Network to produce a probability distribution function (pdf) over the set of semantic classes for each vertex. These pdfs are then re-projected onto the original 3D shape and post-processed using contextual information through Conditional Random Fields. We validate our approach on 3D shapes from publicly available datasets and on real objects reconstructed using photogrammetry techniques, and compare it against state-of-the-art alternatives.
Tasks Semantic Segmentation
Published 2020-02-02
URL https://arxiv.org/abs/2002.00397v1
PDF https://arxiv.org/pdf/2002.00397v1.pdf
PWC https://paperswithcode.com/paper/3d-shape-segmentation-with-geometric-deep
Repo
Framework
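
Forming a single “3D augmented view” amounts to an orthographic projection of vertices (with per-vertex features such as normals) onto a regular 2D grid from a chosen viewpoint. A minimal sketch, with resolution and depth handling simplified:

```python
# A minimal sketch of forming one "3D augmented view": orthographically
# project vertices onto a 2-D grid from a viewpoint (here, along z after a
# rotation) and splat per-vertex features (e.g. normals) into grid cells,
# keeping the nearest vertex per cell. Resolution and normalization are
# illustrative simplifications.
import numpy as np

def render_view(vertices, features, rotation, res=64):
    """vertices: (V, 3); features: (V, C); rotation: (3, 3) view rotation."""
    pts = vertices @ rotation.T                    # rotate into the view frame
    xy = pts[:, :2]
    xy = (xy - xy.min(0)) / (xy.max(0) - xy.min(0) + 1e-9)   # normalize to [0,1]
    ij = np.clip((xy * (res - 1)).astype(int), 0, res - 1)
    view = np.zeros((res, res, features.shape[1]))
    depth = np.full((res, res), -np.inf)
    for (i, j), z, f in zip(ij, pts[:, 2], features):
        if z > depth[i, j]:                        # keep the nearest vertex
            depth[i, j], view[i, j] = z, f
    return view
```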

Foreground object segmentation in RGB-D data implemented on GPU

Title Foreground object segmentation in RGB-D data implemented on GPU
Authors Piotr Janus, Tomasz Kryjak, Marek Gorgon
Abstract This paper presents a GPU implementation of two foreground object segmentation algorithms, Gaussian Mixture Model (GMM) and Pixel Based Adaptive Segmenter (PBAS), modified to support RGB-D data. The simultaneous use of colour (RGB) and depth (D) data improves segmentation accuracy, especially in cases of colour camouflage, illumination changes, and the occurrence of shadows. Three GPUs were used to accelerate the calculations: an embedded NVIDIA Jetson TX2 (Maxwell architecture), a mobile NVIDIA GeForce GTX 1050m (Pascal architecture), and a high-performance NVIDIA RTX 2070 (Turing architecture). Segmentation accuracy comparable to previously published work was obtained. Moreover, the use of a GPU platform enabled real-time image processing. In addition, the system has been adapted to work with two RGB-D sensors: the Intel RealSense D415 and D435.
Tasks Semantic Segmentation
Published 2020-02-01
URL https://arxiv.org/abs/2002.00250v1
PDF https://arxiv.org/pdf/2002.00250v1.pdf
PWC https://paperswithcode.com/paper/foreground-object-segmentation-in-rgb-d-data
Repo
Framework
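
A single-pixel sketch of the GMM half of the system, with depth treated as a fourth channel alongside RGB (the PBAS variant and the GPU-level parallelization are beyond this sketch; the learning rate and initial variance below are illustrative):

```python
# A minimal single-pixel sketch of GMM background subtraction extended to
# RGB-D by treating depth as a fourth channel, in the spirit of the paper.
# The PBAS variant and the CUDA parallelization are not shown; alpha and
# the replacement variance are illustrative.
import numpy as np

def gmm_update(x, means, variances, weights, alpha=0.01, match_thr=2.5):
    """x: (4,) RGBD pixel; means: (K, 4); variances, weights: (K,)."""
    d2 = ((x - means) ** 2).sum(1)
    k = int(np.argmin(d2))
    matched = d2[k] < (match_thr ** 2) * variances[k]
    weights *= (1 - alpha)
    if matched:                                  # update the matched mode
        weights[k] += alpha
        means[k] += alpha * (x - means[k])
        variances[k] += alpha * (d2[k] - variances[k])
    else:                                        # replace the weakest mode
        w = int(np.argmin(weights))
        means[w], variances[w], weights[w] = x.copy(), 400.0, alpha
    weights /= weights.sum()
    # foreground if unmatched, or the matched mode has little background weight
    return (not matched) or (weights[k] < 0.1)
```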