January 26, 2020

Paper Group ANR 1565

Human-centric light sensing and estimation from RGBD images: The invisible light switch. Learning Spatially Structured Image Transformations Using Planar Neural Networks. SSAH: Semi-supervised Adversarial Deep Hashing with Self-paced Hard Sample Generation. Towards An Angry-Birds-like Game System for Promoting Mental Well-being of Players Using Art …

Human-centric light sensing and estimation from RGBD images: The invisible light switch


Title	Human-centric light sensing and estimation from RGBD images: The invisible light switch
Authors	Theodore Tsesmelis, Irtiza Hasan, Marco Cristani, Alessio Del Bue, Fabio Galasso
Abstract	Lighting design in indoor environments is of primary importance for at least two reasons: 1) people should perceive adequate light; 2) an effective lighting design means substantial energy savings. We present the Invisible Light Switch (ILS) to address both aspects. ILS dynamically adjusts the room illumination level to save energy while keeping the users' perceived light level constant, so the energy saving is invisible to them. Our proposed ILS leverages a radiosity model to estimate the light level perceived by a person within an indoor environment, taking into account the person's position and her/his viewing frustum (head pose). ILS may therefore dim the luminaires that are not seen by the user, resulting in effective energy saving, especially in large open offices (where light may otherwise be ON everywhere for a single person). To quantify the system performance, we have collected a new dataset where people wear luxmeter devices while working in office rooms. The luxmeters measure the amount of light (in lux) reaching people's gaze, which we consider a proxy for their perceived illumination level. Our initial results are promising: in a room with 8 LED luminaires, the energy consumption in a day may be reduced from 18585 to 6206 watts with ILS (currently needing 1560 watts for operations). Meanwhile, the perceived lighting drops by just 200 lux, a value considered negligible when the original illumination level is above 1200 lux, as is normally the case in offices.
Tasks
Published	2019-01-30
URL	http://arxiv.org/abs/1901.10772v1
PDF	http://arxiv.org/pdf/1901.10772v1.pdf
PWC	https://paperswithcode.com/paper/human-centric-light-sensing-and-estimation
Repo
Framework
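The abstract's figures imply a relative reduction of (18585 − 6206) / 18585 ≈ 66.6%. The sketch below computes that figure and illustrates the idea of dimming luminaires the user cannot see. It is a hypothetical 2D simplification, not the paper's radiosity model: the 120° horizontal field of view, the 0.2 dim level and the function names are all assumptions made for illustration.

```python
import math


def relative_saving(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


def in_view(person_xy, heading_deg, lamp_xy, fov_deg=120.0):
    """True if the lamp lies inside a horizontal field of view centred on the heading.
    Assumed 2D geometry; the paper uses a full viewing frustum and radiosity model."""
    dx = lamp_xy[0] - person_xy[0]
    dy = lamp_xy[1] - person_xy[1]
    bearing = math.degrees(math.atan2(dy, dx))
    # Wrap the angular difference into [-180, 180) before comparing.
    diff = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0


def dimming_plan(person_xy, heading_deg, lamps, dim_level=0.2, fov_deg=120.0):
    """Hypothetical policy: full power for visible lamps, dim_level for the rest."""
    return {
        lamp_id: 1.0 if in_view(person_xy, heading_deg, xy, fov_deg) else dim_level
        for lamp_id, xy in lamps.items()
    }


if __name__ == "__main__":
    print(f"{relative_saving(18585, 6206):.1%}")  # 66.6%
    lamps = {"L1": (2.0, 0.0), "L2": (-2.0, 0.0), "L3": (0.0, 2.0)}
    print(dimming_plan((0.0, 0.0), 0.0, lamps))
```

With the person at the origin facing along +x, only L1 falls inside the assumed field of view, so L2 and L3 would be dimmed.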

Learning Spatially Structured Image Transformations Using Planar Neural Networks


Title	Learning Spatially Structured Image Transformations Using Planar Neural Networks
Authors	Joel Michelson, Joshua H. Palmer, Aneesha Dasari, Maithilee Kunda
Abstract	Learning image transformations is essential to the idea of mental simulation as a method of cognitive inference. We take a connectionist modeling approach, using planar neural networks to learn fundamental imagery transformations, like translation, rotation, and scaling, from perceptual experiences in the form of image sequences. We investigate how variations in network topology, training data, and image shape, among other factors, affect the efficiency and effectiveness of learning visual imagery transformations, including effectiveness of transfer to operating on new types of data.
Tasks
Published	2019-12-03
URL	https://arxiv.org/abs/1912.01553v1
PDF	https://arxiv.org/pdf/1912.01553v1.pdf
PWC	https://paperswithcode.com/paper/learning-spatially-structured-image
Repo
Framework
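The paper learns transformations such as translation from image sequences. As a rough illustration of what such training data could look like, the sketch below builds (input, target) pairs for translation on a small binary grid. It covers only the data-generation step, not the planar network itself; zero padding at the borders (rather than wrap-around) and the helper names are assumptions.

```python
from typing import List, Tuple

Grid = List[List[int]]


def translate(img: Grid, dx: int, dy: int) -> Grid:
    """Shift img right by dx and down by dy; vacated cells become 0 (no wrap-around)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def make_pairs(img: Grid, shifts: List[Tuple[int, int]]) -> List[Tuple[Grid, Grid]]:
    """One (input, target) pair per requested (dx, dy) shift."""
    return [(img, translate(img, dx, dy)) for dx, dy in shifts]


if __name__ == "__main__":
    square = [[0, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]]
    for _, target in make_pairs(square, [(1, 0), (0, -1)]):
        for row in target:
            print(row)
        print()
```

Rotation and scaling pairs could be produced the same way by swapping in a different transform function.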

SSAH: Semi-supervised Adversarial Deep Hashing with Self-paced Hard Sample Generation


Title	SSAH: Semi-supervised Adversarial Deep Hashing with Self-paced Hard Sample Generation
Authors	Sheng Jin, Shangchen Zhou, Yao Liu, Chao Chen, Xiaoshuai Sun, Hongxun Yao, Xiansheng Hua
Abstract	Deep hashing methods have proven effective and efficient for large-scale Web media search. The success of these data-driven methods largely depends on collecting sufficient labeled data, which is usually a crucial limitation in practice. Current solutions to this issue use Generative Adversarial Networks (GANs) to augment data in semi-supervised learning. However, existing GAN-based methods treat image generation and hash learning as two isolated processes, which makes the generation ineffective. Besides, most works fail to exploit the semantic information in unlabeled data. In this paper, we propose a novel Semi-supervised Self-paced Adversarial Hashing method, named SSAH, to solve the above problems in a unified framework. The SSAH method consists of an adversarial network (A-Net) and a hashing network (H-Net). To improve the quality of generated images, first, the A-Net learns hard samples with multi-scale occlusions and multi-angle rotated deformations, which compete against the learning of accurate hash codes. Second, we design a novel self-paced hard generation policy to gradually increase the hashing difficulty of generated samples. To make use of the semantic information in unlabeled data, we propose a semi-supervised consistent loss. The experimental results show that our method significantly improves on state-of-the-art models on both widely used hashing datasets and fine-grained datasets.
Tasks
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08688v1
PDF	https://arxiv.org/pdf/1911.08688v1.pdf
PWC	https://paperswithcode.com/paper/ssah-semi-supervised-adversarial-deep-hashing
Repo
Framework
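At retrieval time, deep hashing methods generally binarize the network's real-valued outputs and rank the database by Hamming distance. The sketch below shows that generic retrieval step only; it does not implement SSAH's A-Net, H-Net or self-paced training, and the 4-bit "network outputs" are made-up values for illustration.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Map a real-valued output to a {0, 1} code by its sign (values >= 0 become 1)."""
    return [1 if v >= 0 else 0 for v in features]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of bit positions in which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_database(query: Sequence[float], database: List[Sequence[float]]) -> List[int]:
    """Database indices sorted by Hamming distance to the query (stable for ties)."""
    q = binarize(query)
    codes = [binarize(d) for d in database]
    return sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))


if __name__ == "__main__":
    db = [[0.9, -0.2, 0.4, -0.7],   # hypothetical 4-bit network outputs
          [-0.5, 0.3, -0.1, 0.8],
          [0.7, -0.6, 0.2, 0.1]]
    print(rank_database([0.8, -0.1, 0.5, -0.3], db))  # [0, 2, 1]
```

Comparing short binary codes is what makes hashing cheap for large-scale search: the distance is a bit count rather than a floating-point dot product.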

Towards An Angry-Birds-like Game System for Promoting Mental Well-being of Players Using Art-Therapy-embedded PCG


Title	Towards An Angry-Birds-like Game System for Promoting Mental Well-being of Players Using Art-Therapy-embedded PCG
Authors	Zhou Fang, Pujana Paliyawan, Ruck Thawonmas, Tomohiro Harada
Abstract	This paper presents an integration of a game system and the art therapy concept for promoting the mental well-being of video game players. In the proposed game system, the player plays an Angry-Birds-like game whose levels are generated from images they draw. Upon finishing a game level, the player also receives positive feedback (words of praise) about their drawing and the generated level from an Art Therapy AI. The proposed system is composed of three major parts: (1) a drawing recognizer that identifies what object the player has drawn (Sketcher), (2) a level generator that converts the drawing into a pixel image and then into a set of blocks representing a game level (PCG AI), and (3) the Art Therapy AI, which encourages the player and improves their mood. This paper gives an overview of the system and explains how its major components function.
Tasks
Published	2019-11-07
URL	https://arxiv.org/abs/1911.02695v1
PDF	https://arxiv.org/pdf/1911.02695v1.pdf
PWC	https://paperswithcode.com/paper/towards-an-angry-birds-like-game-system-for
Repo
Framework
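The PCG AI step described above (drawing to pixel image to blocks) could look roughly like the following. This is a minimal sketch under assumed rules: a coarse cell is filled when at least half of its pixels are ink, and blocks are emitted as (x, y) positions with y = 0 at the bottom row. The cell size, threshold and function names are hypothetical, not the authors' generator.

```python
from typing import List, Tuple


def pixelate(drawing: List[List[int]], cell: int, threshold: float = 0.5) -> List[List[int]]:
    """Downsample a binary drawing: a coarse cell is 1 if its ink fraction >= threshold."""
    h, w = len(drawing), len(drawing[0])
    rows = []
    for top in range(0, h, cell):
        row = []
        for left in range(0, w, cell):
            patch = [drawing[y][x]
                     for y in range(top, min(top + cell, h))
                     for x in range(left, min(left + cell, w))]
            row.append(1 if sum(patch) / len(patch) >= threshold else 0)
        rows.append(row)
    return rows


def to_blocks(pixels: List[List[int]]) -> List[Tuple[int, int]]:
    """(x, y) block positions, flipping rows so that y = 0 is the ground level."""
    height = len(pixels)
    return [(x, height - 1 - y)
            for y, row in enumerate(pixels)
            for x, v in enumerate(row) if v]


if __name__ == "__main__":
    drawing = [[1, 1, 0, 0],
               [1, 1, 0, 0],
               [1, 1, 1, 1],
               [1, 1, 1, 1]]
    px = pixelate(drawing, 2)   # [[1, 0], [1, 1]]
    print(to_blocks(px))        # [(0, 1), (0, 0), (1, 0)]
```

Flipping the row index matters because image rows grow downward while a physics-based level is usually built upward from the ground.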

A Fully Natural Gradient Scheme for Improving Inference of the Heterogeneous Multi-Output Gaussian Process Model


Title	A Fully Natural Gradient Scheme for Improving Inference of the Heterogeneous Multi-Output Gaussian Process Model
Authors	Juan-José Giraldo, Mauricio A. Álvarez
Abstract	A recent extension of multi-output Gaussian processes handles heterogeneous outputs by assuming that each output has its own likelihood function. It uses a vector-valued Gaussian process prior to jointly model all likelihoods' parameters as latent functions drawn from a Gaussian process with a linear model of coregionalisation covariance. By means of an inducing-points framework, the model obtains tractable variational bounds amenable to stochastic variational inference. Nonetheless, the strong conditioning between the variational parameters and the hyper-parameters burdens the adaptive gradient optimisation methods used in the original approach. To overcome this issue, we borrow ideas from variational optimisation, introducing an exploratory distribution over the hyper-parameters that allows inference together with the variational parameters through a fully natural gradient optimisation scheme. We show that our optimisation scheme can reach better local optima, with higher test performance, than adaptive gradient methods or a hybrid strategy that partially uses natural gradients in cooperation with the Adam method. We compare the performance of the different methods on toy and real datasets.
Tasks	Gaussian Processes
Published	2019-11-22
URL	https://arxiv.org/abs/1911.10225v2
PDF	https://arxiv.org/pdf/1911.10225v2.pdf
PWC	https://paperswithcode.com/paper/a-fully-natural-gradient-scheme-for-improving
Repo
Framework
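Why natural gradients help can be seen in the simplest conjugate case. For a Gaussian variational distribution q(f) = N(m, v), a natural-gradient step on the ELBO is a convex combination in natural-parameter space (m/v, −1/(2v)), and with a Gaussian prior and Gaussian likelihood a unit step lands exactly on the posterior. The scalar sketch below illustrates only that property; it is not the paper's heterogeneous multi-output scheme, and it omits the exploratory distribution over hyper-parameters. The prior and noise variances are assumed values.

```python
from typing import List, Tuple

Nat = Tuple[float, float]  # natural parameters (m / v, -1 / (2 v)) of N(m, v)


def to_natural(mean: float, var: float) -> Nat:
    return (mean / var, -0.5 / var)


def from_natural(nat: Nat) -> Tuple[float, float]:
    var = -0.5 / nat[1]
    return (nat[0] * var, var)


def natural_gradient_step(q: Nat, prior_var: float, noise_var: float,
                          ys: List[float], rho: float) -> Nat:
    """One natural-gradient ELBO step for q(f) in the assumed model
    f ~ N(0, prior_var), y_i | f ~ N(f, noise_var).
    In natural coordinates the step moves a fraction rho towards the optimum."""
    target = (sum(ys) / noise_var,
              -0.5 / prior_var - 0.5 * len(ys) / noise_var)
    return ((1 - rho) * q[0] + rho * target[0],
            (1 - rho) * q[1] + rho * target[1])


if __name__ == "__main__":
    q = to_natural(5.0, 3.0)                   # arbitrary initialisation
    ys = [1.0, 2.0, 3.0]                       # hypothetical observations
    print(from_natural(natural_gradient_step(q, 1.0, 1.0, ys, rho=1.0)))  # (1.5, 0.25)
```

Here the exact posterior has precision 1 + 3 = 4 (variance 0.25) and mean 0.25 × 6 = 1.5, which the single unit step reproduces; smaller steps (rho < 1) converge geometrically.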

On Batch Bayesian Optimization


Title	On Batch Bayesian Optimization
Authors	Sayak Ray Chowdhury, Aditya Gopalan
Abstract	We present two algorithms for Bayesian optimization in the batch feedback setting, based on Gaussian process upper confidence bound and Thompson sampling approaches, along with frequentist regret guarantees and numerical results.
Tasks
Published	2019-11-04
URL	https://arxiv.org/abs/1911.01032v1
PDF	https://arxiv.org/pdf/1911.01032v1.pdf
PWC	https://paperswithcode.com/paper/on-batch-bayesian-optimization
Repo
Framework
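The abstract does not spell out the batch rules of the two algorithms. As one well-known way to build a GP-UCB batch, the sketch below keeps the posterior mean fixed from real observations and shrinks the posterior variance at points already chosen for the batch (the "hallucination" idea of GP-BUCB by Desautels et al.). This is not necessarily the authors' algorithm; the RBF kernel, lengthscale 0.3, beta = 2 and noise level are assumptions, and a pure-Python Cholesky keeps the example dependency-free.

```python
import math
from typing import List, Sequence


def rbf(a: float, b: float, lengthscale: float = 0.3) -> float:
    """Squared-exponential kernel with unit signal variance (assumed hyperparameters)."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(xs, X, y, noise):
    """GP posterior mean and variance at each point of xs, given observations (X, y)."""
    if not X:
        return [0.0] * len(xs), [1.0] * len(xs)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = cho_solve(L, y)
    means, variances = [], []
    for x in xs:
        k = [rbf(x, a) for a in X]
        means.append(sum(ki * ai for ki, ai in zip(k, alpha)))
        v = cho_solve(L, k)
        variances.append(max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0))
    return means, variances


def batch_ucb(grid, X, y, batch_size, beta=2.0, noise=1e-3):
    """Select batch_size distinct grid points, GP-BUCB style."""
    mean, _ = gp_posterior(grid, X, y, noise)   # mean uses real observations only
    picks: List[int] = []
    for _ in range(batch_size):
        # Pending picks shrink the variance; variance ignores y, so zeros suffice.
        X_h = list(X) + [grid[i] for i in picks]
        _, var = gp_posterior(grid, X_h, [0.0] * len(X_h), noise)
        ucb = [m + beta * math.sqrt(v) for m, v in zip(mean, var)]
        best = max((i for i in range(len(grid)) if i not in picks), key=lambda i: ucb[i])
        picks.append(best)
    return [grid[i] for i in picks]


if __name__ == "__main__":
    grid = [i / 20 for i in range(21)]
    X, y = [0.1, 0.5, 0.9], [0.2, 1.0, 0.1]     # hypothetical observations
    print(batch_ucb(grid, X, y, batch_size=3))
```

Shrinking the variance at pending points discourages the batch from piling onto a single optimistic location, without needing their (not yet observed) function values.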

Improving Sample Efficiency in Model-Free Reinforcement Learning from Images


Title	Improving Sample Efficiency in Model-Free Reinforcement Learning from Images
Authors	Denis Yarats, Amy Zhang, Ilya Kostrikov, Brandon Amos, Joelle Pineau, Rob Fergus
Abstract	Training an agent to solve control tasks directly from high-dimensional images with model-free reinforcement learning (RL) has proven difficult. The agent needs to learn a latent representation together with a control policy to perform the task. Fitting a high-capacity encoder using a scarce reward signal is not only sample inefficient, but also prone to suboptimal convergence. Two ways to improve sample efficiency are to extract relevant features for the task and use off-policy algorithms. We dissect various approaches of learning good latent features, and conclude that the image reconstruction loss is the essential ingredient that enables efficient and stable representation learning in image-based RL. Following these findings, we devise an off-policy actor-critic algorithm with an auxiliary decoder that trains end-to-end and matches state-of-the-art performance across both model-free and model-based algorithms on many challenging control tasks. We release our code to encourage future research on image-based RL.
Tasks	Image Reconstruction, Representation Learning
Published	2019-10-02
URL	https://arxiv.org/abs/1910.01741v2
PDF	https://arxiv.org/pdf/1910.01741v2.pdf
PWC	https://paperswithcode.com/paper/improving-sample-efficiency-in-model-free-1
Repo
Framework

Incorporating Human Domain Knowledge in 3D LiDAR-based Semantic Segmentation


Title	Incorporating Human Domain Knowledge in 3D LiDAR-based Semantic Segmentation
Authors	Jilin Mei, Huijing Zhao
Abstract	This work studies semantic segmentation using 3D LiDAR data. Popular deep learning methods applied to this task require a large number of manual annotations to train their parameters. We propose a new method that makes full use of the advantages of both traditional methods and deep learning methods by incorporating human domain knowledge into the neural network model, which reduces the demand for large numbers of manual annotations and improves training efficiency. We first pretrain a model with samples autogenerated by a rule-based classifier so that human knowledge can be propagated into the network. Starting from the pretrained model, only a small set of annotations is required for further fine-tuning. Quantitative experiments show that the pretrained model achieves better performance than random initialization in almost all cases; furthermore, our method can achieve similar performance with fewer manual annotations.
Tasks	Semantic Segmentation
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09533v1
PDF	https://arxiv.org/pdf/1905.09533v1.pdf
PWC	https://paperswithcode.com/paper/incorporating-human-domain-knowledge-in-3d
Repo
Framework
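The abstract does not list the rules of the rule-based classifier used to autogenerate pretraining samples. The sketch below shows the general shape of such a step with hypothetical height and range thresholds; the class names, the 0.2 m / 2.0 m / 50 m cut-offs and the assumption that z is measured relative to a known ground plane are all illustrative, not taken from the paper.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, z relative to an assumed ground plane


def rule_label(p: Point, max_range: float = 50.0) -> str:
    """Hypothetical hand-written rules; thresholds are illustrative only."""
    x, y, z = p
    if (x * x + y * y) ** 0.5 > max_range:
        return "unknown"          # too far away to trust the rules
    if z < 0.2:
        return "ground"
    if z < 2.0:
        return "low_obstacle"     # e.g. cars, pedestrians, bushes
    return "high_obstacle"        # e.g. buildings, tree canopies


def pseudo_label(points: Iterable[Point]) -> List[Tuple[Point, str]]:
    """Attach rule-based labels, dropping points the rules cannot decide."""
    labelled = []
    for p in points:
        label = rule_label(p)
        if label != "unknown":
            labelled.append((p, label))
    return labelled
```

Noisy labels like these would only serve to initialise the network; the small manually annotated set then corrects them during fine-tuning.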

A Dataset of General-Purpose Rebuttal


Title	A Dataset of General-Purpose Rebuttal
Authors	Matan Orbach, Yonatan Bilu, Ariel Gera, Yoav Kantor, Lena Dankin, Tamar Lavee, Lili Kotlerman, Shachar Mirkin, Michal Jacovi, Ranit Aharonov, Noam Slonim
Abstract	In Natural Language Understanding, the task of response generation usually focuses on responses to short texts, such as tweets or a turn in a dialog. Here we present a novel task of producing a critical response to a long argumentative text, and suggest a method based on general rebuttal arguments to address it. We do this in the context of the recently suggested task of listening comprehension over argumentative content: given a speech on some specified topic and a list of relevant arguments, the goal is to determine which of the arguments appear in the speech. The general rebuttals we describe here (written in English) remove the need for topic-specific arguments, since they prove applicable to a large set of topics. This allows creating responses beyond the scope of topics for which specific arguments are available. All data collected during this work is freely available for research.
Tasks
Published	2019-09-01
URL	https://arxiv.org/abs/1909.00393v1
PDF	https://arxiv.org/pdf/1909.00393v1.pdf
PWC	https://paperswithcode.com/paper/a-dataset-of-general-purpose-rebuttal
Repo
Framework

RefineDetLite: A Lightweight One-stage Object Detection Framework for CPU-only Devices


Title	RefineDetLite: A Lightweight One-stage Object Detection Framework for CPU-only Devices
Authors	Chen Chen, Mengyuan Liu, Xiandong Meng, Wanpeng Xiao, Qi Ju
Abstract	Previous state-of-the-art real-time object detectors have been reported on GPUs, which are extremely expensive when processing massive data and in resource-restricted scenarios. Therefore, highly efficient object detectors for CPU-only devices are urgently needed in industry. The floating-point operations (FLOPs) of a network are not strictly proportional to its running speed on CPU devices, which motivates the design of a detector that is truly “fast” and “accurate”. After investigating the gaps between classification networks and detection backbones, and following the design principles of efficient networks, we propose a lightweight residual-like backbone with large receptive fields and wide dimensions for low-level features, which are crucial for detection tasks. Correspondingly, we also design a light-head detection part to match the backbone capability. Furthermore, by analyzing the drawbacks of current one-stage detector training strategies, we propose three orthogonal training strategies: an IOU-guided loss, a class-aware weighting method and a balanced multi-task training approach. Without bells and whistles, our proposed RefineDetLite achieves 26.8 mAP on the MSCOCO benchmark at a speed of 130 ms/pic on a single-thread CPU. The detection accuracy can be further increased to 29.6 mAP by integrating all the proposed training strategies, without an apparent drop in speed.
Tasks	Head Detection, Object Detection
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08855v2
PDF	https://arxiv.org/pdf/1911.08855v2.pdf
PWC	https://paperswithcode.com/paper/refinedetlite-a-lightweight-one-stage-object
Repo
Framework
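The exact form of RefineDetLite's IOU-guided loss is not given in the abstract. The sketch below shows the quantity such losses build on, the intersection-over-union of two axis-aligned boxes, together with a plain 1 − IoU regression loss as a generic illustration; the paper's formulation may differ. Boxes are assumed to be (x1, y1, x2, y2) with x2 > x1 and y2 > y1.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 when boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred: Box, target: Box) -> float:
    """Generic 1 - IoU box loss (illustrative, not necessarily the paper's loss)."""
    return 1.0 - iou(pred, target)


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.14285714285714285 (= 1/7)
```

The two 2×2 boxes overlap in a 1×1 square, so the union is 4 + 4 − 1 = 7 and the IoU is 1/7.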

Assessment of Faster R-CNN in Man-Machine collaborative search


Title	Assessment of Faster R-CNN in Man-Machine collaborative search
Authors	Arturo Deza, Amit Surana, Miguel P. Eckstein
Abstract	With the advent of modern expert systems driven by deep learning that supplement human experts (e.g. radiologists, dermatologists, surveillance scanners), we analyze how and when such expert systems enhance human performance in a fine-grained small-target visual search task. We set up a two-session factorial experimental design in which humans visually search for a target with and without a Deep Learning (DL) expert system. We evaluate changes in human target detection performance and eye movements in the presence of the DL system. We find that performance improvements with the DL system (computed via a Faster R-CNN with a VGG16) interact with the observer's perceptual abilities (e.g., sensitivity). The main results include: 1) The DL system reduces the False Alarm rate per Image on average across observer groups of both high and low sensitivity; 2) Only human observers with high sensitivity perform better than the DL system, while the low sensitivity group does not surpass individual DL system performance, even when aided by the DL system itself; 3) Increases in the number of trials and decreases in viewing time were mainly driven by the DL system, and only for the low sensitivity group; 4) The DL system helps the human observer fixate on a target by the 3rd fixation. These results provide insight into the benefits and limitations of deep learning systems that are collaborative or competitive with humans.
Tasks
Published	2019-04-04
URL	http://arxiv.org/abs/1904.02805v1
PDF	http://arxiv.org/pdf/1904.02805v1.pdf
PWC	https://paperswithcode.com/paper/assessment-of-faster-r-cnn-in-man-machine
Repo
Framework
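The study groups observers by perceptual sensitivity and reports false alarms per image. In signal detection theory, sensitivity is commonly summarised as d′ = z(hit rate) − z(false-alarm rate); whether the authors used exactly this measure is an assumption here. The sketch uses Python's `statistics.NormalDist` (3.8+) and a log-linear correction (add 0.5 to counts and 1 to totals) so that perfect hit or zero false-alarm rates do not produce infinite z-scores.

```python
from statistics import NormalDist


def d_prime(hits: int, signal_trials: int, false_alarms: int, noise_trials: int) -> float:
    """Sensitivity d' = z(H) - z(F) with a log-linear correction on both rates."""
    hit_rate = (hits + 0.5) / (signal_trials + 1)
    fa_rate = (false_alarms + 0.5) / (noise_trials + 1)
    z = NormalDist().inv_cdf          # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)


def false_alarms_per_image(false_alarms: int, images: int) -> float:
    """Mean number of false alarms per displayed image."""
    return false_alarms / images
```

An observer who says "target present" equally often on signal and noise trials gets d′ = 0, while higher values indicate better discrimination; splitting observers at some d′ cut-off is one way to form high and low sensitivity groups.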

Calcaneus Radiograph Analysis System: Rotation-Invariant Landmark Detection, Calcaneal Angle Measurement and Fracture Identification


Title	Calcaneus Radiograph Analysis System: Rotation-Invariant Landmark Detection, Calcaneal Angle Measurement and Fracture Identification
Authors	Jia Guo, Wei Wang, Huanxin Yan, Junxian Chen, Hailin Xu, Huiqi Li
Abstract	The calcaneus is the largest tarsal bone and withstands the daily stresses of weight bearing. Calcaneal fractures are the most common type of tarsal bone fracture. When a fracture is suspected, plain radiographs should be taken first. Böhler's Angle (BA) and the Critical Angle of Gissane (CAG), measured from four anatomic landmarks in a lateral foot radiograph, can aid fracture diagnosis and assessment as well as the operative restoration of the fractured calcaneus. The aim of this study is to develop a system that automatically locates the four anatomic landmarks and measures BA and CAG for fracture assessment. To solve the problem of the variable rotation of the calcaneus, we propose a coarse-to-fine Rotation-Invariant Regression-Voting (RIRV) landmark detection method based on Support Vector Regression (SVR) and the Scale Invariant Feature Transform (SIFT) patch descriptor. By implementing a novel normalization approach that converts displacements into coordinates of oriented feature patches, our method is explicitly rotation-invariant, unlike traditional regression methods. A multi-stream CNN with multi-region input is designed to screen for calcaneus fractures. The input ROIs of the multi-stream CNN are normalized by the detected landmarks to a uniform view, orientation and scale. The advantage of our approach is that it uses landmarks, which encode prior knowledge, to normalize the CNN inputs and thereby improve the CNN's efficiency. Experiments show that our CNN can accurately identify fractures with a sensitivity of 95.21% and a specificity of 95.32%.
Tasks
Published	2019-12-10
URL	https://arxiv.org/abs/1912.04536v3
PDF	https://arxiv.org/pdf/1912.04536v3.pdf
PWC	https://paperswithcode.com/paper/calcaneus-radiograph-analysis-system
Repo
Framework
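Once landmarks are located, the angle measurement itself is plane geometry. Böhler's angle is formed by a line from the anterior process to the apex of the posterior facet and a line from that apex to the top of the calcaneal tuberosity; it equals the supplement of the interior angle at the posterior-facet apex. The sketch below is a minimal illustration under that definition; how the paper's four landmarks map onto these points is an assumption, and the coordinates in the usage example are invented.

```python
import math
from typing import Tuple

Pt = Tuple[float, float]


def interior_angle(a: Pt, vertex: Pt, b: Pt) -> float:
    """Angle a-vertex-b in degrees, in [0, 180]."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding errors


def bohler_angle(anterior_process: Pt, posterior_facet: Pt, tuberosity: Pt) -> float:
    """Böhler's angle: supplement of the interior angle at the posterior-facet apex,
    i.e. the acute angle between the two lines that meet there."""
    return 180.0 - interior_angle(anterior_process, posterior_facet, tuberosity)


if __name__ == "__main__":
    # Invented pixel coordinates, for illustration only.
    print(round(bohler_angle((-10.0, 0.0), (0.0, 0.0), (10.0, -10.0)), 6))  # 45.0
```

The Critical Angle of Gissane could be obtained with the same `interior_angle` helper evaluated at the Gissane vertex; because the angles are unsigned, the downward-growing y axis of image coordinates does not affect the result.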

Detecting floodwater on roadways from image data with handcrafted features and deep transfer learning


Title	Detecting floodwater on roadways from image data with handcrafted features and deep transfer learning
Authors	Cem Sazara, Mecit Cetin, Khan M. Iftekharuddin
Abstract	Detecting roadway segments inundated by floodwater has important applications for vehicle routing and traffic management decisions. This paper proposes a set of algorithms to automatically detect floodwater that may be present in an image captured by mobile phones or other types of optical cameras. For this purpose, image classification and flood area segmentation methods are developed. For the classification task, we used Local Binary Patterns (LBP), Histogram of Oriented Gradients (HOG) and a pre-trained deep neural network (VGG-16) as feature extractors and trained logistic regression, k-nearest neighbors, and decision tree classifiers on the extracted features. The pre-trained VGG-16 network with a logistic regression classifier outperformed all other methods. For the flood area segmentation task, we investigated superpixel-based methods and a Fully Convolutional Neural Network (FCN). As in the classification task, we trained logistic regression and k-nearest neighbors classifiers on the superpixel areas and compared them with an end-to-end trained FCN. A Conditional Random Field (CRF) was applied after both segmentation methods to post-process the coarse segmentation results. FCN offered the highest scores in all metrics, followed by superpixel-based logistic regression and then superpixel-based KNN.
Tasks	Image Classification, Transfer Learning
Published	2019-08-31
URL	https://arxiv.org/abs/1909.00125v1
PDF	https://arxiv.org/pdf/1909.00125v1.pdf
PWC	https://paperswithcode.com/paper/detecting-floodwater-on-roadways-from-image
Repo
Framework
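Local Binary Patterns, one of the handcrafted features used above, encode each pixel by comparing it with its neighbours. The sketch below implements the basic 3×3 variant and a 256-bin histogram that a classifier such as logistic regression could consume. The clockwise neighbour ordering starting at the top-left, the "greater than or equal" comparison and skipping border pixels are conventions chosen here, not necessarily the authors' configuration.

```python
from typing import List

Image = List[List[int]]

# Neighbour offsets (dy, dx), clockwise starting from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Image, y: int, x: int) -> int:
    """8-bit LBP code: bit k is 1 when neighbour k is >= the centre pixel."""
    centre = img[y][x]
    code = 0
    for k, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << (7 - k)   # first neighbour becomes the most significant bit
    return code


def lbp_histogram(img: Image) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels (border pixels skipped)."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist
```

Because each code depends only on local intensity orderings, the histogram is insensitive to uniform brightness changes, which is useful for outdoor images taken under varying light.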

Relational Learning for Joint Head and Human Detection


Title	Relational Learning for Joint Head and Human Detection
Authors	Cheng Chi, Shifeng Zhang, Junliang Xing, Zhen Lei, Stan Z. Li, Xudong Zou
Abstract	Head and human detection have improved rapidly with the development of deep convolutional neural networks. However, these two tasks are often studied separately without considering their inherent correlation, which leads to 1) head detection often producing more false positives, and 2) human detector performance frequently dropping dramatically in crowded scenes. To handle these two issues, we present a novel joint head and human detection network, namely JointDet, which effectively detects heads and human bodies simultaneously. Moreover, we design a head-body relationship discriminating module to perform relational learning between heads and human bodies, and leverage this learned relationship to regain suppressed human detections and reduce head false positives. To verify the effectiveness of the proposed method, we annotate head bounding boxes on the CityPersons and Caltech-USA datasets, and conduct extensive experiments on the CrowdHuman, CityPersons and Caltech-USA datasets. As a result, the proposed JointDet detector achieves state-of-the-art performance on these three benchmarks. To facilitate further studies on the head and human detection problem, all new annotations, source code and trained models will be made public.
Tasks	Head Detection, Human Detection, Relational Reasoning
Published	2019-09-24
URL	https://arxiv.org/abs/1909.10674v1
PDF	https://arxiv.org/pdf/1909.10674v1.pdf
PWC	https://paperswithcode.com/paper/relational-learning-for-joint-head-and-human
Repo
Framework
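JointDet learns its head-body relationship with a dedicated module. To make the kind of relation concrete, the sketch below uses a much simpler, hand-written geometric test instead: a head box is plausible for a body box if its centre lies horizontally within the body, sits in the top quarter of the body height, and has a reasonable width ratio. This heuristic and all its thresholds are hypothetical stand-ins, not the paper's learned module.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), image y grows downward


def plausible_head_for_body(head: Box, body: Box, top_frac: float = 0.25,
                            min_w: float = 0.15, max_w: float = 0.6) -> bool:
    """Heuristic geometric test: is `head` located where the head of `body` would be?"""
    hx = (head[0] + head[2]) / 2
    hy = (head[1] + head[3]) / 2
    bw = body[2] - body[0]
    bh = body[3] - body[1]
    inside_x = body[0] <= hx <= body[2]
    near_top = body[1] <= hy <= body[1] + top_frac * bh
    width_ratio = (head[2] - head[0]) / bw
    return inside_x and near_top and min_w <= width_ratio <= max_w
```

A check of this sort could, for example, flag head detections with no plausible body nearby; the learned module in the paper replaces such fixed thresholds with a relationship trained from data.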

Selfie: Self-supervised Pretraining for Image Embedding


Title	Selfie: Self-supervised Pretraining for Image Embedding
Authors	Trieu H. Trinh, Minh-Thang Luong, Quoc V. Le
Abstract	We introduce a pretraining technique called Selfie, which stands for SELFie supervised Image Embedding. Selfie generalizes the concept of masked language modeling of BERT (Devlin et al., 2019) to continuous data, such as images, by making use of the Contrastive Predictive Coding loss (Oord et al., 2018). Given masked-out patches in an input image, our method learns to select the correct patch, among other “distractor” patches sampled from the same image, to fill in the masked location. This classification objective sidesteps the need for predicting exact pixel values of the target patches. The pretraining architecture of Selfie includes a network of convolutional blocks to process patches, followed by an attention pooling network that summarizes the content of unmasked patches before predicting masked ones. During finetuning, we reuse the convolutional weights found by pretraining. We evaluate Selfie on three benchmarks (CIFAR-10, ImageNet 32 x 32, and ImageNet 224 x 224) with varying amounts of labeled data, from 5% to 100% of the training sets. Our pretraining method provides consistent improvements to ResNet-50 across all settings compared to the standard supervised training of the same network. Notably, on ImageNet 224 x 224 with 60 examples per class (5%), our method improves the mean accuracy of ResNet-50 from 35.6% to 46.7%, an improvement of 11.1 points in absolute accuracy. Our pretraining method also improves ResNet-50 training stability, especially in the low-data regime, by significantly lowering the standard deviation of test accuracies across different runs.
Tasks	Language Modelling
Published	2019-06-07
URL	https://arxiv.org/abs/1906.02940v3
PDF	https://arxiv.org/pdf/1906.02940v3.pdf
PWC	https://paperswithcode.com/paper/selfie-self-supervised-pretraining-for-image
Repo
Framework
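Selfie's objective picks the correct patch among distractors, a classification loss in the style of Contrastive Predictive Coding. The sketch below computes that loss for a single masked location: dot-product scores between a predicted vector and candidate patch embeddings, a softmax, and cross-entropy on the correct index. The vectors are hypothetical inputs; in Selfie they would come from the convolutional patch network and the attention pooling network.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def patch_selection_loss(query: Sequence[float], candidates: List[Sequence[float]],
                         correct: int) -> float:
    """Cross-entropy of choosing candidates[correct] under softmax(dot(query, c))."""
    scores = [dot(query, c) for c in candidates]
    m = max(scores)                                   # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[correct]


if __name__ == "__main__":
    query = [1.0, 0.0]                                # hypothetical predicted vector
    patches = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # true patch first, then distractors
    print(patch_selection_loss(query, patches, correct=0))
```

Because the loss only asks which candidate matches, the model never has to regress exact pixel values, which is the point the abstract makes about sidestepping pixel prediction.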