October 19, 2019

3402 words 16 mins read

Paper Group ANR 263

Online Collective Animal Movement Activity Recognition. A Study on Deep Learning Based Sauvegrain Method for Measurement of Puberty Bone Age. LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering. Variational based Mixed Noise Removal with CNN Deep Learning Regularization. A Multi-Layer Approach to Superpixel-based Higher-ord …

Online Collective Animal Movement Activity Recognition

Title Online Collective Animal Movement Activity Recognition
Authors Kehinde Owoeye, Stephen Hailes
Abstract Learning the activities of animals is important for monitoring their welfare vis-à-vis their behaviour with respect to their environment and conspecifics. While previous work has largely focused on activity recognition in a single animal, little or no work has been done on learning the collective behaviour of animals. In this work, we address the problem of recognising the collective movement activities of a group of sheep in a flock. We present a discriminative framework that learns to track the positions and velocities of all the animals in the flock in an online manner whilst estimating their collective activity. We investigate the performance of two simple deep network architectures and show that we can learn the collective activities with good accuracy even when the distribution of the activities is skewed.
Tasks Activity Recognition
Published 2018-11-22
URL http://arxiv.org/abs/1811.09067v1
PDF http://arxiv.org/pdf/1811.09067v1.pdf
PWC https://paperswithcode.com/paper/online-collective-animal-movement-activity
Repo
Framework
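
A minimal sketch of the kind of online discriminative model the abstract describes: a recurrent network that consumes the flock's positions and velocities at each timestep and emits a collective activity label. This is not the authors' code; the flock size, feature layout, and number of activity classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

N_ANIMALS, N_CLASSES = 40, 4  # assumed flock size and activity set

class FlockActivityNet(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        # per-animal (x, y, vx, vy), flattened into one vector per timestep
        self.rnn = nn.GRU(input_size=N_ANIMALS * 4, hidden_size=hidden,
                          batch_first=True)
        self.head = nn.Linear(hidden, N_CLASSES)

    def forward(self, x, h=None):
        # x: (batch, time, N_ANIMALS * 4); h carries state between chunks
        out, h = self.rnn(x, h)
        return self.head(out), h  # per-timestep activity logits

model = FlockActivityNet()
logits, state = model(torch.randn(1, 10, N_ANIMALS * 4))
print(logits.shape)  # torch.Size([1, 10, 4])
```

Carrying the hidden state between calls is what makes such a classifier usable online, one chunk of tracking data at a time.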

A Study on Deep Learning Based Sauvegrain Method for Measurement of Puberty Bone Age

Title A Study on Deep Learning Based Sauvegrain Method for Measurement of Puberty Bone Age
Authors Seung Bin Baik, Keum Gang Cha
Abstract This study applies a data-extension technique to expand the number of images to a level that allows deep learning, and investigates the applicability of the Sauvegrain method through deep learning with relatively few elbow X-rays. The study follows processes similar to physicians’ bone age assessment procedures: the selected reference images were learned without being included in the evaluation data, while the data was extended to cover a sufficient number of cases. In addition, we enhanced the X-ray images using U-Net and selected the region of interest (ROI) with an RPN so that bone age could be estimated with a CNN. The deep-learning-based Sauvegrain method achieves a mean absolute error of 2.8 months and a mean absolute percentage error (MAPE) of 0.018. This shows that, even on a deep learning basis, X-ray analysis with the Sauvegrain method retains high accuracy for the puberty age group, and that deep learning of the Sauvegrain method, given X-ray images extended by the data-extension technique, can measure bone age at a level similar to that of an expert. As a result, this study overcomes the limitation that machine-learning-based bone age measurement with TW3 or Greulich & Pyle has faced due to the lack of X-ray data, and presents the Sauvegrain method as applicable to adolescents as well.
Tasks Age Estimation
Published 2018-09-18
URL http://arxiv.org/abs/1809.06965v1
PDF http://arxiv.org/pdf/1809.06965v1.pdf
PWC https://paperswithcode.com/paper/a-study-on-deep-learning-based-sauvegrain
Repo
Framework
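
A rough sketch of the final regression stage implied by the pipeline (U-Net enhancement, RPN-based ROI selection, then CNN bone age estimation). The architecture and the L1 objective are assumptions for illustration, not the paper's implementation; only the reported MAE metric motivates the choice of loss here.

```python
import torch
import torch.nn as nn

class BoneAgeRegressor(nn.Module):
    """Toy CNN that regresses bone age (months) from a cropped elbow ROI."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.age = nn.Linear(32, 1)

    def forward(self, roi):  # roi: (batch, 1, H, W), output of the ROI step
        return self.age(self.features(roi).flatten(1))

model = BoneAgeRegressor()
pred = model(torch.randn(4, 1, 128, 128))
target = torch.tensor([[150.0], [162.0], [141.0], [155.0]])  # months
loss = nn.L1Loss()(pred, target)  # MAE, the metric the paper reports
```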

LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering

Title LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering
Authors Ricardo Martin-Brualla, Rohit Pandey, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Julien Valentin, Sameh Khamis, Philip Davidson, Anastasia Tkach, Peter Lincoln, Adarsh Kowdle, Christoph Rhemann, Dan B Goldman, Cem Keskin, Steve Seitz, Shahram Izadi, Sean Fanello
Abstract Motivated by augmented and virtual reality applications such as telepresence, there has been a recent focus on real-time performance capture of humans under motion. However, given the real-time constraint, these systems often suffer from artifacts in geometry and texture such as holes and noise in the final rendering, poor lighting, and low-resolution textures. We take the novel approach of augmenting such real-time performance capture systems with a deep architecture that takes a rendering from an arbitrary viewpoint and jointly performs completion, super-resolution, and denoising of the imagery in real-time. We call this approach neural (re-)rendering, and our live system “LookinGood”. Our deep architecture is trained to produce high-resolution, high-quality images from a coarse rendering in real-time. First, we propose a self-supervised training method that does not require manual ground-truth annotation. We contribute a specialized reconstruction error that uses semantic information to focus on relevant parts of the subject, e.g. the face. We also introduce a saliency reweighting scheme of the loss function that is able to discard outliers. We specifically design the system for virtual and augmented reality headsets, where the consistency between the left and right eye plays a crucial role in the final user experience. Finally, we generate temporally stable results by explicitly minimizing the difference between two consecutive frames. We tested the proposed system in two different scenarios: upper-body reconstruction of an actor with a single RGB-D sensor, and full-body 360-degree capture. Through extensive experimentation, we demonstrate how our system generalizes across unseen sequences and subjects. The supplementary video is available at http://youtu.be/Md3tdAKoLGU.
Tasks Denoising, Super-Resolution
Published 2018-11-12
URL http://arxiv.org/abs/1811.05029v1
PDF http://arxiv.org/pdf/1811.05029v1.pdf
PWC https://paperswithcode.com/paper/lookingood-enhancing-performance-capture-with
Repo
Framework
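
A hedged sketch of the loss structure outlined in the abstract: an L1 reconstruction term reweighted by a per-pixel semantic saliency map (e.g. upweighting the face), plus a term that penalizes differences between consecutive frames for temporal stability. The weighting constant and the exact temporal formulation are assumptions, not the paper's released recipe.

```python
import torch

def rerender_loss(pred, target, saliency, pred_prev, target_prev,
                  w_temporal=0.5):
    # saliency: (B, 1, H, W) weights from a semantic model, higher on the face
    recon = (saliency * (pred - target).abs()).mean()
    # temporal stability: predicted frame-to-frame change should match the
    # ground-truth frame-to-frame change
    temporal = ((pred - pred_prev) - (target - target_prev)).abs().mean()
    return recon + w_temporal * temporal

B, C, H, W = 2, 3, 64, 64
loss = rerender_loss(torch.rand(B, C, H, W), torch.rand(B, C, H, W),
                     torch.ones(B, 1, H, W), torch.rand(B, C, H, W),
                     torch.rand(B, C, H, W))
```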

Variational based Mixed Noise Removal with CNN Deep Learning Regularization

Title Variational based Mixed Noise Removal with CNN Deep Learning Regularization
Authors Faqiang Wang, Haiyang Huang, Jun Liu
Abstract In this paper, the traditional model-based variational method and learning-based algorithms are naturally integrated to address the mixed noise removal problem. Unlike single-type noise (e.g. Gaussian) removal, it is a challenging problem to accurately discriminate noise types and levels for each pixel. We propose a variational method to iteratively estimate the noise parameters, so that the algorithm can automatically classify the noise according to the differing statistical parameters. With an operator splitting scheme, the proposed variational problem separates into four steps: regularization, synthesis, parameter estimation, and noise classification. Each step corresponds to an optimization subproblem. To enforce the regularization, a deep learning method is employed to learn a natural image prior. Compared with some model-based regularizations, the CNN regularizer can significantly improve the quality of the restored images. Compared with some learning-based methods, the synthesis step can produce better reconstructions by analyzing the recognized noise types and levels. In our method, the convolutional neural network (CNN) can be regarded as an operator associated with a variational functional. From this viewpoint, the proposed method can be extended to many image reconstruction and inverse problems. Numerical experiments in the paper show that our method can achieve state-of-the-art results for mixed noise removal.
Tasks Image Reconstruction
Published 2018-05-21
URL http://arxiv.org/abs/1805.08094v1
PDF http://arxiv.org/pdf/1805.08094v1.pdf
PWC https://paperswithcode.com/paper/variational-based-mixed-noise-removal-with
Repo
Framework
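
A schematic sketch of the four-step operator-splitting loop the abstract describes (regularization, synthesis, parameter estimation, noise classification). Every concrete choice below, including the stand-in mean filter for the learned CNN regularizer and the 3-sigma outlier rule, is an assumption made to keep the sketch self-contained.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def cnn_denoise(u):
    # stand-in for the learned CNN regularizer (a crude local mean here)
    return uniform_filter(u, size=3)

def remove_mixed_noise(f, n_iters=10):
    u = f.copy()
    for _ in range(n_iters):
        z = cnn_denoise(u)                            # regularization step
        r = f - z                                     # residual
        sigma = r.std()                               # parameter estimation
        labels = (np.abs(r) > 3 * sigma).astype(int)  # noise classification
        # synthesis: trust the data term only where the noise looks Gaussian
        u = np.where(labels == 0, 0.5 * (f + z), z)
    return u, labels

restored, noise_map = remove_mixed_noise(np.random.rand(64, 64))
```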

A Multi-Layer Approach to Superpixel-based Higher-order Conditional Random Field for Semantic Image Segmentation

Title A Multi-Layer Approach to Superpixel-based Higher-order Conditional Random Field for Semantic Image Segmentation
Authors Li Sulimowicz, Ishfaq Ahmad, Alexander Aved
Abstract Superpixel-based higher-order conditional random fields (SP-HO-CRFs) are known for their effectiveness in enforcing both short- and long-range spatial contiguity for pixelwise labelling in computer vision. However, their higher-order potentials are usually too complex to learn and often incur a high computational cost in inference. We propose a new approximation approach to SP-HO-CRFs that resolves these problems. Our approach is a multi-layer CRF framework that inherits its simplicity from pairwise CRFs by formulating both the higher-order and pairwise cues as the same pairwise potentials in the first layer. Essentially, this approach enhances accuracy on top of pairwise CRFs without training, by reusing their pre-trained parameters and/or weights. The proposed multi-layer approach performs especially well in delineating the boundary details (borders) of object categories such as “trees” and “bushes”. Multiple sets of experiments conducted on the MSRC-21 and PASCAL VOC 2012 datasets validate the effectiveness and efficiency of the proposed methods.
Tasks Semantic Segmentation
Published 2018-04-05
URL http://arxiv.org/abs/1804.02032v1
PDF http://arxiv.org/pdf/1804.02032v1.pdf
PWC https://paperswithcode.com/paper/a-multi-layer-approach-to-superpixel-based
Repo
Framework
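
One way to picture "folding higher-order cues into pairwise potentials": a toy Potts-style energy in which edges between pixels inside the same superpixel simply receive a larger smoothness weight. This is an illustrative reading of the idea, not the paper's actual potentials.

```python
import numpy as np

def pairwise_energy(labels, superpixels, w_pair=1.0, w_sp=2.0):
    """Potts penalties over 4-connected neighbours; edges whose endpoints
    share a superpixel are penalized more strongly for disagreeing."""
    e = 0.0
    H, W = labels.shape
    for y in range(H):
        for x in range(W):
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < H and nx < W and labels[y, x] != labels[ny, nx]:
                    same_sp = superpixels[y, x] == superpixels[ny, nx]
                    e += w_sp if same_sp else w_pair
    return e

labels = np.random.randint(0, 3, (16, 16))
superpixels = np.random.randint(0, 5, (16, 16))
print(pairwise_energy(labels, superpixels))
```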

Asymmetric Bilateral Phase Correlation for Optical Flow Estimation in the Frequency Domain

Title Asymmetric Bilateral Phase Correlation for Optical Flow Estimation in the Frequency Domain
Authors Vasileios Argyriou
Abstract We address the problem of motion estimation in images, operating in the frequency domain. A method is presented which extends phase correlation to handle multiple motions present in an area. Our scheme is based on a novel Bilateral Phase Correlation (BLPC) technique that incorporates the concept and principles of bilateral filters, retaining motion boundaries by taking into account differences in both value and distance, in a manner very similar to Gaussian convolution. The optical flow is obtained by applying the proposed method at certain locations, selected based on the observed motion differences, and then performing non-uniform interpolation in a multi-scale iterative framework. Experiments on several well-known datasets, with and without ground truth, show that our scheme outperforms recently proposed state-of-the-art phase-correlation-based optical flow methods.
Tasks Motion Estimation, Optical Flow Estimation
Published 2018-11-01
URL http://arxiv.org/abs/1811.00327v1
PDF http://arxiv.org/pdf/1811.00327v1.pdf
PWC https://paperswithcode.com/paper/asymmetric-bilateral-phase-correlation-for
Repo
Framework
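
For reference, a minimal implementation of classical phase correlation, the primitive that BLPC extends; the bilateral weighting itself is only indicated in a comment, since the paper's exact kernel is not reproduced here.

```python
import numpy as np

def phase_correlation(a, b):
    """Return the integer translation of image a relative to image b."""
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    R = A * np.conj(B)
    R /= np.abs(R) + 1e-12           # normalized cross-power spectrum
    corr = np.fft.ifft2(R).real
    # BLPC would build this surface from bilaterally weighted patches, so
    # that several motions in the window yield several distinct peaks.
    return np.unravel_index(np.argmax(corr), corr.shape)

a = np.random.rand(64, 64)
b = np.roll(a, (5, 3), axis=(0, 1))  # shift a by (5, 3)
print(phase_correlation(b, a))       # -> (5, 3)
```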

Optimized Gated Deep Learning Architectures for Sensor Fusion

Title Optimized Gated Deep Learning Architectures for Sensor Fusion
Authors Myung Seok Shim, Peng Li
Abstract Sensor fusion is a key technology that integrates various sensory inputs to allow for robust decision making in many applications such as autonomous driving and robot control. Deep neural networks have been adopted for sensor fusion in a body of recent studies. Among these, the so-called netgated architecture was proposed, which has demonstrated improved performance over conventional convolutional neural networks (CNNs). In this paper, we address several limitations of the baseline netgated architecture by proposing two further optimized architectures: a coarser-grained gated architecture employing (feature-)group-level fusion weights, and a two-stage gated architecture leveraging both group-level and feature-level fusion weights. Using driving mode prediction and human activity recognition datasets, we demonstrate the significant performance improvements brought by the proposed gated architectures, as well as their robustness in the presence of sensor noise and failures.
Tasks Activity Recognition, Autonomous Driving, Decision Making, Human Activity Recognition, Sensor Fusion
Published 2018-10-08
URL http://arxiv.org/abs/1810.04160v1
PDF http://arxiv.org/pdf/1810.04160v1.pdf
PWC https://paperswithcode.com/paper/optimized-gated-deep-learning-architectures
Repo
Framework
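
A hedged sketch of group-level gated fusion as the abstract describes it: each sensor contributes a feature group, a small gate network maps the concatenated groups to one fusion weight per group, and the groups are combined by their softmax weights. The two-sensor setup and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class GroupGatedFusion(nn.Module):
    def __init__(self, dims=(32, 32), hidden=16, n_classes=5):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(sum(dims), hidden), nn.ReLU(),
                                  nn.Linear(hidden, len(dims)))
        self.head = nn.Linear(dims[0], n_classes)  # groups share a width here

    def forward(self, groups):
        # groups: list of per-sensor feature tensors, each (batch, dim)
        w = torch.softmax(self.gate(torch.cat(groups, dim=1)), dim=1)
        fused = sum(w[:, i:i + 1] * g for i, g in enumerate(groups))
        return self.head(fused)

model = GroupGatedFusion()
out = model([torch.randn(8, 32), torch.randn(8, 32)])  # two sensors
print(out.shape)  # torch.Size([8, 5])
```

Because a noisy or failed sensor can be assigned a near-zero group weight, this coarser gating is one plausible source of the robustness the paper reports.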

A Differential Volumetric Approach to Multi-View Photometric Stereo

Title A Differential Volumetric Approach to Multi-View Photometric Stereo
Authors Fotios Logothetis, Roberto Mecca, Roberto Cipolla
Abstract Highly accurate 3D volumetric reconstruction is still an open research topic, where the main difficulty is usually related to merging rough estimations with high-frequency details. One of the most promising directions is the fusion of multi-view stereo and photometric stereo images. Besides the intrinsic difficulties that multi-view stereo and photometric stereo each face in order to work reliably, supplementary problems arise when the two are considered together. In this work, we present a volumetric approach to the multi-view photometric stereo problem. The key point of our method is the signed distance field parameterisation and its relation to the surface normal. This is exploited in order to obtain a linear partial differential equation, which is solved in a variational framework that combines multiple images from multiple points of view in a single system. In addition, the volumetric approach is naturally implemented on an octree, which allows for fast ray-tracing that reliably handles occlusions and cast shadows. Our approach is evaluated on synthetic and real datasets and achieves state-of-the-art results.
Tasks 3D Volumetric Reconstruction
Published 2018-11-05
URL https://arxiv.org/abs/1811.01984v2
PDF https://arxiv.org/pdf/1811.01984v2.pdf
PWC https://paperswithcode.com/paper/a-differential-volumetric-approach-to-multi
Repo
Framework
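
The key relation the abstract builds on is that, for a signed distance field d, the surface normal is the normalized gradient of d. A small sketch on a regular voxel grid (the paper works on an octree; a dense grid is assumed here for brevity):

```python
import numpy as np

def sdf_normals(d, h=1.0):
    """Normal field of an SDF d sampled on a grid with spacing h."""
    gx, gy, gz = np.gradient(d, h)
    n = np.stack([gx, gy, gz], axis=-1)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12)

# SDF of a sphere of radius 10 centered in a 32^3 grid
g = np.mgrid[0:32, 0:32, 0:32].astype(float) - 15.5
d = np.sqrt((g ** 2).sum(axis=0)) - 10.0
normals = sdf_normals(d)  # points radially outward, as expected
```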

Optimal localist and distributed coding of spatiotemporal spike patterns through STDP and coincidence detection

Title Optimal localist and distributed coding of spatiotemporal spike patterns through STDP and coincidence detection
Authors Timothée Masquelier, Saeed Reza Kheradpisheh
Abstract Repeating spatiotemporal spike patterns exist and carry information. Here we investigated how a single spiking neuron can optimally respond to one given pattern (localist coding), or to either one of several patterns (distributed coding, i.e. the neuron’s response is ambiguous but the identity of the pattern could be inferred from the response of multiple neurons), but not to random inputs. To do so, we extended a theory developed in a previous paper [Masquelier, 2017], which was limited to localist coding. More specifically, we computed analytically the signal-to-noise ratio (SNR) of a multi-pattern-detector neuron, using a threshold-free leaky integrate-and-fire (LIF) neuron model with non-plastic unitary synapses and homogeneous Poisson inputs. Surprisingly, when increasing the number of patterns, the SNR decreases slowly, and remains acceptable for several tens of independent patterns. In addition, we investigated whether spike-timing-dependent plasticity (STDP) could enable a neuron to reach the theoretical optimal SNR. To this aim, we simulated a LIF equipped with STDP, and repeatedly exposed it to multiple input spike patterns, embedded in equally dense Poisson spike trains. The LIF progressively became selective to every repeating pattern with no supervision, and stopped discharging during the Poisson spike trains. Furthermore, using certain STDP parameters, the resulting pattern detectors were optimal. Tens of independent patterns could be learned by a single neuron using a low adaptive threshold, in contrast with previous studies, in which higher thresholds led to localist coding only. Taken together these results suggest that coincidence detection and STDP are powerful mechanisms, fully compatible with distributed coding. Yet we acknowledge that our theory is limited to single neurons, and thus also applies to feed-forward networks, but not to recurrent ones.
Tasks
Published 2018-03-01
URL http://arxiv.org/abs/1803.00447v4
PDF http://arxiv.org/pdf/1803.00447v4.pdf
PWC https://paperswithcode.com/paper/optimal-localist-and-distributed-coding-of
Repo
Framework
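
A compact sketch of the model object in the analysis: a threshold-free LIF membrane potential driven by unitary (non-plastic) synapses and Poisson-like input. Parameters are illustrative, not the paper's fitted values, and the STDP mechanism itself is omitted.

```python
import numpy as np

def lif_potential(spike_times, weights, T=1.0, dt=1e-4, tau=0.02):
    """Leaky integration of weighted input spikes; no threshold, no reset."""
    t = np.arange(0.0, T, dt)
    v = np.zeros_like(t)
    for ts, w in zip(spike_times, weights):
        mask = t >= ts
        v[mask] += w * np.exp(-(t[mask] - ts) / tau)  # exponential PSP
    return t, v

rng = np.random.default_rng(0)
times = np.sort(rng.uniform(0.0, 1.0, 200))  # homogeneous Poisson-like input
t, v = lif_potential(times, np.ones(200))
# the paper's SNR compares the potential's peak inside a repeating pattern
# with the mean and standard deviation of the potential outside it
```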

A Deeper Insight into the UnDEMoN: Unsupervised Deep Network for Depth and Ego-Motion Estimation

Title A Deeper Insight into the UnDEMoN: Unsupervised Deep Network for Depth and Ego-Motion Estimation
Authors Madhu Babu V, Anima Majumder, Kaushik Das, Swagat Kumar
Abstract This paper presents an unsupervised deep learning framework called UnDEMoN for estimating a dense depth map and 6-DoF camera pose information directly from monocular images. The proposed network is trained using unlabeled stereo image pairs and is shown to provide superior performance in depth and ego-motion estimation compared to the existing state of the art. These improvements are achieved by introducing a new objective function that minimizes spatial as well as temporal reconstruction losses simultaneously. These losses are defined using a bilinear sampling kernel and penalized with the Charbonnier penalty function. The objective function thus created is robust to image gradient noise, improving the overall estimation accuracy without resorting to the coarse-to-fine strategies currently prevalent in the literature. Another novelty lies in the fact that we combine a disparity-based depth estimation network with a pose estimation network to obtain absolute scale-aware 6-DoF camera poses and superior depth maps. The effectiveness of the proposed approach is demonstrated through performance comparisons with existing supervised and unsupervised methods on the KITTI driving dataset.
Tasks Depth Estimation, Motion Estimation, Pose Estimation
Published 2018-08-27
URL http://arxiv.org/abs/1809.00969v3
PDF http://arxiv.org/pdf/1809.00969v3.pdf
PWC https://paperswithcode.com/paper/a-deeper-insight-into-the-undemon
Repo
Framework
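
For concreteness, a minimal sketch of the Charbonnier penalty used to robustify the spatial and temporal reconstruction losses; the epsilon and exponent below follow common practice rather than the paper's exact constants.

```python
import torch

def charbonnier(x, eps=1e-3, alpha=0.45):
    # smooth near zero, roughly |x| for large errors: robust to outliers
    return (x * x + eps * eps) ** alpha

def reconstruction_loss(pred, target):
    return charbonnier(pred - target).mean()

loss = reconstruction_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```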

Real-Time 2D-3D Deformable Registration with Deep Learning and Application to Lung Radiotherapy Targeting

Title Real-Time 2D-3D Deformable Registration with Deep Learning and Application to Lung Radiotherapy Targeting
Authors Markus D. Foote, Blake E. Zimmerman, Amit Sawant, Sarang Joshi
Abstract Radiation therapy presents a need for dynamic tracking of a target tumor volume. Fiducial markers such as implanted gold seeds have been used to gate radiation delivery, but the markers are invasive and gating significantly increases treatment time. Pre-treatment acquisition of a respiratory-correlated 4DCT allows accurate motion tracking that is useful in treatment planning. We design a patient-specific motion subspace and a deep convolutional neural network to recover anatomical positions from a single fluoroscopic projection in real-time. We use this deep network to approximate the nonlinear inverse of a diffeomorphic deformation composed with radiographic projection. The network recovers subspace coordinates that define the patient-specific deformation of the lungs from a baseline anatomic position. The geometric accuracy of the subspace deformations on real patient data is similar to the accuracy attained by the original image registration between individual respiratory-phase image volumes.
Tasks Image Registration, Motion Estimation
Published 2018-07-22
URL https://arxiv.org/abs/1807.08388v2
PDF https://arxiv.org/pdf/1807.08388v2.pdf
PWC https://paperswithcode.com/paper/real-time-patient-specific-lung-radiotherapy
Repo
Framework
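
A hedged sketch of the motion-subspace parameterization the abstract implies: the network regresses a few subspace coordinates from one projection, and the dense lung deformation is reconstructed as a baseline displacement plus a linear combination of precomputed modes (a PCA-style basis is assumed here for illustration).

```python
import numpy as np

def deformation_from_coords(c, mean_def, modes):
    """mean_def: (V, 3) baseline displacement; modes: (K, V, 3) basis."""
    return mean_def + np.tensordot(c, modes, axes=1)

V, K = 1000, 3                  # sample points and subspace rank (assumed)
modes = 0.01 * np.random.randn(K, V, 3)
c = np.array([1.2, -0.4, 0.1])  # at test time, regressed by the CNN
phi = deformation_from_coords(c, np.zeros((V, 3)), modes)
print(phi.shape)                # (1000, 3): per-point displacement vectors
```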

Accurate Detection of Inner Ears in Head CTs Using a Deep Volume-to-Volume Regression Network with False Positive Suppression and a Shape-Based Constraint

Title Accurate Detection of Inner Ears in Head CTs Using a Deep Volume-to-Volume Regression Network with False Positive Suppression and a Shape-Based Constraint
Authors Dongqing Zhang, Jianing Wang, Jack H. Noble, Benoit M. Dawant
Abstract Cochlear implants (CIs) are neural prostheses used to treat patients with hearing loss. CIs use an array of electrodes which are surgically inserted into the cochlea to stimulate the auditory nerve endings. After surgery, CIs need to be programmed. Studies have shown that the spatial relationship between the intra-cochlear anatomy and the electrodes, derived from medical images, can guide CI programming and lead to significant improvements in hearing outcomes. However, clinical head CT images are usually obtained from scanners of different brands with different protocols. The field of view thus varies greatly, and visual inspection is needed to document their content before applying algorithms for electrode localization and intra-cochlear anatomy segmentation. In this work, to determine the presence/absence of inner ears and to accurately localize them in head CTs, we use a volume-to-volume convolutional neural network which can be trained end-to-end to map a raw CT volume to probability maps which indicate inner ear positions. We incorporate a false positive suppression strategy in training and apply a shape-based constraint. We achieve a labeling accuracy of 98.59% and a localization error of 2.45 mm. The localization error is significantly smaller than that of a random-forest-based approach proposed recently to perform the same task.
Tasks
Published 2018-06-12
URL http://arxiv.org/abs/1806.04725v1
PDF http://arxiv.org/pdf/1806.04725v1.pdf
PWC https://paperswithcode.com/paper/accurate-detection-of-inner-ears-in-head-cts
Repo
Framework
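
A small sketch of the post-processing implied by "probability maps which indicate inner ear positions": threshold the map for a presence decision, then localize via a probability-weighted centroid. The threshold value is an assumption, not the paper's.

```python
import numpy as np

def localize_inner_ear(prob, thresh=0.5):
    """Return the weighted-centroid voxel position, or None if absent."""
    if prob.max() < thresh:
        return None                           # inner ear judged absent
    mask = prob >= thresh
    coords = np.argwhere(mask).astype(float)  # (N, 3) voxel indices
    weights = prob[mask]
    return (coords * weights[:, None]).sum(0) / weights.sum()

prob = np.zeros((32, 32, 32))
prob[10:13, 15:18, 20:23] = 0.9
print(localize_inner_ear(prob))               # ~ [11. 16. 21.]
```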

Story Generation and Aviation Incident Representation

Title Story Generation and Aviation Incident Representation
Authors Peter Clark
Abstract This working note discusses the topic of story generation, with a view to identifying the knowledge required to understand aviation incident narratives (which have structural similarities to stories), following the premise that to understand aviation incidents, one should at least be able to generate examples of them. We give a brief overview of aviation incidents and their relation to stories, and then describe two of our earlier attempts (using ‘scripts’ and ‘story grammars’) at incident generation which did not evolve promisingly. Following this, we describe a simple incident generator which did work (at a ‘toy’ level), using a ‘world simulation’ approach. This generator is based on Meehan’s TALE-SPIN story generator (1977). We conclude with a critique of the approach.
Tasks
Published 2018-02-13
URL http://arxiv.org/abs/1802.04818v1
PDF http://arxiv.org/pdf/1802.04818v1.pdf
PWC https://paperswithcode.com/paper/story-generation-and-aviation-incident
Repo
Framework

Large-scale Land Cover Classification in GaoFen-2 Satellite Imagery

Title Large-scale Land Cover Classification in GaoFen-2 Satellite Imagery
Authors Xin-Yi Tong, Qikai Lu, Gui-Song Xia, Liangpei Zhang
Abstract Many important applications, such as change detection and disaster monitoring, need land cover information from remote sensing images acquired over different areas and at different times. However, it is difficult to find a generic land cover classification scheme for different remote sensing images due to the spectral shift caused by diverse acquisition conditions. In this paper, we develop a novel land cover classification method that can deal with large-scale data captured from widely distributed areas and at different times. Additionally, we establish a large-scale land cover classification dataset consisting of 150 Gaofen-2 images as data support for model training and performance evaluation. Our experiments achieve outstanding classification accuracy compared with traditional methods.
Tasks
Published 2018-06-04
URL http://arxiv.org/abs/1806.00901v1
PDF http://arxiv.org/pdf/1806.00901v1.pdf
PWC https://paperswithcode.com/paper/large-scale-land-cover-classification-in
Repo
Framework

Learning and Recognizing Human Action from Skeleton Movement with Deep Residual Neural Networks

Title Learning and Recognizing Human Action from Skeleton Movement with Deep Residual Neural Networks
Authors Huy-Hieu Pham, Louahdi Khoudour, Alain Crouzil, Pablo Zegers, Sergio A. Velastin
Abstract Automatic human action recognition is indispensable for almost all artificial intelligence systems such as video surveillance, human-computer interfaces, and video retrieval. Despite much progress, recognizing actions in an unknown video is still a challenging task in computer vision. Recently, deep learning algorithms have proved their great potential in many vision-related recognition tasks. In this paper, we propose the use of Deep Residual Neural Networks (ResNets) to learn and recognize human actions from skeleton data provided by a Kinect sensor. First, the body joint coordinates are transformed into 3D arrays and saved as RGB images. Five different deep learning models based on ResNets are then designed to extract image features and classify them into action classes. Experiments are conducted on two public video datasets for human action recognition containing various challenges. The results show that our method achieves state-of-the-art performance compared with existing approaches.
Tasks Temporal Action Localization, Video Retrieval
Published 2018-03-21
URL http://arxiv.org/abs/1803.07780v1
PDF http://arxiv.org/pdf/1803.07780v1.pdf
PWC https://paperswithcode.com/paper/learning-and-recognizing-human-action-from
Repo
Framework
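
A hedged sketch of the encoding step described above: a skeleton sequence of T frames with J joints is min-max normalized per coordinate and stored as a T x J RGB image whose channels hold x, y, and z, ready for a ResNet classifier. The paper's exact normalization may differ.

```python
import numpy as np

def skeleton_to_image(seq):
    """seq: (T, J, 3) joint coordinates from a Kinect-style sensor."""
    lo = seq.min(axis=(0, 1))
    hi = seq.max(axis=(0, 1))
    img = (seq - lo) / (hi - lo + 1e-9) * 255.0
    return img.astype(np.uint8)  # (T, J, 3): rows = time, cols = joints

img = skeleton_to_image(np.random.randn(60, 25, 3))
print(img.shape, img.dtype)      # (60, 25, 3) uint8
```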