January 27, 2020


Paper Group ANR 1138

An Effective and Efficient Method for Detecting Hands in Egocentric Videos for Rehabilitation Applications. Human-Robot Collaboration via Deep Reinforcement Learning of Real-World Interactions. 26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone. Texture CNN for Histopathological Image Classification. PAC Identi …

An Effective and Efficient Method for Detecting Hands in Egocentric Videos for Rehabilitation Applications


Title	An Effective and Efficient Method for Detecting Hands in Egocentric Videos for Rehabilitation Applications
Authors	Ryan J. Visée, Jirapat Likitlersuang, José Zariffa
Abstract	Objective: Individuals with spinal cord injury (SCI) report upper limb function as their top recovery priority. To accurately represent the true impact of new interventions on patient function and independence, evaluation should occur in a natural setting. Wearable cameras can be used to monitor hand function at home, using computer vision to automatically analyze the resulting videos (egocentric video). A key step in this process, hand detection, is difficult to do robustly and reliably, hindering deployment of a complete monitoring system in the home and community. We propose an accurate and efficient hand detection method that uses a simple combination of existing detection and tracking algorithms. Methods: Detection, tracking, and combination methods were evaluated on a new hand detection dataset, consisting of 167,622 frames of egocentric videos collected on 17 individuals with SCI performing activities of daily living in a home simulation laboratory. Results: The F1-scores for the best detector and tracker alone (SSD and Median Flow) were 0.90$\pm$0.07 and 0.42$\pm$0.18, respectively. The best combination method, in which a detector was used to initialize and reset a tracker, resulted in an F1-score of 0.87$\pm$0.07 while being two times faster than the fastest detector alone. Conclusion: The combination of the fastest detector and best tracker improved the accuracy over online trackers while improving the speed of detectors. Significance: The method proposed here, in combination with wearable cameras, will help clinicians directly measure hand function in a patient’s daily life at home, enabling independence after SCI.
Tasks
Published	2019-08-27
URL	https://arxiv.org/abs/1908.10406v1
PDF	https://arxiv.org/pdf/1908.10406v1.pdf
PWC	https://paperswithcode.com/paper/an-effective-and-efficient-method-for
Repo
Framework
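
The combination described in the abstract, where a detector initializes and periodically resets a tracker, can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: `detect`, `make_tracker`, and the `reset_every` interval are hypothetical placeholders for whichever detector (e.g. SSD) and tracker (e.g. Median Flow) are plugged in.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); an assumed box layout


def detect_then_track(
    frames: Iterable,
    detect: Callable[[object], Optional[Box]],
    make_tracker: Callable[[object, Box], Callable[[object], Optional[Box]]],
    reset_every: int = 10,
) -> List[Optional[Box]]:
    """Run the (slow) detector only to start or reset the (fast) tracker."""
    boxes: List[Optional[Box]] = []
    tracker = None
    since_detection = 0
    for frame in frames:
        box = None
        # Use the tracker while it exists and the reset interval has not elapsed.
        if tracker is not None and since_detection < reset_every:
            box = tracker(frame)
            since_detection += 1
        # Fall back to the detector at start-up, on a reset, or when tracking fails.
        if box is None:
            box = detect(frame)
            since_detection = 0
            tracker = make_tracker(frame, box) if box is not None else None
        boxes.append(box)
    return boxes
```

With `reset_every=10` the detector runs on roughly one frame in eleven while the tracker holds, which is the kind of trade-off the abstract describes between detector accuracy and tracker speed.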

Human-Robot Collaboration via Deep Reinforcement Learning of Real-World Interactions


Title	Human-Robot Collaboration via Deep Reinforcement Learning of Real-World Interactions
Authors	Jonas Tjomsland, Ali Shafti, A. Aldo Faisal
Abstract	We present a robotic setup for real-world testing and evaluation of human-robot and human-human collaborative learning. Leveraging the sample-efficiency of the Soft Actor-Critic algorithm, we have implemented a robotic platform able to learn a non-trivial collaborative task with a human partner, without pre-training in simulation, and using only 30 minutes of real-world interactions. This enables us to study Human-Robot and Human-Human collaborative learning through real-world interactions. We present preliminary results, showing that state-of-the-art deep learning methods can take human-robot collaborative learning a step closer to that of humans interacting with each other.
Tasks
Published	2019-12-02
URL	https://arxiv.org/abs/1912.01715v1
PDF	https://arxiv.org/pdf/1912.01715v1.pdf
PWC	https://paperswithcode.com/paper/191201715
Repo
Framework

26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone


Title	26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone
Authors	Wei Niu, Xiaolong Ma, Yanzhi Wang, Bin Ren
Abstract	With the rapid emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability can now run on these devices without any problem. However, without careful optimization, executing Deep Neural Networks (a key building block of the real-time video stream processing that is the foundation of many popular applications) is still challenging, especially if extremely low-latency or high-accuracy inference is needed. This work presents CADNN, a programming framework to efficiently execute DNNs on mobile devices with the help of advanced model compression (sparsity) and a set of thorough architecture-aware optimizations. The evaluation results demonstrate that CADNN outperforms all the state-of-the-art dense DNN execution frameworks like TensorFlow Lite and TVM.
Tasks	Model Compression
Published	2019-05-02
URL	http://arxiv.org/abs/1905.00571v1
PDF	http://arxiv.org/pdf/1905.00571v1.pdf
PWC	https://paperswithcode.com/paper/190500571
Repo
Framework

Texture CNN for Histopathological Image Classification


Title	Texture CNN for Histopathological Image Classification
Authors	Jonathan de Matos, Alceu de S. Britto Jr., Luiz E. S. de Oliveira, Alessandro L. Koerich
Abstract	Biopsies are the gold standard for breast cancer diagnosis. This task can be improved by the use of Computer Aided Diagnosis (CAD) systems, reducing the time of diagnosis and reducing the inter- and intra-observer variability. The advances in computing have brought this type of system closer to reality. However, datasets of Histopathological Images (HI) from biopsies are quite small and unbalanced, which makes it difficult to use modern machine learning techniques such as deep learning. In this paper we propose a compact architecture based on texture filters that has fewer parameters than traditional deep models but is able to capture the difference between malignant and benign tissues with relative accuracy. The experimental results on the BreakHis dataset have shown that the proposed texture CNN achieves almost 90% accuracy for classifying benign and malignant tissues.
Tasks	Histopathological Image Classification, Image Classification
Published	2019-05-28
URL	https://arxiv.org/abs/1905.12005v1
PDF	https://arxiv.org/pdf/1905.12005v1.pdf
PWC	https://paperswithcode.com/paper/texture-cnn-for-histopathological-image
Repo
Framework

PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits


Title	PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits
Authors	Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan
Abstract	We consider the problem of identifying any $k$ out of the best $m$ arms in an $n$-armed stochastic multi-armed bandit. Framed in the PAC setting, this particular problem generalises both the problem of ‘best subset selection’ and that of selecting ‘one out of the best $m$’ arms [arcsk 2017]. In applications such as crowd-sourcing and drug-designing, identifying a single good solution is often not sufficient. Moreover, finding the best subset might be hard due to the presence of many indistinguishably close solutions. Our generalisation of identifying exactly $k$ arms out of the best $m$, where $1 \leq k \leq m$, serves as a more effective alternative. We present a lower bound on the worst-case sample complexity for general $k$, and a fully sequential PAC algorithm, \GLUCB, which is more sample-efficient on easy instances. Also, extending our analysis to infinite-armed bandits, we present a PAC algorithm that is independent of $n$, which identifies an arm from the best $\rho$ fraction of arms using at most an additive poly-log number of samples more than the lower bound, thereby improving over [arcsk 2017] and [Aziz+AKA:2018]. The problem of identifying $k > 1$ distinct arms from the best $\rho$ fraction is not always well-defined; for a special class of this problem, we present lower and upper bounds. Finally, through a reduction, we establish a relation between upper bounds for the ‘one out of the best $\rho$’ problem for infinite instances and the ‘one out of the best $m$’ problem for finite instances. We conjecture that it is more efficient to solve ‘small’ finite instances using the latter formulation, rather than going through the former.
Tasks	Multi-Armed Bandits
Published	2019-01-24
URL	http://arxiv.org/abs/1901.08386v1
PDF	http://arxiv.org/pdf/1901.08386v1.pdf
PWC	https://paperswithcode.com/paper/pac-identification-of-many-good-arms-in
Repo
Framework
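
One way to write down the PAC requirement mentioned in the abstract is sketched below. The tolerance $\epsilon$, the confidence parameter $\delta$, the sorted means $\mu_1 \ge \dots \ge \mu_n$, and the set name $\mathcal{T}_m(\epsilon)$ are notation assumed here for illustration; the abstract itself does not spell them out.

```latex
% Assumed notation: arms indexed so that their means are sorted.
\mu_1 \ge \mu_2 \ge \dots \ge \mu_n,
\qquad
\mathcal{T}_m(\epsilon) = \{\, a \in \{1,\dots,n\} : \mu_a \ge \mu_m - \epsilon \,\}

% The returned set S must contain k arms, all within \epsilon of the best m,
% with probability at least 1 - \delta.
\Pr\big[\, |S| = k \ \text{and}\ S \subseteq \mathcal{T}_m(\epsilon) \,\big] \ge 1 - \delta,
\qquad 1 \le k \le m
```

Under this reading, $k = m$ corresponds to best subset selection and $k = 1$ to selecting one out of the best $m$ arms, the two special cases the abstract names.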

Survival of the Fittest in PlayerUnknown BattleGround


Title	Survival of the Fittest in PlayerUnknown BattleGround
Authors	Brij Rokad, Tushar Karumudi, Omkar Acharya, Akshay Jagtap
Abstract	The goal of this paper was to predict the placement in the multiplayer game PUBG (PlayerUnknown's Battlegrounds). In the game, up to one hundred players parachute onto an island and scavenge for weapons and equipment to kill others, while avoiding getting killed themselves. The available safe area of the game map decreases in size over time, directing surviving players into tighter areas to force encounters. The last player or team standing wins the round. In this paper specifically, we have tried to predict the placement of the player in the ultimate survival test. The data set has been taken from Kaggle. The entire dataset has 29 attributes that map to one label (winPlacePerc); the training set has 4.5 million instances and the testing set has 1.9 million. winPlacePerc is a continuous variable, which makes it harder to predict the survival of the fittest. To overcome this problem, we have applied multiple machine learning models to find the optimum prediction. The models consist of LightGBM Regression (Light Gradient Boosting Machine Regression), MultiLayer Perceptron, M5P (improvement on C4.5) and Random Forest. To measure the error rate, Mean Absolute Error has been used. With the final prediction we have achieved MAE of 0.02047, 0.065, 0.0592 and 0634 respectively.
Tasks
Published	2019-05-15
URL	https://arxiv.org/abs/1905.06052v1
PDF	https://arxiv.org/pdf/1905.06052v1.pdf
PWC	https://paperswithcode.com/paper/survival-of-the-fittest-in-playerunknown
Repo
Framework
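
For reference, the Mean Absolute Error used to compare the models is the average of the absolute differences between predicted and true values. The helper below is a minimal sketch; the function name and the sample values in the usage note are illustrative and not taken from the paper.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because winPlacePerc is a continuous placement target, an MAE near 0.02 would mean predictions are off by about two percentage points of placement on average.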

Visual Illusions Also Deceive Convolutional Neural Networks: Analysis and Implications


Title	Visual Illusions Also Deceive Convolutional Neural Networks: Analysis and Implications
Authors	A. Gomez-Villa, A. Martín, J. Vazquez-Corral, M. Bertalmío, J. Malo
Abstract	Visual illusions allow researchers to devise and test new models of visual perception. Here we show that artificial neural networks trained for basic visual tasks in natural images are deceived by brightness and color illusions, having a response that is qualitatively very similar to the human achromatic and chromatic contrast sensitivity functions, and consistent with natural image statistics. We also show that, while these artificial networks are deceived by illusions, their response might be significantly different to that of humans. Our results suggest that low-level illusions appear in any system that has to perform basic visual tasks in natural environments, in line with error minimization explanations of visual function, and they also imply a word of caution on using artificial networks to study human vision, as previously suggested in other contexts in the vision science literature.
Tasks
Published	2019-12-03
URL	https://arxiv.org/abs/1912.01643v1
PDF	https://arxiv.org/pdf/1912.01643v1.pdf
PWC	https://paperswithcode.com/paper/visual-illusions-also-deceive-convolutional
Repo
Framework

Weighted Dark Channel Dehazing


Title	Weighted Dark Channel Dehazing
Authors	Zhu Mingzhu, He Bingwei, Liu Jiantao
Abstract	In dark channel based methods, the local constant assumption is widely used to make the algorithms invertible. It inevitably introduces defects since the assumption cannot perfectly avoid depth discontinuities and at the same time cover enough pixels. Unfortunately, because of the limitation of the prior, which only confirms the existence of dark things but does not specify their locations or likelihood, no fidelity measurement is available in refinement, so the defects are either under-corrected or over-corrected. In this paper, we go deeper than the dark channel theory to overcome this problem. We split the concept of dark channel into dark pixels and the local constant assumption, and then control the problematic assumption based on a novel weight map. With this approach, our methods show significant improvement in quality and have competitive speed. Finally, we show that the method is highly robust to initial transmission estimates and can be further improved by providing better dark pixel locations.
Tasks
Published	2019-04-28
URL	http://arxiv.org/abs/1904.12245v1
PDF	http://arxiv.org/pdf/1904.12245v1.pdf
PWC	https://paperswithcode.com/paper/weighted-dark-channel-dehazing
Repo
Framework
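
As background for the abstract above, the standard (unweighted) dark channel of an image is the minimum over colour channels followed by a minimum over a local patch. The sketch below shows only that baseline, not the paper's weight map; the function name, the nested-list image layout, and the default patch size are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # assumed (R, G, B) intensities in [0, 1]


def dark_channel(image: Sequence[Sequence[Pixel]], patch: int = 3) -> List[List[float]]:
    """Per-pixel minimum over colour channels, then over a patch x patch window."""
    height, width = len(image), len(image[0])
    radius = patch // 2
    # Step 1: minimum across the three colour channels at every pixel.
    min_rgb = [[min(px) for px in row] for row in image]
    # Step 2: minimum over the local window, clipped at the image border.
    dark = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                min_rgb[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(min(window))
        dark.append(row)
    return dark
```

The patch minimum is where the local constant assumption enters: every pixel in a window inherits the same dark value, which is the step the abstract says breaks down at depth discontinuities.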

PMC-GANs: Generating Multi-Scale High-Quality Pedestrian with Multimodal Cascaded GANs


Title	PMC-GANs: Generating Multi-Scale High-Quality Pedestrian with Multimodal Cascaded GANs
Authors	Jie Wu, Ying Peng, Chenghao Zheng, Zongbo Hao, Jian Zhang
Abstract	Recently, generative adversarial networks (GANs) have shown great advantages in synthesizing images, leading to a boost of explorations of using faked images to augment data. This paper proposes multimodal cascaded generative adversarial networks (PMC-GANs) to generate realistic and diversified pedestrian images and augment pedestrian detection data. The generator of our model applies a residual U-net structure, with multi-scale residual blocks to encode features, and attention residual blocks to help decode and rebuild pedestrian images. The model is constructed in a coarse-to-fine fashion and adopts a cascade structure, which helps produce high-resolution pedestrians. PMC-GANs outperform baselines and, when used for data augmentation, improve pedestrian detection results.
Tasks	Data Augmentation, Pedestrian Detection
Published	2019-12-30
URL	https://arxiv.org/abs/1912.12799v1
PDF	https://arxiv.org/pdf/1912.12799v1.pdf
PWC	https://paperswithcode.com/paper/pmc-gans-generating-multi-scale-high-quality
Repo
Framework

Provably scale-covariant continuous hierarchical networks based on scale-normalized differential expressions coupled in cascade


Title	Provably scale-covariant continuous hierarchical networks based on scale-normalized differential expressions coupled in cascade
Authors	Tony Lindeberg
Abstract	This article presents a theory for constructing hierarchical networks in such a way that the networks are guaranteed to be provably scale covariant. We first present a general sufficiency argument for obtaining scale covariance, which holds for a wide class of networks defined from linear and non-linear differential expressions expressed in terms of scale-normalized scale-space derivatives. Then, we present a more detailed development of one example of such a network constructed from a combination of mathematically derived models of receptive fields and biologically inspired computations. Based on a functional model of complex cells in terms of an oriented quasi quadrature combination of first- and second-order directional Gaussian derivatives, we couple such primitive computations in cascade over combinatorial expansions over image orientations. Scale-space properties of the computational primitives are analysed and we give explicit proofs of how the resulting representation allows for scale and rotation covariance. A prototype application to texture analysis is developed and it is demonstrated that a simplified mean-reduced representation of the resulting QuasiQuadNet leads to promising experimental results on three texture datasets.
Tasks	Texture Classification
Published	2019-05-29
URL	https://arxiv.org/abs/1905.13555v3
PDF	https://arxiv.org/pdf/1905.13555v3.pdf
PWC	https://paperswithcode.com/paper/provably-scale-covariant-hierarchical
Repo
Framework

DS-VIO: Robust and Efficient Stereo Visual Inertial Odometry based on Dual Stage EKF


Title	DS-VIO: Robust and Efficient Stereo Visual Inertial Odometry based on Dual Stage EKF
Authors	Xiaogang Xiong, Wenqing Chen, Zhichao Liu, Qiang Shen
Abstract	This paper presents a dual stage EKF (Extended Kalman Filter)-based algorithm for real-time and robust stereo VIO (visual inertial odometry). The first stage of this EKF-based algorithm performs the fusion of accelerometer and gyroscope while the second performs the fusion of stereo camera and IMU. Due to the sufficient complementary characteristics between accelerometer and gyroscope as well as stereo camera and IMU, the dual stage EKF-based algorithm can achieve a high precision of odometry estimations. At the same time, because of the low dimension of the state vector in this algorithm, its computational efficiency is comparable to previous filter-based approaches. We call our approach DS-VIO (dual stage EKF-based stereo visual inertial odometry) and evaluate our DS-VIO algorithm by comparing it with state-of-the-art approaches including OKVIS, ROVIO, VINS-MONO and S-MSCKF on the EuRoC dataset. Results show that our algorithm can achieve comparable or even better performance in terms of the RMS error.
Tasks
Published	2019-05-02
URL	http://arxiv.org/abs/1905.00684v1
PDF	http://arxiv.org/pdf/1905.00684v1.pdf
PWC	https://paperswithcode.com/paper/ds-vio-robust-and-efficient-stereo-visual
Repo
Framework
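
To illustrate the kind of predict/update cycle that the first stage (gyroscope plus accelerometer) relies on, here is a deliberately simplified scalar Kalman filter for a single tilt angle. It is a linear special case, so no Jacobians appear, and it is not the paper's filter: the state, the noise values `q` and `r`, and all names are assumptions for illustration.

```python
from typing import Tuple


def predict(angle: float, var: float, gyro_rate: float, dt: float, q: float) -> Tuple[float, float]:
    """Propagate the angle with the gyroscope rate; uncertainty grows by q."""
    return angle + gyro_rate * dt, var + q


def update(angle: float, var: float, accel_angle: float, r: float) -> Tuple[float, float]:
    """Correct the prediction with an accelerometer-derived angle of variance r."""
    gain = var / (var + r)                      # Kalman gain for a scalar state
    new_angle = angle + gain * (accel_angle - angle)
    new_var = (1.0 - gain) * var
    return new_angle, new_var
```

A filter step calls `predict` with each gyroscope sample and `update` whenever an accelerometer reading is available; a full EKF would replace these scalar formulas with linearized matrix equations over a larger state.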

Anomalous Situation Detection in Complex Scenes


Title	Anomalous Situation Detection in Complex Scenes
Authors	Michalis Voutouris, Giovanni Sachi, Hina Afridi
Abstract	In this paper we investigate a robust method to identify anomalies in complex scenes. This task is performed by evaluating the collective behavior through the extraction of local binary pattern (LBP) and Laplacian of Gaussian (LoG) features. We fuse both features and use them to train an MLP neural network during the training stage, and the anomaly is identified on the test samples. Considering the challenge of tracking individuals in dense crowded scenes due to multiple occlusions and clutter, in this paper we extract LBP and LoG features and use them as an approximate representation of the anomalous situation. These features match the appearance of anomalies well, and their consistency and accuracy are higher in both regular and irregular areas compared to other descriptors. In this paper, these features are exploited as an input prior to train the neural network. The MLP neural network subsequently uses these features to detect the anomalous situation. The experimental tests are conducted on a set of benchmark video sequences commonly used for anomalous situation detection.
Tasks
Published	2019-02-26
URL	http://arxiv.org/abs/1902.10016v1
PDF	http://arxiv.org/pdf/1902.10016v1.pdf
PWC	https://paperswithcode.com/paper/anomalous-situation-detection-in-complex
Repo
Framework
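
The LBP features mentioned in the abstract can be illustrated with the basic 8-neighbour variant: each pixel is compared with its neighbours and the results are packed into one byte. This is a minimal sketch of that textbook form; the neighbour ordering, the function names, and the nested-list image layout are assumptions, and the paper may use a different LBP variant.

```python
from typing import List, Sequence

# Clockwise neighbour offsets (dy, dx), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Sequence[Sequence[int]], y: int, x: int) -> int:
    """8-bit code: bit i is set when neighbour i is >= the centre pixel."""
    center = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: Sequence[Sequence[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist
```

A histogram like this, concatenated with LoG filter responses, is one plausible form of the fused feature vector that would be fed to the MLP.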

On the rate of convergence of fully connected very deep neural network regression estimates


Title	On the rate of convergence of fully connected very deep neural network regression estimates
Authors	Michael Kohler, Sophie Langer
Abstract	Recent results in nonparametric regression show that deep learning, i.e., neural network estimates with many hidden layers, is able to circumvent the so-called curse of dimensionality provided that suitable restrictions on the structure of the regression function hold. One key feature of the neural networks used in these results is that they are not fully connected. In this paper we show that we can get similar results also for fully connected multilayer feedforward neural networks with ReLU activation functions, provided the number of neurons per hidden layer is fixed and the number of hidden layers tends to infinity for sample size tending to infinity. The proof is based on new approximation results concerning fully connected deep neural networks.
Tasks
Published	2019-08-29
URL	https://arxiv.org/abs/1908.11133v1
PDF	https://arxiv.org/pdf/1908.11133v1.pdf
PWC	https://paperswithcode.com/paper/on-the-rate-of-convergence-of-fully-connected
Repo
Framework

EMAP: Explanation by Minimal Adversarial Perturbation


Title	EMAP: Explanation by Minimal Adversarial Perturbation
Authors	Matt Chapman-Rounds, Marc-Andre Schulz, Erik Pazos, Konstantinos Georgatzis
Abstract	Modern instance-based model-agnostic explanation methods (LIME, SHAP, L2X) are of great use in data-heavy industries for model diagnostics, and for end-user explanations. These methods generally return either a weighting or subset of input features as an explanation of the classification of an instance. An alternative literature argues instead that counterfactual instances provide a more useable characterisation of a black box classifier’s decisions. We present EMAP, a neural network based approach which returns as Explanation the Minimal Adversarial Perturbation to an instance required to cause the underlying black box model to misclassify. We show that this approach combines the two paradigms, recovering the output of feature-weighting methods in continuous feature spaces, whilst also indicating the direction in which the nearest counterfactuals can be found. Our method also provides an implicit confidence estimate in its own explanations, adding a clarity to model diagnostics other methods lack. Additionally, EMAP improves upon the speed of sampling-based methods such as LIME by an order of magnitude, allowing for model explanations in time-critical applications, or at the dataset level, where sampling-based methods are infeasible. We extend our approach to categorical features using a partitioned Gumbel layer, and demonstrate its efficacy on several standard datasets.
Tasks
Published	2019-12-02
URL	https://arxiv.org/abs/1912.00872v1
PDF	https://arxiv.org/pdf/1912.00872v1.pdf
PWC	https://paperswithcode.com/paper/emap-explanation-by-minimal-adversarial
Repo
Framework

Unsupervised Learning of Depth and Deep Representation for Visual Odometry from Monocular Videos in a Metric Space


Title	Unsupervised Learning of Depth and Deep Representation for Visual Odometry from Monocular Videos in a Metric Space
Authors	Xiaochuan Yin, Chengju Liu
Abstract	For ego-motion estimation, the feature representation of the scenes is crucial. Previous methods indicate that both the low-level and semantic feature-based methods can achieve promising results. Therefore, the incorporation of hierarchical feature representation may benefit from both methods. From this perspective, we propose a novel direct feature odometry framework, named DFO, for depth estimation and hierarchical feature representation learning from monocular videos. By exploiting the metric distance, our framework is able to learn the hierarchical feature representation without supervision. The pose is obtained with a coarse-to-fine approach from high-level to low-level features in enlarged feature maps. The pixel-level attention mask can be self-learned to provide the prior information. In contrast to the previous methods, our proposed method calculates the camera motion with a direct method rather than regressing the ego-motion from the pose network. With this approach, the consistency of the scale factor of translation can be constrained. Additionally, the proposed method is thus compatible with the traditional SLAM pipeline. Experiments on the KITTI dataset demonstrate the effectiveness of our method.
Tasks	Depth Estimation, Motion Estimation, Representation Learning, Text-to-Image Generation, Visual Odometry
Published	2019-08-04
URL	https://arxiv.org/abs/1908.01367v1
PDF	https://arxiv.org/pdf/1908.01367v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-learning-of-depth-and-deep
Repo
Framework