January 31, 2020

3463 words 17 mins read

Paper Group AWR 373

Are We Making Real Progress in Simulated Environments? Measuring the Sim2Real Gap in Embodied Visual Navigation. Temporal Convolution for Real-time Keyword Spotting on Mobile Devices. The Spectral Bias of the Deep Image Prior. Differentiable Physics-informed Graph Networks. ViP: Video Platform for PyTorch. Software architecture for YOLO, a creativi …

Are We Making Real Progress in Simulated Environments? Measuring the Sim2Real Gap in Embodied Visual Navigation

Title Are We Making Real Progress in Simulated Environments? Measuring the Sim2Real Gap in Embodied Visual Navigation
Authors Abhishek Kadian, Joanne Truong, Aaron Gokaslan, Alexander Clegg, Erik Wijmans, Stefan Lee, Manolis Savva, Sonia Chernova, Dhruv Batra
Abstract Does progress in simulation translate to progress in robotics? Specifically, if method A outperforms method B in simulation, how likely is the trend to hold in reality on a robot? We examine this question for embodied (PointGoal) navigation, developing engineering tools and a research paradigm for evaluating a simulator by its sim2real predictivity, revealing surprising findings about prior work. First, we develop Habitat-PyRobot Bridge (HaPy), a library for seamless execution of identical code on a simulated agent and a physical robot. Habitat-to-Locobot transfer with HaPy involves just one line change in config, essentially treating reality as just another simulator! Second, we investigate sim2real predictivity of Habitat-Sim for PointGoal navigation. We 3D-scan a physical lab space to create a virtualized replica, and run parallel tests of 9 different models in reality and simulation. We present a new metric called Sim-vs-Real Correlation Coefficient (SRCC) to quantify sim2real predictivity. Our analysis reveals several important findings. We find that SRCC for Habitat as used for the CVPR19 challenge is low (0.18 for the success metric), which suggests that performance improvements for this simulator-based challenge would not transfer well to a physical robot. We find that this gap is largely due to AI agents learning to ‘cheat’ by exploiting simulator imperfections: specifically, the way Habitat allows for ‘sliding’ along walls on collision. Essentially, the virtual robot is capable of cutting corners, leading to unrealistic shortcuts through non-navigable spaces. Naturally, such exploits do not work in the real world where the robot stops on contact with walls. Our experiments show that it is possible to optimize simulation parameters to enable robots trained in imperfect simulators to generalize learned skills to reality (e.g. improving $SRCC_{Succ}$ from 0.18 to 0.844).
Tasks PointGoal Navigation, Visual Navigation
Published 2019-12-13
URL https://arxiv.org/abs/1912.06321v1
PDF https://arxiv.org/pdf/1912.06321v1.pdf
PWC https://paperswithcode.com/paper/are-we-making-real-progress-in-simulated
Repo https://github.com/facebookresearch/habitat-api
Framework none
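
The paper's Sim-vs-Real Correlation Coefficient (SRCC) quantifies how well a simulator's ranking of models predicts their real-robot performance. As a hedged illustration (the exact formulation is in the paper; here it is assumed to be a Pearson correlation over paired per-model success rates, and the numbers are made up, not the paper's results):

```python
# Hypothetical sketch: SRCC as a correlation between paired per-model metrics
# measured in simulation and in reality. Pearson correlation is an assumption
# here; the values below are illustrative only.
from scipy.stats import pearsonr

# Success rates for the same 9 models, evaluated in simulation and on the robot.
sim_success  = [0.95, 0.88, 0.91, 0.70, 0.85, 0.93, 0.60, 0.78, 0.82]
real_success = [0.55, 0.60, 0.52, 0.58, 0.50, 0.57, 0.45, 0.62, 0.48]

srcc, p_value = pearsonr(sim_success, real_success)
print(f"SRCC_succ = {srcc:.3f} (p = {p_value:.3f})")
```

A low SRCC (as reported for the CVPR19 challenge setting) means simulation gains do not reliably translate to the physical robot.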

Temporal Convolution for Real-time Keyword Spotting on Mobile Devices

Title Temporal Convolution for Real-time Keyword Spotting on Mobile Devices
Authors Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha
Abstract Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide adoption of convolutional neural networks (CNNs) in KWS systems due to their exceptional accuracy and robustness. The main challenge faced by KWS systems is the trade-off between high accuracy and low latency. Unfortunately, there has been little quantitative analysis of the actual latency of KWS models on mobile devices. This is especially concerning since conventional convolution-based KWS approaches are known to require a large number of operations to attain an adequate level of performance. In this paper, we propose a temporal convolution for real-time KWS on mobile devices. Unlike most of the 2D convolution-based KWS approaches that require a deep architecture to fully capture both low- and high-frequency domains, we exploit temporal convolutions with a compact ResNet architecture. On the Google Speech Command Dataset, we achieve more than a 385x speedup on a Google Pixel 1 and surpass the accuracy of the state-of-the-art model. In addition, we release implementations of the proposed and baseline models, including an end-to-end pipeline for training models and evaluating them on mobile devices.
Tasks Keyword Spotting
Published 2019-04-08
URL https://arxiv.org/abs/1904.03814v2
PDF https://arxiv.org/pdf/1904.03814v2.pdf
PWC https://paperswithcode.com/paper/temporal-convolution-for-real-time-keyword
Repo https://github.com/hyperconnect/TC-ResNet
Framework tf
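
The core trick is to treat the MFCC frequency bins as input channels and convolve only along time, so a shallow 1D network replaces a deep 2D one. A hedged sketch of that idea (layer sizes are illustrative and not the paper's TC-ResNet configuration):

```python
# Sketch: temporal convolution for KWS — frequency bins become Conv1d channels.
import torch
import torch.nn as nn

class TemporalConvKWS(nn.Module):
    def __init__(self, n_mfcc=40, n_classes=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mfcc, 64, kernel_size=3, padding=1),  # time-only convolution
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):            # x: (batch, n_mfcc, time_frames)
        h = self.net(x).squeeze(-1)
        return self.fc(h)

logits = TemporalConvKWS()(torch.randn(8, 40, 101))  # 8 clips, 40 MFCCs, 101 frames
print(logits.shape)  # torch.Size([8, 12])
```

Because each convolution spans all frequency bins at once, far fewer layers (and operations) are needed than with stacked 2D convolutions, which is where the mobile speedup comes from.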

The Spectral Bias of the Deep Image Prior

Title The Spectral Bias of the Deep Image Prior
Authors Prithvijit Chakrabarty, Subhransu Maji
Abstract The “deep image prior” proposed by Ulyanov et al. is an intriguing property of neural nets: a convolutional encoder-decoder network can be used as a prior for natural images. The network architecture implicitly introduces a bias: if we train the model to map white noise to a corrupted image, this bias guides the model to fit the true image before fitting the corrupted regions. This paper explores why the deep image prior helps in denoising natural images. We present a novel method to analyze trajectories generated by the deep image prior optimization and demonstrate: (i) convolution layers of an encoder-decoder decouple the frequency components of the image, learning each at different rates; (ii) the model fits lower frequencies first, making early stopping behave as a low-pass filter. The experiments study an extension of Cheng et al., which showed that at initialization, the deep image prior is equivalent to a stationary Gaussian process.
Tasks Denoising
Published 2019-12-18
URL https://arxiv.org/abs/1912.08905v1
PDF https://arxiv.org/pdf/1912.08905v1.pdf
PWC https://paperswithcode.com/paper/the-spectral-bias-of-the-deep-image-prior
Repo https://github.com/PCJohn/dip-spectral
Framework pytorch
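
The spectral-bias observation can be reproduced in miniature: optimize a small conv net to map fixed noise to a noisy target and watch low-frequency error shrink before high-frequency error. A minimal sketch, not the authors' analysis code (the tiny network, image size, and frequency cutoff are all assumptions):

```python
# Illustration of spectral bias in a deep-image-prior-style fit.
import torch
import torch.nn as nn

torch.manual_seed(0)
H = W = 64
target = torch.rand(1, 1, H, W)                      # stand-in "corrupted" image
z = torch.randn(1, 8, H, W)                          # fixed noise input

net = nn.Sequential(
    nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Radial frequency mask separating "low" from "high" frequencies.
fy, fx = torch.meshgrid(torch.fft.fftfreq(H), torch.fft.fftfreq(W), indexing="ij")
low_band = (fy ** 2 + fx ** 2).sqrt() < 0.1

for step in range(1001):
    out = net(z)
    loss = ((out - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 250 == 0:
        err = torch.fft.fft2((out - target).detach()).abs().squeeze()
        print(step,
              "low-freq err", err[low_band].mean().item(),
              "high-freq err", err[~low_band].mean().item())
```

If low-frequency error drops much faster, stopping early leaves high-frequency content (often the noise) unfitted — the low-pass-filter behaviour the paper analyzes.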

Differentiable Physics-informed Graph Networks

Title Differentiable Physics-informed Graph Networks
Authors Sungyong Seo, Yan Liu
Abstract While physics conveys knowledge of nature built from an interplay between observations and theory, it has received comparatively little attention in deep neural networks. In particular, few works leverage physical behavior when that knowledge is given only implicitly. In this work, we propose a novel architecture called Differentiable Physics-informed Graph Networks (DPGN) that incorporates implicit physics knowledge provided by domain experts by imposing it in the latent space. Using DPGN, we demonstrate that climate prediction tasks are significantly improved. Beyond the experimental results, we validate the effectiveness of the proposed module and provide further applications of DPGN, such as inductive learning and multistep prediction.
Tasks
Published 2019-02-08
URL http://arxiv.org/abs/1902.02950v2
PDF http://arxiv.org/pdf/1902.02950v2.pdf
PWC https://paperswithcode.com/paper/differentiable-physics-informed-graph
Repo https://github.com/sungyongs/dpgn
Framework pytorch
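
One concrete way to "inform" physics in latent space is to penalize latent node states for violating a known difference equation on the graph. A hedged sketch of such a regularizer (here a discrete diffusion equation, h_{t+1} - h_t ≈ -alpha · L h_t; this is an illustrative penalty, not the exact DPGN formulation, and the GRU dynamics and graph are stand-ins):

```python
# Physics-informed regularizer on latent graph states (illustrative).
import torch
import torch.nn as nn

n_nodes, latent_dim, steps = 10, 16, 5
adj = (torch.rand(n_nodes, n_nodes) < 0.3).float()
adj = ((adj + adj.t()) > 0).float().fill_diagonal_(0)
laplacian = torch.diag(adj.sum(1)) - adj

dynamics = nn.GRUCell(latent_dim, latent_dim)        # stand-in latent dynamics
alpha = 0.1

h = torch.randn(n_nodes, latent_dim)
physics_loss = 0.0
for _ in range(steps):
    h_next = dynamics(h, h)
    # Deviation from the diffusion update acts as the physics-informed penalty.
    physics_loss = physics_loss + ((h_next - h + alpha * laplacian @ h) ** 2).mean()
    h = h_next

task_loss = torch.tensor(0.0)                         # placeholder for e.g. a climate prediction loss
total_loss = task_loss + 0.5 * physics_loss
print(total_loss)
```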

ViP: Video Platform for PyTorch

Title ViP: Video Platform for PyTorch
Authors Madan Ravi Ganesh, Eric Hofesmann, Nathan Louis, Jason Corso
Abstract This work presents the Video Platform for PyTorch (ViP), a deep learning framework designed to handle, and extend to, any video-based problem domain. ViP supports (1) a single unified interface applicable to all video problem domains, (2) quick prototyping of video models, (3) executing large-batch operations with reduced memory consumption, and (4) easy and reproducible experimental setups. ViP’s core functionality is built with flexibility and modularity in mind to allow for smooth data flow between different parts of the platform and benchmarking against existing methods. In providing a software platform that supports multiple video-based problem domains, we enable greater cross-pollination of models and ideas, and stronger generalization, in the video understanding research community.
Tasks Video Understanding
Published 2019-10-07
URL https://arxiv.org/abs/1910.02793v1
PDF https://arxiv.org/pdf/1910.02793v1.pdf
PWC https://paperswithcode.com/paper/vip-video-platform-for-pytorch
Repo https://github.com/MichiganCOG/ViP
Framework pytorch
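
A "single unified interface" for video domains typically means every dataset yields clips plus task-specific annotations and every model maps clips to predictions, so tasks can be swapped without changing the training loop. A hypothetical sketch of that pattern (these class names are illustrative, not ViP's actual API):

```python
# Hypothetical unified video interface: swap datasets/models, keep the loop.
from abc import ABC, abstractmethod
from torch import nn
from torch.utils.data import Dataset

class VideoDataset(Dataset, ABC):
    @abstractmethod
    def __getitem__(self, idx):
        """Return (clip, annotation): clip is (T, C, H, W), annotation is task-specific."""

class VideoModel(nn.Module, ABC):
    @abstractmethod
    def forward(self, clip):
        """Map a batch of clips (B, T, C, H, W) to task-specific predictions."""

def train_step(model: VideoModel, batch, loss_fn, optimizer):
    clips, targets = batch
    preds = model(clips)
    loss = loss_fn(preds, targets)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```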

Software architecture for YOLO, a creativity-stimulating robot

Title Software architecture for YOLO, a creativity-stimulating robot
Authors Patrícia Alves-Oliveira, Samuel Gomes, Ankita Chandak, Patrícia Arriaga, Guy Hoffman, Ana Paiva
Abstract YOLO is a social robot designed and developed to stimulate creativity in children through storytelling activities. Children use it as a character in their stories. This article details the artificial intelligence software developed for YOLO. The implemented software cycles through several Creativity Behaviors to find the ones that stimulate creativity most effectively. YOLO can choose between convergent and divergent thinking techniques, two important processes of creative thought. These techniques were developed based on the psychological theories of creativity development and on research from creativity experts who work with children. Additionally, this software allows the creation of Social Behaviors that enable the robot to behave as a believable character. On top of our framework, we built 3 main social behavior parameters: Exuberant, Aloof, and Harmonious. These behaviors are meant to ease immersive play and the process of character creation. The 3 social behaviors were based on psychological theories of personality and developed using children’s input during co-design studies. Overall, this work presents an attempt to design, develop, and deploy social robots that nurture intrinsic human abilities, such as the ability to be creative.
Tasks
Published 2019-09-24
URL https://arxiv.org/abs/1909.10823v1
PDF https://arxiv.org/pdf/1909.10823v1.pdf
PWC https://paperswithcode.com/paper/software-architecture-for-yolo-a-creativity
Repo https://github.com/patricialvesoliveira/YOLO-Software
Framework none

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

Title CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning
Authors Rohit Girdhar, Deva Ramanan
Abstract Computer vision has undergone a dramatic revolution in performance, driven in large part through deep features trained on large-scale supervised datasets. However, much of this improvement has focused on static image analysis; video understanding has seen rather modest gains. Even though new datasets and spatiotemporal models have been proposed, simple frame-by-frame classification methods often still remain competitive. We posit that current video datasets are plagued with implicit biases over scene and object structure that can dwarf variations in temporal structure. In this work, we build a video dataset with fully observable and controllable object and scene bias, and which truly requires spatiotemporal understanding in order to be solved. Our dataset, named CATER, is rendered synthetically using a library of standard 3D objects, and tests the ability to recognize compositions of object movements that require long-term reasoning. In addition to being a challenging dataset, CATER also provides a plethora of diagnostic tools to analyze modern spatiotemporal video architectures by being completely observable and controllable. Using CATER, we provide insights into some of the most recent state-of-the-art deep video architectures.
Tasks Video Understanding
Published 2019-10-10
URL https://arxiv.org/abs/1910.04744v1
PDF https://arxiv.org/pdf/1910.04744v1.pdf
PWC https://paperswithcode.com/paper/cater-a-diagnostic-dataset-for-compositional
Repo https://github.com/rohitgirdhar/CATER
Framework caffe2

Evaluation of Embeddings of Laboratory Test Codes for Patients at a Cancer Center

Title Evaluation of Embeddings of Laboratory Test Codes for Patients at a Cancer Center
Authors Lorenzo A. Rossi, Chad Shawber, Janet Munu, Finly Zachariah
Abstract Laboratory test results are an important and generally high dimensional component of a patient’s Electronic Health Record (EHR). We train embedding representations (via Word2Vec and GloVe) for LOINC codes of laboratory tests from the EHRs of about 80,000 patients at a cancer center. To include information about lab test outcomes, we also train embeddings on the concatenation of a LOINC code with a symbol indicating normality or abnormality of the result. We observe several clinically meaningful similarities among LOINC embeddings trained over our data. For the embeddings of the concatenation of LOINCs with abnormality codes, we evaluate the performance for mortality prediction tasks and the ability to preserve ordinality properties: i.e. a lab test with a normal outcome should be more similar to an abnormal one than to a very abnormal one.
Tasks Mortality Prediction
Published 2019-07-22
URL https://arxiv.org/abs/1907.09600v2
PDF https://arxiv.org/pdf/1907.09600v2.pdf
PWC https://paperswithcode.com/paper/evaluation-of-embeddings-of-laboratory-test
Repo https://github.com/elleros/DSHealth2019_loinc_embeddings
Framework none
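
Training such embeddings amounts to treating each patient's ordered sequence of lab codes as a "sentence" and running a word-embedding model over it. A hedged sketch with gensim (the token format, flags, and hyperparameters below are illustrative, not the paper's exact setup):

```python
# Lab-code embeddings: LOINC code concatenated with a normal/abnormal flag.
from gensim.models import Word2Vec  # gensim >= 4.0

# Each inner list is one patient's chronologically ordered (LOINC)_(N/A) tokens (toy data).
patient_sequences = [
    ["718-7_N", "2345-7_A", "2160-0_N", "718-7_N"],
    ["2345-7_N", "718-7_A", "6690-2_N"],
    ["2160-0_A", "6690-2_A", "2345-7_A"],
]

model = Word2Vec(
    sentences=patient_sequences,
    vector_size=50,   # embedding dimension
    window=5,
    min_count=1,
    sg=1,             # skip-gram
    epochs=50,
)

# Nearest neighbours of an abnormal-flagged code (toy data, so purely illustrative).
print(model.wv.most_similar("718-7_A", topn=3))
```

The resulting vectors can then be inspected for clinically meaningful neighbours or fed into downstream tasks such as mortality prediction.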

Learning Latent Parameters without Human Response Patterns: Item Response Theory with Artificial Crowds

Title Learning Latent Parameters without Human Response Patterns: Item Response Theory with Artificial Crowds
Authors John P. Lalor, Hao Wu, Hong Yu
Abstract Incorporating Item Response Theory (IRT) into NLP tasks can provide valuable information about model performance and behavior. Traditionally, IRT models are learned using human response pattern (RP) data, presenting a significant bottleneck for large data sets like those required for training deep neural networks (DNNs). In this work we propose learning IRT models using RPs generated from artificial crowds of DNN models. We demonstrate the effectiveness of learning IRT models using DNN-generated data through quantitative and qualitative analyses for two NLP tasks. Parameters learned from human and machine RPs for natural language inference and sentiment analysis exhibit medium to large positive correlations. We demonstrate a use-case for latent difficulty item parameters, namely training set filtering, and show that using difficulty to sample training data outperforms baseline methods. Finally, we highlight cases where human expectation about item difficulty does not match difficulty as estimated from the machine RPs.
Tasks Natural Language Inference, Sentiment Analysis
Published 2019-08-29
URL https://arxiv.org/abs/1908.11421v1
PDF https://arxiv.org/pdf/1908.11421v1.pdf
PWC https://paperswithcode.com/paper/learning-latent-parameters-without-human
Repo https://github.com/jplalor/py-irt
Framework pytorch
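
The key idea is that an IRT model can be fitted to response patterns generated by an ensemble of DNNs instead of humans. A minimal sketch of fitting a 1-parameter logistic (Rasch) model by maximum likelihood (the paper's IRT models and the py-irt repo may differ; the response matrix here is synthetic):

```python
# Fit ability/difficulty parameters from binary response patterns.
import torch

torch.manual_seed(0)
n_models, n_items = 50, 200
# responses[j, i] = 1 if "model j" answered item i correctly (synthetic stand-in).
responses = (torch.rand(n_models, n_items) < 0.6).float()

ability = torch.zeros(n_models, requires_grad=True)      # theta_j
difficulty = torch.zeros(n_items, requires_grad=True)    # b_i
opt = torch.optim.Adam([ability, difficulty], lr=0.05)

for step in range(500):
    logits = ability.unsqueeze(1) - difficulty.unsqueeze(0)   # P(correct) = sigmoid(theta - b)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, responses)
    opt.zero_grad(); loss.backward(); opt.step()

# Items with the largest learned difficulty could then be used, e.g., for
# difficulty-based training-set filtering as discussed in the abstract.
print(difficulty.topk(5).values)
```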

Unsupervised Domain Adaptation for Cross-sensor Pore Detection in High-resolution Fingerprint Images

Title Unsupervised Domain Adaptation for Cross-sensor Pore Detection in High-resolution Fingerprint Images
Authors Vijay Anand, Vivek Kanhangad
Abstract With the emergence of high-resolution fingerprint sensors, there has been a lot of focus on level-3 fingerprint features, especially the pores, for the next generation automated fingerprint recognition systems (AFRS). Following the success of deep learning in various computer vision tasks, researchers have developed learning-based approaches for detection of pores in high-resolution fingerprint images. Generally, learning-based approaches provide better performance than handcrafted feature-based approaches. However, domain adaptability of the existing learning-based pore detection methods has never been studied. In this paper, we study this aspect and propose an approach for pore detection in cross-sensor scenarios. For this purpose, we have generated an in-house 1000 dpi fingerprint dataset with ground truth pore coordinates (referred to as IITI-HRFP-GT), and evaluated the performance of the existing learning-based pore detection approaches. The core of the proposed approach for detection of pores in cross-sensor scenarios is DeepDomainPore, a residual learning-based convolutional neural network (CNN) trained for pore detection. The domain adaptability in DeepDomainPore is achieved by embedding a gradient reversal layer between the CNN and a domain classifier network. The proposed approach achieves state-of-the-art performance in a cross-sensor scenario involving public high-resolution fingerprint datasets, with an 88.12% true detection rate and an 83.82% F-score.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2019-08-28
URL https://arxiv.org/abs/1908.10701v2
PDF https://arxiv.org/pdf/1908.10701v2.pdf
PWC https://paperswithcode.com/paper/cross-sensor-pore-detection-in-high
Repo https://github.com/anubhav4sachan/domain-adapt
Framework pytorch
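
The gradient reversal layer (GRL) mentioned in the abstract is the standard trick from domain-adversarial training: identity in the forward pass, negated (and scaled) gradients in the backward pass, so the feature extractor learns to confuse the domain classifier. A standard sketch; the layer sizes and feature dimension are assumptions, not DeepDomainPore's:

```python
# Gradient reversal layer + domain classifier head.
import torch
from torch import nn
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)                 # identity forward

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_ * grad_output, None   # reversed, scaled gradient

class DomainClassifier(nn.Module):
    def __init__(self, in_dim=128, lambda_=1.0):
        super().__init__()
        self.lambda_ = lambda_
        self.head = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, features):
        reversed_feats = GradReverse.apply(features, self.lambda_)
        return self.head(reversed_feats)

feats = torch.randn(16, 128, requires_grad=True)   # stand-in pore-detector features
domain_logits = DomainClassifier()(feats)
domain_logits.sum().backward()                     # gradients flowing into `feats` are reversed
```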

SINet: Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder

Title SINet: Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder
Authors Hyojin Park, Lars Lowe Sjösund, YoungJoon Yoo, Nicolas Monet, Jihwan Bang, Nojun Kwak
Abstract Designing a lightweight and robust portrait segmentation algorithm is an important task for a wide range of face applications. However, the problem has been considered a subset of the object segmentation problem and is less handled in the semantic segmentation field. Portrait segmentation nevertheless has its own unique requirements. First, because portrait segmentation is performed in the middle of a whole process of many real-world applications, it requires extremely lightweight models. Second, there have not been any public datasets in this domain that contain a sufficient number of images with unbiased statistics. To solve the first problem, we introduce the new extremely lightweight portrait segmentation model SINet, containing an information blocking decoder and spatial squeeze modules. The information blocking decoder uses confidence estimates to recover local spatial information without spoiling global consistency. The spatial squeeze module uses multiple receptive fields to cope with various sizes of consistency in the image. To tackle the second problem, we propose a simple method to create additional portrait segmentation data which can improve accuracy on the EG1800 dataset. In our qualitative and quantitative analysis on the EG1800 dataset, we show that our method outperforms various existing lightweight segmentation models. Our method reduces the number of parameters from 2.1M to 86.9K (around a 95.9% reduction), while keeping the accuracy within a 1% margin of the state-of-the-art portrait segmentation method. We also show that our model runs on a real mobile device at 100.6 FPS. In addition, we demonstrate that our method can be used for general semantic segmentation on the Cityscapes dataset. The code and dataset are available at https://github.com/HYOJINPARK/ExtPortraitSeg .
Tasks Semantic Segmentation
Published 2019-11-20
URL https://arxiv.org/abs/1911.09099v4
PDF https://arxiv.org/pdf/1911.09099v4.pdf
PWC https://paperswithcode.com/paper/sinet-extreme-lightweight-portrait
Repo https://github.com/HYOJINPARK/ExtPortraitSeg
Framework pytorch
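
The information blocking idea can be sketched as confidence-based gating: the decoder's coarse prediction yields a per-pixel confidence, and low-level encoder features are let through only where confidence is low, so local detail is recovered without disturbing already-confident regions. A hedged sketch (shapes and the exact gating are illustrative, not SINet's implementation):

```python
# Information blocking: gate low-level features by (1 - decoder confidence).
import torch
import torch.nn.functional as F

def information_blocking(coarse_logits, low_level_feats):
    # coarse_logits: (B, C, h, w) decoder prediction; low_level_feats: (B, F, H, W)
    prob = torch.softmax(coarse_logits, dim=1)
    confidence, _ = prob.max(dim=1, keepdim=True)             # (B, 1, h, w)
    confidence = F.interpolate(confidence, size=low_level_feats.shape[-2:],
                               mode="bilinear", align_corners=False)
    return low_level_feats * (1.0 - confidence)               # block info where already confident

blocked = information_blocking(torch.randn(2, 2, 28, 28), torch.randn(2, 24, 112, 112))
print(blocked.shape)  # torch.Size([2, 24, 112, 112])
```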

Self-training with progressive augmentation for unsupervised cross-domain person re-identification

Title Self-training with progressive augmentation for unsupervised cross-domain person re-identification
Authors Xinyu Zhang, Jiewei Cao, Chunhua Shen, Mingyu You
Abstract Person re-identification (Re-ID) has achieved great improvement with deep learning and a large amount of labelled training data. However, it remains challenging to adapt a model trained on a labelled source domain to a target domain where only unlabelled data are available. In this work, we develop a self-training method with a progressive augmentation framework (PAST) to promote the model performance progressively on the target dataset. Specifically, our PAST framework consists of two stages, namely, a conservative stage and a promoting stage. The conservative stage captures the local structure of target-domain data points with triplet-based loss functions, leading to improved feature representations. The promoting stage continuously optimizes the network by appending a changeable classification layer to the last layer of the model, enabling the use of global information about the data distribution. Importantly, we propose a new self-training strategy that progressively augments the model capability by adopting the conservative and promoting stages alternately. Furthermore, to improve the reliability of selected triplet samples, we introduce a ranking-based triplet loss in the conservative stage, which is a label-free objective function based on the similarities between data pairs. Experiments demonstrate that the proposed method achieves state-of-the-art person Re-ID performance under the unsupervised cross-domain setting. Code is available at: https://tinyurl.com/PASTReID
Tasks Person Re-Identification
Published 2019-07-31
URL https://arxiv.org/abs/1907.13315v1
PDF https://arxiv.org/pdf/1907.13315v1.pdf
PWC https://paperswithcode.com/paper/self-training-with-progressive-augmentation
Repo https://github.com/zhangxinyu-xyz/PAST-ReID
Framework pytorch
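
A label-free, ranking-based triplet objective can be sketched as follows: with no target-domain labels, each anchor's nearest neighbour in feature space is treated as a pseudo-positive and a lower-ranked sample as a pseudo-negative. This is a hedged illustration; the rank choices, distance, and margin are assumptions, not the exact PAST formulation:

```python
# Label-free ranking-based triplet loss (illustrative).
import torch
import torch.nn.functional as F

def ranking_triplet_loss(features, pos_rank=1, neg_rank=10, margin=0.3):
    feats = F.normalize(features, dim=1)
    dist = torch.cdist(feats, feats)                      # pairwise distances
    order = dist.argsort(dim=1)                           # rank 0 is the anchor itself
    pos = feats[order[:, pos_rank]]                       # nearest neighbour as pseudo-positive
    neg = feats[order[:, neg_rank]]                       # farther-ranked sample as pseudo-negative
    d_ap = (feats - pos).pow(2).sum(1)
    d_an = (feats - neg).pow(2).sum(1)
    return F.relu(d_ap - d_an + margin).mean()

loss = ranking_triplet_loss(torch.randn(64, 256))
print(loss)
```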

Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees

Title Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees
Authors Summer Devlin, Chandan Singh, W. James Murdoch, Bin Yu
Abstract Tree ensembles, such as random forests and AdaBoost, are ubiquitous machine learning models known for achieving strong predictive performance across a wide variety of domains. However, this strong performance comes at the cost of interpretability (i.e. users are unable to understand the relationships a trained random forest has learned and why it is making its predictions). In particular, it is challenging to understand how the contribution of a particular feature, or group of features, varies as their values change. To address this, we introduce Disentangled Attribution Curves (DAC), a method to provide interpretations of tree ensemble methods in the form of (multivariate) feature importance curves. For a given variable, or group of variables, DAC plots the importance of the variable(s) as their values change. We validate DAC on real data by showing that the curves can be used to increase the accuracy of logistic regression while maintaining interpretability, by including DAC as an additional feature. In simulation studies, DAC is shown to outperform competing methods in the recovery of conditional expectations. Finally, through a case study on the bike-sharing dataset, we demonstrate the use of DAC to uncover novel insights into a dataset.
Tasks Feature Engineering, Feature Importance, Interpretable Machine Learning
Published 2019-05-18
URL https://arxiv.org/abs/1905.07631v1
PDF https://arxiv.org/pdf/1905.07631v1.pdf
PWC https://paperswithcode.com/paper/disentangled-attribution-curves-for
Repo https://github.com/csinva/disentangled-attribution-curves
Framework none
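
DAC itself is defined in the paper and the linked repo. As a hedged point of comparison only, the sketch below computes a standard partial-dependence-style curve for one feature of a random forest (a simpler notion of "importance as the value changes", not the DAC algorithm) and appends the curve's value as an extra feature for logistic regression, mirroring the validation strategy described above:

```python
# Partial-dependence-style importance curve used as an extra feature (not DAC itself).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def importance_curve(model, X, feature, grid):
    """Average model output as `feature` is swept over `grid` (partial dependence)."""
    curve = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        curve.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(curve)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
curve = importance_curve(rf, X, feature=0, grid=grid)

# Use the curve value at each sample's own feature-0 value as an extra feature.
extra = np.interp(X[:, 0], grid, curve).reshape(-1, 1)
lr = LogisticRegression(max_iter=1000).fit(np.hstack([X, extra]), y)
print(lr.score(np.hstack([X, extra]), y))
```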

Content-Aware Unsupervised Deep Homography Estimation

Title Content-Aware Unsupervised Deep Homography Estimation
Authors Jirong Zhang, Chuan Wang, Shuaicheng Liu, Lanpeng Jia, Jue Wang, Ji Zhou
Abstract Robust homography estimation between two images is a fundamental task which has been widely applied to various vision applications. Traditional feature-based methods often detect image features and fit a homography to the matched features with RANSAC outlier removal. However, the quality of the homography heavily relies on the quality of the image features, which are prone to errors in low-light and low-texture images. On the other hand, previous deep homography approaches either synthesize images for supervised learning or adopt aerial images for unsupervised learning, both ignoring the importance of depth disparities in homography estimation. Moreover, they treat the image content equally, including regions of dynamic objects and near-range foregrounds, which further decreases the quality of estimation. In this work, to overcome such problems, we propose an unsupervised deep homography method with a new architecture design. We learn a mask during the estimation to reject outlier regions. In addition, we calculate the loss with respect to our learned deep features instead of directly comparing image contents as done previously. Moreover, a comprehensive dataset is presented, covering both regular and challenging cases, such as poor textures and non-planar interferences. The effectiveness of our method is validated through comparisons with both feature-based and previous deep-based methods. Code will soon be available on GitHub.
Tasks Homography Estimation
Published 2019-09-12
URL https://arxiv.org/abs/1909.05983v1
PDF https://arxiv.org/pdf/1909.05983v1.pdf
PWC https://paperswithcode.com/paper/content-aware-unsupervised-deep-homography
Repo https://github.com/JirongZhang/DeepHomography
Framework none
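
The content-aware loss idea can be sketched as comparing learned feature maps (rather than raw pixels) after warping by the predicted homography, with a learned mask down-weighting regions such as dynamic objects and near-range foregrounds. A hedged sketch using kornia for the warp; the loss shape and mask handling are illustrative, not the paper's architecture:

```python
# Mask-weighted feature-space homography loss (illustrative).
import torch
import kornia

def masked_feature_loss(feat_a, feat_b, homography, mask):
    # feat_*: (B, C, H, W) learned features; homography: (B, 3, 3); mask: (B, 1, H, W) in [0, 1]
    h, w = feat_a.shape[-2:]
    feat_a_warped = kornia.geometry.warp_perspective(feat_a, homography, dsize=(h, w))
    per_pixel = (feat_a_warped - feat_b).abs().mean(dim=1, keepdim=True)
    return (mask * per_pixel).sum() / (mask.sum() + 1e-6)     # mask rejects outlier regions

loss = masked_feature_loss(
    torch.randn(2, 32, 64, 64), torch.randn(2, 32, 64, 64),
    torch.eye(3).repeat(2, 1, 1), torch.rand(2, 1, 64, 64),
)
print(loss)
```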

Homography from two orientation- and scale-covariant features

Title Homography from two orientation- and scale-covariant features
Authors Daniel Barath, Zuzana Kukelova
Abstract This paper proposes a geometric interpretation of the angles and scales which orientation- and scale-covariant feature detectors, e.g. SIFT, provide. Two new general constraints are derived on the scales and rotations which can be used in any geometric model estimation task. Using these formulas, two new constraints on homography estimation are introduced. Exploiting the derived equations, a solver for estimating the homography from the minimal number of two correspondences is proposed. It is also shown how the normalization of the point correspondences affects the rotation and scale parameters, thus achieving numerically stable results. Because merely two feature pairs are required, robust estimators, e.g. RANSAC, need significantly fewer iterations than with the four-point algorithm. When using covariant features, e.g. SIFT, the information about the scale and orientation is given at no cost. The proposed homography estimation method is tested in a synthetic environment and on publicly available real-world datasets.
Tasks Homography Estimation
Published 2019-06-27
URL https://arxiv.org/abs/1906.11927v1
PDF https://arxiv.org/pdf/1906.11927v1.pdf
PWC https://paperswithcode.com/paper/homography-from-two-orientation-and-scale
Repo https://github.com/danini/homography-from-sift-features
Framework none
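
The "at no cost" point is that SIFT keypoints already carry the scale (kp.size) and orientation (kp.angle) that the two-point solver exploits; the minimal solver itself lives in the linked repository. A hedged sketch of assembling those inputs with OpenCV (the image filenames are hypothetical placeholders, and this does not implement the paper's solver):

```python
# Collect point + scale + orientation correspondences that a 2-point solver would consume.
import cv2
import numpy as np

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical image pair
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)

# Each correspondence supplies coordinates plus scale and orientation on both sides,
# so a minimal sample of only two such correspondences suffices for the solver.
correspondences = np.array([
    [*kp1[m.queryIdx].pt, kp1[m.queryIdx].size, kp1[m.queryIdx].angle,
     *kp2[m.trainIdx].pt, kp2[m.trainIdx].size, kp2[m.trainIdx].angle]
    for m in matches
])
print(correspondences.shape)  # (num_matches, 8)
```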