April 3, 2020

3354 words 16 mins read

Paper Group AWR 11

DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data. Towards Accurate Vehicle Behaviour Classification With Multi-Relational Graph Convolutional Networks. Common-Knowledge Concept Recognition for SEVA. AvatarMe: Realistically Renderable 3D Facial Reconstruction “in-the-wild”. Sketch Less for More: On-the-Fly Fine-Grained Sketch Based …

DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data

Title DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data
Authors Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, Dou Renyin
Abstract We present a method for depth estimation with monocular images, which can predict high-quality depth on diverse scenes up to an affine transformation, thus preserving accurate shapes of a scene. Previous methods that predict metric depth often work well only for a specific scene. In contrast, learning relative depth (information of being closer or further) can enjoy better generalization, with the price of failing to recover the accurate geometric shape of the scene. In this work, we propose a dataset and methods to tackle this dilemma, aiming to predict accurate depth up to an affine transformation with good generalization to diverse scenes. First we construct a large-scale and diverse dataset, termed Diverse Scene Depth dataset (DiverseDepth), which has a broad range of scenes and foreground contents. Compared with previous learning objectives, i.e., learning metric depth or relative depth, we propose to learn the affine-invariant depth using our diverse dataset to ensure both generalization and high-quality geometric shapes of scenes. Furthermore, in order to train the model on the complex dataset effectively, we propose a multi-curriculum learning method. Experiments show that our method outperforms previous methods on 8 datasets by a large margin with the zero-shot test setting, demonstrating the excellent generalization capacity of the learned model to diverse scenes. The reconstructed point clouds with the predicted depth show that our method can recover high-quality 3D shapes. Code and dataset are available at: https://tinyurl.com/DiverseDepth
Tasks Depth Estimation
Published 2020-02-03
URL https://arxiv.org/abs/2002.00569v3
PDF https://arxiv.org/pdf/2002.00569v3.pdf
PWC https://paperswithcode.com/paper/diversedepth-affine-invariant-depth
Repo https://github.com/YvanYin/DiverseDepth
Framework pytorch
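
To make the affine-invariant idea concrete, here is a minimal sketch (not the paper's exact training loss) of how depth can be scored up to an affine transformation: solve for the per-image scale and shift that best align the prediction to the ground truth, then measure the residual. All shapes and values below are illustrative.

```python
import torch

def affine_invariant_error(pred, gt):
    """Mean absolute error after aligning `pred` to `gt` with the
    least-squares optimal per-image scale and shift."""
    p, g = pred.flatten(), gt.flatten()
    # Solve min_{s,t} || s * p + t - g ||^2 via least squares.
    A = torch.stack([p, torch.ones_like(p)], dim=1)        # (N, 2)
    sol = torch.linalg.lstsq(A, g.unsqueeze(1)).solution   # (2, 1)
    s, t = sol[0, 0], sol[1, 0]
    return (s * p + t - g).abs().mean()

pred = torch.rand(48, 64)
gt = 3.0 * pred + 0.5                       # same depth up to an affine transform
print(affine_invariant_error(pred, gt))     # ~0: affine differences are ignored
```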

Towards Accurate Vehicle Behaviour Classification With Multi-Relational Graph Convolutional Networks

Title Towards Accurate Vehicle Behaviour Classification With Multi-Relational Graph Convolutional Networks
Authors Sravan Mylavarapu, Mahtab Sandhu, Priyesh Vijayan, K Madhava Krishna, Balaraman Ravindran, Anoop Namboodiri
Abstract Understanding on-road vehicle behaviour from a temporal sequence of sensor data is gaining in popularity. In this paper, we propose a pipeline for understanding vehicle behaviour from a monocular image sequence or video. A monocular sequence along with scene semantics, optical flow and object labels are used to get spatial information about the object (vehicle) of interest and other objects (semantically contiguous set of locations) in the scene. This spatial information is encoded by a Multi-Relational Graph Convolutional Network (MR-GCN), and a temporal sequence of such encodings is fed to a recurrent network to label vehicle behaviours. The proposed framework can classify a variety of vehicle behaviours to high fidelity on datasets that are diverse and include European, Chinese and Indian on-road scenes. The framework also provides for seamless transfer of models across datasets without entailing re-annotation, retraining and even fine-tuning. We show comparative performance gain over baseline Spatio-temporal classifiers and detail a variety of ablations to showcase the efficacy of the framework.
Tasks Optical Flow Estimation
Published 2020-02-03
URL https://arxiv.org/abs/2002.00786v2
PDF https://arxiv.org/pdf/2002.00786v2.pdf
PWC https://paperswithcode.com/paper/towards-accurate-vehicle-behaviour
Repo https://github.com/ma8sa/temporal-MR-GCN
Framework pytorch
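
The pipeline pairs a multi-relational graph encoder with a recurrent classifier. The sketch below, written with assumed feature sizes and relation counts, shows the general shape of that combination (per-relation weight matrices aggregated over relation-specific adjacencies, then an LSTM over per-frame encodings); it is not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class MultiRelationalGCNLayer(nn.Module):
    """One relational graph convolution: a separate weight matrix per
    relation type, aggregated over relation-specific adjacency matrices."""
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.rel_weights = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations)])
        self.self_loop = nn.Linear(in_dim, out_dim)

    def forward(self, x, adjs):
        # x: (num_nodes, in_dim); adjs: (num_relations, num_nodes, num_nodes)
        out = self.self_loop(x)
        for r, w in enumerate(self.rel_weights):
            out = out + adjs[r] @ w(x)
        return torch.relu(out)

class VehicleBehaviourClassifier(nn.Module):
    """Per-frame MR-GCN encodings fed to an LSTM, then a behaviour classifier
    on the vehicle-of-interest node (a sketch, not the paper's exact model)."""
    def __init__(self, in_dim=16, hid=32, num_relations=4, num_classes=5):
        super().__init__()
        self.gcn = MultiRelationalGCNLayer(in_dim, hid, num_relations)
        self.lstm = nn.LSTM(hid, hid, batch_first=True)
        self.cls = nn.Linear(hid, num_classes)

    def forward(self, frames):
        # frames: list of (node_feats, adjs); node 0 is the vehicle of interest.
        per_frame = torch.stack([self.gcn(x, a)[0] for x, a in frames]).unsqueeze(0)
        h, _ = self.lstm(per_frame)          # (1, T, hid)
        return self.cls(h[:, -1])            # logits for the behaviour label

frames = [(torch.randn(6, 16), torch.rand(4, 6, 6)) for _ in range(10)]
print(VehicleBehaviourClassifier()(frames).shape)   # torch.Size([1, 5])
```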

Common-Knowledge Concept Recognition for SEVA

Title Common-Knowledge Concept Recognition for SEVA
Authors Jitin Krishnan, Patrick Coronado, Hemant Purohit, Huzefa Rangwala
Abstract We build a common-knowledge concept recognition system for a Systems Engineer’s Virtual Assistant (SEVA) which can be used for downstream tasks such as relation extraction, knowledge graph construction, and question-answering. The problem is formulated as a token classification task similar to named entity extraction. With the help of a domain expert and text processing methods, we construct a dataset annotated at the word-level by carefully defining a labelling scheme to train a sequence model to recognize systems engineering concepts. We use a pre-trained language model and fine-tune it with the labeled dataset of concepts. In addition, we also create some essential datasets for information such as abbreviations and definitions from the systems engineering domain. Finally, we construct a simple knowledge graph using these extracted concepts along with some hyponym relations.
Tasks Entity Extraction, graph construction, Language Modelling, Question Answering, Relation Extraction
Published 2020-03-26
URL https://arxiv.org/abs/2003.11687v1
PDF https://arxiv.org/pdf/2003.11687v1.pdf
PWC https://paperswithcode.com/paper/common-knowledge-concept-recognition-for-seva
Repo https://github.com/jitinkrishnan/NASA-SE
Framework none
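
Since the task is framed as token classification over a fine-tuned pre-trained language model, a minimal Hugging Face Transformers setup looks roughly like the following. The label scheme and base checkpoint here are placeholders, not the ones defined in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-CONCEPT", "I-CONCEPT"]       # placeholder labelling scheme
tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels))

sentence = "The spacecraft bus provides power to the payload instruments."
enc = tok(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits               # (1, seq_len, num_labels)
pred_ids = logits.argmax(-1)[0].tolist()
tokens = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
print(list(zip(tokens, [labels[i] for i in pred_ids])))
```

Fine-tuning on the word-level annotated corpus would proceed with a standard token-classification training loop; the snippet only shows the model and label plumbing.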

AvatarMe: Realistically Renderable 3D Facial Reconstruction “in-the-wild”

Title AvatarMe: Realistically Renderable 3D Facial Reconstruction “in-the-wild”
Authors Alexandros Lattas, Stylianos Moschoglou, Baris Gecer, Stylianos Ploumpis, Vasileios Triantafyllou, Abhijeet Ghosh, Stefanos Zafeiriou
Abstract Over the last years, with the advent of Generative Adversarial Networks (GANs), many face analysis tasks have accomplished astounding performance, with applications including, but not limited to, face generation and 3D face reconstruction from a single “in-the-wild” image. Nevertheless, to the best of our knowledge, there is no method which can produce high-resolution photorealistic 3D faces from “in-the-wild” images and this can be attributed to the: (a) scarcity of available data for training, and (b) lack of robust methodologies that can successfully be applied on very high-resolution data. In this paper, we introduce AvatarMe, the first method that is able to reconstruct photorealistic 3D faces from a single “in-the-wild” image with an increasing level of detail. To achieve this, we capture a large dataset of facial shape and reflectance and build on a state-of-the-art 3D texture and shape reconstruction method and successively refine its results, while generating the per-pixel diffuse and specular components that are required for realistic rendering. As we demonstrate in a series of qualitative and quantitative experiments, AvatarMe outperforms the existing arts by a significant margin and reconstructs authentic, 4K by 6K-resolution 3D faces from a single low-resolution image that, for the first time, bridges the uncanny valley.
Tasks 3D Face Reconstruction, Face Generation, Face Reconstruction
Published 2020-03-30
URL https://arxiv.org/abs/2003.13845v1
PDF https://arxiv.org/pdf/2003.13845v1.pdf
PWC https://paperswithcode.com/paper/avatarme-realistically-renderable-3d-facial
Repo https://github.com/lattas/avatarme
Framework none
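
The abstract stresses that per-pixel diffuse and specular reflectance are what make the reconstructions renderable. As a toy illustration only (plain Blinn-Phong shading with random maps, not the AvatarMe pipeline or renderer), here is how such per-pixel maps enter a shading model.

```python
import numpy as np

H, W = 64, 64
diffuse_albedo = np.random.rand(H, W, 3)          # per-pixel diffuse colour
specular_albedo = np.random.rand(H, W, 1) * 0.3   # per-pixel specular strength
normals = np.dstack([np.zeros((H, W, 2)), np.ones((H, W, 1))])  # facing camera

light = np.array([0.3, 0.3, 0.9]); light /= np.linalg.norm(light)
view = np.array([0.0, 0.0, 1.0])
half = (light + view) / np.linalg.norm(light + view)

n_dot_l = np.clip(normals @ light, 0, None)[..., None]
n_dot_h = np.clip(normals @ half, 0, None)[..., None]
shininess = 32.0
# Diffuse term uses the diffuse map, specular highlight uses the specular map.
image = diffuse_albedo * n_dot_l + specular_albedo * n_dot_h ** shininess
print(image.shape, image.min(), image.max())
```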

Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval

Title Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval
Authors Ayan Kumar Bhunia, Yongxin Yang, Timothy M. Hospedales, Tao Xiang, Yi-Zhe Song
Abstract Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user’s query sketch. Its widespread applicability is however hindered by the fact that drawing a sketch takes time, and most people struggle to draw a complete and faithful sketch. In this paper, we reformulate the conventional FG-SBIR framework to tackle these challenges, with the ultimate goal of retrieving the target photo with the least number of strokes possible. We further propose an on-the-fly design that starts retrieving as soon as the user starts drawing. To accomplish this, we devise a reinforcement learning-based cross-modal retrieval framework that directly optimizes rank of the ground-truth photo over a complete sketch drawing episode. Additionally, we introduce a novel reward scheme that circumvents the problems related to irrelevant sketch strokes, and thus provides us with a more consistent rank list during the retrieval. We achieve superior early-retrieval efficiency over state-of-the-art methods and alternative baselines on two publicly available fine-grained sketch retrieval datasets.
Tasks Cross-Modal Retrieval, Image Retrieval, Sketch-Based Image Retrieval
Published 2020-02-24
URL https://arxiv.org/abs/2002.10310v2
PDF https://arxiv.org/pdf/2002.10310v2.pdf
PWC https://paperswithcode.com/paper/sketch-less-for-more-on-the-fly-fine-grained
Repo https://github.com/AyanKumarBhunia/on-the-fly-FGSBIR
Framework pytorch
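
The key quantity the RL agent optimises is the rank of the ground-truth photo as the sketch grows stroke by stroke. Below is a hedged sketch of that reward signal, with random stand-in embeddings instead of the learned encoders.

```python
import torch
import torch.nn.functional as F

gallery = F.normalize(torch.randn(100, 64), dim=1)     # 100 photo embeddings
true_idx = 7
# One embedding per partial sketch after each new stroke (stand-ins here).
episode = [F.normalize(torch.randn(64), dim=0) for _ in range(10)]

rewards = []
for step, sketch_emb in enumerate(episode, start=1):
    sims = gallery @ sketch_emb                         # cosine similarities
    rank = (sims > sims[true_idx]).sum().item() + 1     # 1 = best
    rewards.append(1.0 / rank)                          # reciprocal-rank reward
    print(f"after stroke {step}: rank of true photo = {rank}")

print("episode return:", sum(rewards))
```

Optimising the expected episode return pushes the true photo up the ranking as early as possible, which is exactly the early-retrieval behaviour the paper targets.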

CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus

Title CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
Authors Changhan Wang, Juan Pino, Anne Wu, Jiatao Gu
Abstract Spoken language translation has recently witnessed a resurgence in popularity, thanks to the development of end-to-end models and the creation of new corpora, such as Augmented LibriSpeech and MuST-C. Existing datasets involve language pairs with English as a source language, involve very specific domains or are low resource. We introduce CoVoST, a multilingual speech-to-text translation corpus from 11 languages into English, diversified with over 11,000 speakers and over 60 accents. We describe the dataset creation methodology and provide empirical evidence of the quality of the data. We also provide initial benchmarks, including, to our knowledge, the first end-to-end many-to-one multilingual models for spoken language translation. CoVoST is released under CC0 license and free to use. We also provide additional evaluation data derived from Tatoeba under CC licenses.
Tasks
Published 2020-02-04
URL https://arxiv.org/abs/2002.01320v1
PDF https://arxiv.org/pdf/2002.01320v1.pdf
PWC https://paperswithcode.com/paper/covost-a-diverse-multilingual-speech-to-text
Repo https://github.com/facebookresearch/covost
Framework none

Scalable Variational Gaussian Process Regression Networks

Title Scalable Variational Gaussian Process Regression Networks
Authors Shibo Li, Wei Xing, Mike Kirby, Shandian Zhe
Abstract Gaussian process regression networks (GPRN) are powerful Bayesian models for multi-output regression, but their inference is intractable. To address this issue, existing methods use a fully factorized structure (or a mixture of such structures) over all the outputs and latent functions for posterior approximation, which, however, can miss the strong posterior dependencies among the latent variables and hurt the inference quality. In addition, the updates of the variational parameters are inefficient and can be prohibitively expensive for a large number of outputs. To overcome these limitations, we propose a scalable variational inference algorithm for GPRN, which not only captures the abundant posterior dependencies but also is much more efficient for massive outputs. We tensorize the output space and introduce tensor/matrix-normal variational posteriors to capture the posterior correlations and to reduce the parameters. We jointly optimize all the parameters and exploit the inherent Kronecker product structure in the variational model evidence lower bound to accelerate the computation. We demonstrate the advantages of our method in several real-world applications.
Tasks
Published 2020-03-25
URL https://arxiv.org/abs/2003.11489v1
PDF https://arxiv.org/pdf/2003.11489v1.pdf
PWC https://paperswithcode.com/paper/scalable-variational-gaussian-process-1
Repo https://github.com/trungngv/gprn
Framework none
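
The speed-up comes from never materialising the Kronecker-structured matrices that appear in the tensorised variational bound: for compatibly sized matrices, (A ⊗ B) vec(X) = vec(B X Aᵀ) under column-major vectorisation. A quick numerical check of that identity, which is the generic trick rather than the paper's full algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 3))            # rows match B, columns match A

vec = lambda M: M.flatten(order="F")       # column-major vectorisation
lhs = np.kron(A, B) @ vec(X)               # explicitly forms the 12x12 matrix
rhs = vec(B @ X @ A.T)                     # never builds the Kronecker product
print(np.allclose(lhs, rhs))               # True
```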

Mind the Gap: Enlarging the Domain Gap in Open Set Domain Adaptation

Title Mind the Gap: Enlarging the Domain Gap in Open Set Domain Adaptation
Authors Dongliang Chang, Aneeshan Sain, Zhanyu Ma, Yi-Zhe Song, Jun Guo
Abstract Unsupervised domain adaptation aims to leverage labeled data from a source domain to learn a classifier for an unlabeled target domain. Among its many variants, open set domain adaptation (OSDA) is perhaps the most challenging, as it further assumes the presence of unknown classes in the target domain. In this paper, we study OSDA with a particular focus on enriching its ability to traverse across larger domain gaps. Firstly, we show that existing state-of-the-art methods suffer a considerable performance drop in the presence of larger domain gaps, especially on a new dataset (PACS) that we re-purposed for OSDA. We then propose a novel framework to specifically address the larger domain gaps. The key insight lies with how we exploit the mutually beneficial information between two networks; (a) to separate samples of known and unknown classes, (b) to maximize the domain confusion between source and target domain without the influence of unknown samples. It follows that (a) and (b) will mutually supervise each other and alternate until convergence. Extensive experiments are conducted on Office-31, Office-Home, and PACS datasets, demonstrating the superiority of our method in comparison to other state-of-the-arts. Code available at https://github.com/dongliangchang/Mutual-to-Separate/
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2020-03-08
URL https://arxiv.org/abs/2003.03787v2
PDF https://arxiv.org/pdf/2003.03787v2.pdf
PWC https://paperswithcode.com/paper/mind-the-gap-enlarging-the-domain-gap-in-open
Repo https://github.com/dongliangchang/Mutual-to-Separate
Framework pytorch
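
One way to read objective (b) is as a domain-confusion loss that is down-weighted for samples the known/unknown head considers unknown, so that unknown target samples do not distort the feature alignment. The sketch below implements that reading; the authors' exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def domain_confusion_loss(domain_logits, known_prob):
    # domain_logits: discriminator output for target samples (1 = source).
    # Pushing predictions toward 0.5 confuses the discriminator; each sample
    # is weighted by its probability of belonging to a known class.
    target = torch.full_like(domain_logits, 0.5)
    per_sample = F.binary_cross_entropy_with_logits(
        domain_logits, target, reduction="none")
    return (known_prob * per_sample).mean()

domain_logits = torch.randn(8, 1)
known_prob = torch.sigmoid(torch.randn(8, 1))   # from the known/unknown head
print(domain_confusion_loss(domain_logits, known_prob))
```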

Conditional Convolutions for Instance Segmentation

Title Conditional Convolutions for Instance Segmentation
Authors Zhi Tian, Chunhua Shen, Hao Chen
Abstract We propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation). Top-performing instance segmentation methods such as Mask R-CNN rely on ROI operations (typically ROIPool or ROIAlign) to obtain the final instance masks. In contrast, we propose to solve instance segmentation from a new perspective. Instead of using instance-wise ROIs as inputs to a network of fixed weights, we employ dynamic instance-aware networks, conditioned on instances. CondInst enjoys two advantages: 1) Instance segmentation is solved by a fully convolutional network, eliminating the need for ROI cropping and feature alignment. 2) Due to the much improved capacity of dynamically-generated conditional convolutions, the mask head can be very compact (e.g., 3 conv. layers, each having only 8 channels), leading to significantly faster inference. We demonstrate a simpler instance segmentation method that can achieve improved performance in both accuracy and inference speed. On the COCO dataset, we outperform a few recent methods including well-tuned Mask RCNN baselines, without longer training schedules needed. Code is available: https://github.com/aim-uofa/adet
Tasks Instance Segmentation, Semantic Segmentation
Published 2020-03-12
URL https://arxiv.org/abs/2003.05664v3
PDF https://arxiv.org/pdf/2003.05664v3.pdf
PWC https://paperswithcode.com/paper/conditional-convolutions-for-instance
Repo https://github.com/aim-uofa/AdelaiDet
Framework pytorch
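
The dynamic mask head is easy to state in code: a controller emits, per instance, the flattened weights and biases of three 1x1 conv layers (8 channels each, as in the abstract), which are then applied to a shared mask feature map. Everything else here (input channel count, coordinate handling) is simplified.

```python
import torch
import torch.nn.functional as F

def dynamic_mask_head(mask_feats, params, in_ch=10, ch=8):
    # mask_feats: (1, in_ch, H, W) shared features (plus relative coordinates).
    # params: flat vector of per-instance conv weights and biases.
    sizes = [(ch, in_ch, 1, 1), (ch,), (ch, ch, 1, 1), (ch,), (1, ch, 1, 1), (1,)]
    splits, idx = [], 0
    for s in sizes:
        n = int(torch.tensor(s).prod())
        splits.append(params[idx:idx + n].reshape(s)); idx += n
    x = mask_feats
    for i in range(0, 6, 2):
        x = F.conv2d(x, splits[i], bias=splits[i + 1])
        if i < 4:
            x = F.relu(x)
    return torch.sigmoid(x)                  # (1, 1, H, W) instance mask

num_params = 10 * 8 + 8 + 8 * 8 + 8 + 8 * 1 + 1      # 169 weights per instance
mask = dynamic_mask_head(torch.randn(1, 10, 56, 56), torch.randn(num_params))
print(mask.shape)
```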

What is the State of Neural Network Pruning?

Title What is the State of Neural Network Pruning?
Authors Davis Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, John Guttag
Abstract Neural network pruning—the task of reducing the size of a network by removing parameters—has been the subject of a great deal of work in recent years. We provide a meta-analysis of the literature, including an overview of approaches to pruning and consistent findings in the literature. After aggregating results across 81 papers and pruning hundreds of models in controlled conditions, our clearest finding is that the community suffers from a lack of standardized benchmarks and metrics. This deficiency is substantial enough that it is hard to compare pruning techniques to one another or determine how much progress the field has made over the past three decades. To address this situation, we identify issues with current practices, suggest concrete remedies, and introduce ShrinkBench, an open-source framework to facilitate standardized evaluations of pruning methods. We use ShrinkBench to compare various pruning techniques and show that its comprehensive evaluation can prevent common pitfalls when comparing pruning methods.
Tasks Network Pruning
Published 2020-03-06
URL https://arxiv.org/abs/2003.03033v1
PDF https://arxiv.org/pdf/2003.03033v1.pdf
PWC https://paperswithcode.com/paper/what-is-the-state-of-neural-network-pruning
Repo https://github.com/jjgo/shrinkbench
Framework pytorch
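
A concrete example of the kind of baseline such a benchmark standardises is global magnitude pruning, shown below with PyTorch's built-in pruning utilities. This is not the ShrinkBench API, only an illustration of the baseline technique.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Conv2d)]

# Zero out the 80% of conv weights with the smallest absolute value, globally.
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.8)

total = sum(m.weight.numel() for m, _ in to_prune)
zeros = sum((m.weight == 0).sum().item() for m, _ in to_prune)
print(f"sparsity: {zeros / total:.2%}")      # ~80%
```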

HRank: Filter Pruning using High-Rank Feature Map

Title HRank: Filter Pruning using High-Rank Feature Map
Authors Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, Ling Shao
Abstract Neural network pruning offers a promising prospect to facilitate deploying deep neural networks on resource-limited devices. However, existing methods are still challenged by the training inefficiency and labor cost in pruning designs, due to missing theoretical guidance of non-salient network components. In this paper, we propose a novel filter pruning method by exploring the High Rank of feature maps (HRank). Our HRank is inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches CNNs receive. Based on HRank, we develop a method that is mathematically formulated to prune filters with low-rank feature maps. The principle behind our pruning is that low-rank feature maps contain less information, and thus pruned results can be easily reproduced. Besides, we experimentally show that weights with high-rank feature maps contain more important information, such that even when a portion is not updated, very little damage would be done to the model performance. Without introducing any additional constraints, HRank leads to significant improvements over the state-of-the-arts in terms of FLOPs and parameters reduction, with similar accuracies. For example, with ResNet-110, we achieve a 58.2%-FLOPs reduction by removing 59.2% of the parameters, with only a small loss of 0.14% in top-1 accuracy on CIFAR-10. With ResNet-50, we achieve a 43.8%-FLOPs reduction by removing 36.7% of the parameters, with only a loss of 1.17% in the top-1 accuracy on ImageNet. The code is available at https://github.com/lmbxmu/HRank.
Tasks Network Pruning
Published 2020-02-24
URL https://arxiv.org/abs/2002.10179v2
PDF https://arxiv.org/pdf/2002.10179v2.pdf
PWC https://paperswithcode.com/paper/hrank-filter-pruning-using-high-rank-feature
Repo https://github.com/lmbxmu/HRank
Framework pytorch
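
The core measurement is the average rank of each filter's feature maps across input batches; filters whose maps have consistently low rank are pruning candidates. A simplified single-layer version of that measurement:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, 3, padding=1)
rank_sum = torch.zeros(16)
batches = 0

def hook(_, __, out):
    global batches
    # out: (N, C, H, W); rank of each C x (H*W) feature map, averaged over N.
    rank_sum.add_(torch.linalg.matrix_rank(out.detach()).float().mean(dim=0))
    batches += 1

conv.register_forward_hook(hook)
for _ in range(5):
    conv(torch.randn(8, 3, 32, 32))

avg_rank = rank_sum / batches
prune_idx = avg_rank.argsort()[:4]           # the 4 lowest-rank filters
print(avg_rank, prune_idx)
```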

Filter Sketch for Network Pruning

Title Filter Sketch for Network Pruning
Authors Mingbao Lin, Rongrong Ji, Shaojie Li, Qixiang Ye, Yonghong Tian, Jianzhuang Liu, Qi Tian
Abstract In this paper, we propose a novel network pruning approach by information preserving of pre-trained network weights (filters). Our approach, referred to as FilterSketch, encodes the second-order information of pre-trained weights, through which the model performance is recovered by fine-tuning the pruned network in an end-to-end manner. Network pruning with information preserving can be approximated as a matrix sketch problem, which is efficiently solved by the off-the-shelf Frequent Direction method. FilterSketch thereby requires neither training from scratch nor data-driven iterative optimization, leading to a magnitude-order reduction of time consumption in the optimization of pruning. Experiments on CIFAR-10 show that FilterSketch reduces 63.3% of FLOPs and prunes 59.9% of network parameters with negligible accuracy cost for ResNet-110. On ILSVRC-2012, it achieves a reduction of 45.5% FLOPs and removes 43.0% of parameters with only a small top-5 accuracy drop of 0.69% for ResNet-50. Source code for the proposed FilterSketch is available at https://github.com/lmbxmu/FilterSketch.
Tasks Network Pruning
Published 2020-01-23
URL https://arxiv.org/abs/2001.08514v2
PDF https://arxiv.org/pdf/2001.08514v2.pdf
PWC https://paperswithcode.com/paper/filter-sketch-for-network-pruning
Repo https://github.com/lmbxmu/FilterSketch
Framework pytorch
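
FilterSketch leans on Frequent Directions, a standard streaming matrix-sketching routine. For reference, a compact implementation: it compresses the rows of a matrix A (for example, a layer's flattened filters) into a much smaller sketch B while approximately preserving AᵀA. Shapes below are illustrative.

```python
import numpy as np

def frequent_directions(A, sketch_rows):
    n, d = A.shape
    B = np.zeros((sketch_rows, d))
    for row in A:
        zero = np.where(~B.any(axis=1))[0]          # empty sketch rows
        if len(zero) == 0:
            # Sketch full: shrink all directions by the median singular value.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[sketch_rows // 2] ** 2
            s = np.sqrt(np.maximum(s ** 2 - delta, 0))
            B = np.diag(s) @ Vt
            zero = np.where(~B.any(axis=1))[0]
        B[zero[0]] = row                            # insert the next row
    return B

A = np.random.randn(256, 576)          # e.g. 256 filters of shape 64x3x3, flattened
B = frequent_directions(A, sketch_rows=32)
rel_err = np.linalg.norm(A.T @ A - B.T @ B, 2) / np.linalg.norm(A.T @ A, 2)
print(B.shape, rel_err)                # small covariance error at 8x compression
```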

Domain-independent Extraction of Scientific Concepts from Research Articles

Title Domain-independent Extraction of Scientific Concepts from Research Articles
Authors Arthur Brack, Jennifer D’Souza, Anett Hoppe, Sören Auer, Ralph Ewerth
Abstract We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task and (b) examine the transferability of concepts between domains. Second, we present two deep learning systems as baselines. In particular, we propose active learning to deal with different domains in our task. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.
Tasks Active Learning, Named Entity Recognition, Scientific Concept Extraction
Published 2020-01-09
URL https://arxiv.org/abs/2001.03067v1
PDF https://arxiv.org/pdf/2001.03067v1.pdf
PWC https://paperswithcode.com/paper/domain-independent-extraction-of-scientific
Repo https://github.com/arthurbra/stm-corpus/blob/master/README.md
Framework none
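
The active-learning component boils down to repeatedly retraining on the labelled pool and querying annotators for the items the model is least certain about. A generic uncertainty-sampling loop with stand-in features and a stock classifier (not the paper's sequence model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
y = (X[:, 0] > 0).astype(int)                    # synthetic labels
labelled = list(range(20))                       # small seed set
unlabelled = list(range(20, 500))

for round_ in range(5):
    clf = LogisticRegression().fit(X[labelled], y[labelled])
    probs = clf.predict_proba(X[unlabelled])
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    query = [unlabelled[i] for i in np.argsort(-entropy)[:10]]   # most uncertain
    labelled += query                            # "annotate" the queried items
    unlabelled = [i for i in unlabelled if i not in query]
    print(f"round {round_}: {len(labelled)} labelled, "
          f"accuracy on the rest {clf.score(X[unlabelled], y[unlabelled]):.3f}")
```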

Deep Reinforcement Learning for Active Human Pose Estimation

Title Deep Reinforcement Learning for Active Human Pose Estimation
Authors Erik Gärtner, Aleksis Pirinen, Cristian Sminchisescu
Abstract Most 3D human pose estimation methods assume that input – be it images of a scene collected from one or several viewpoints, or from a video – is given. Consequently, they focus on estimates leveraging prior knowledge and measurement by fusing information spatially and/or temporally, whenever available. In this paper we address the problem of an active observer with freedom to move and explore the scene spatially – in ‘time-freeze’ mode – and/or temporally, by selecting informative viewpoints that improve its estimation accuracy. Towards this end, we introduce Pose-DRL, a fully trainable deep reinforcement learning-based active pose estimation architecture which learns to select appropriate views, in space and time, to feed an underlying monocular pose estimator. We evaluate our model using single- and multi-target estimators with strong results in both settings. Our system further learns automatic stopping conditions in time and transition functions to the next temporal processing step in videos. In extensive experiments with the Panoptic multi-view setup, and for complex scenes containing multiple people, we show that our model learns to select viewpoints that yield significantly more accurate pose estimates compared to strong multi-view baselines.
Tasks 3D Human Pose Estimation, Pose Estimation
Published 2020-01-07
URL https://arxiv.org/abs/2001.02024v1
PDF https://arxiv.org/pdf/2001.02024v1.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-for-active-human
Repo https://github.com/aleksispi/pose-drl
Framework none
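
At its core the agent repeatedly chooses which viewpoint to consume next and is rewarded when that choice improves the pose estimate. A toy REINFORCE loop with a simulated estimator, shown purely to illustrate the control structure rather than Pose-DRL itself:

```python
import torch
import torch.nn as nn

num_views, state_dim = 8, 16
policy = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, num_views))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for episode in range(3):
    state = torch.randn(state_dim)               # stand-in for the agent's state
    # Simulated per-view error reductions (stand-in for the pose estimator).
    error_drop = torch.rand(num_views)
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()                       # which camera to look through
    reward = error_drop[action]
    loss = -dist.log_prob(action) * reward       # REINFORCE gradient estimator
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"episode {episode}: picked view {action.item()}, reward {reward.item():.3f}")
```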

Complementary Network with Adaptive Receptive Fields for Melanoma Segmentation

Title Complementary Network with Adaptive Receptive Fields for Melanoma Segmentation
Authors Xiaoqing Guo, Zhen Chen, Yixuan Yuan
Abstract Automatic melanoma segmentation in dermoscopic images is essential in computer-aided diagnosis of skin cancer. Existing methods may suffer from the hole and shrink problems with limited segmentation performance. To tackle these issues, we propose a novel complementary network with adaptive receptive field learning. Instead of regarding the segmentation task independently, we introduce a foreground network to detect melanoma lesions and a background network to mask non-melanoma regions. Moreover, we propose adaptive atrous convolution (AAC) and knowledge aggregation module (KAM) to fill holes and alleviate the shrink problems. AAC explicitly controls the receptive field at multiple scales and KAM convolves shallow feature maps by dilated convolutions with adaptive receptive fields, which are adjusted according to deep feature maps. In addition, a novel mutual loss is proposed to utilize the dependency between the foreground and background networks, thereby enabling the reciprocal influence between these two networks. Consequently, this mutual training strategy enables semi-supervised learning and improves boundary sensitivity. Trained on the International Skin Imaging Collaboration (ISIC) 2018 skin lesion segmentation dataset, our method achieves a Dice coefficient of 86.4% and shows better performance compared with state-of-the-art melanoma segmentation methods.
Tasks Lesion Segmentation
Published 2020-01-12
URL https://arxiv.org/abs/2001.03893v1
PDF https://arxiv.org/pdf/2001.03893v1.pdf
PWC https://paperswithcode.com/paper/complementary-network-with-adaptive-receptive
Repo https://github.com/Guo-Xiaoqing/Skin-Seg
Framework tf
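
One plausible reading of the mutual loss is a consistency term forcing the foreground network's lesion probability to match the complement of the background network's prediction; the paper's exact formulation may differ. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def mutual_loss(fg_logits, bg_logits):
    fg_prob = torch.sigmoid(fg_logits)            # P(pixel is lesion)
    bg_prob = torch.sigmoid(bg_logits)            # P(pixel is background)
    return F.mse_loss(fg_prob, 1.0 - bg_prob)     # the two maps should be complements

fg_logits = torch.randn(2, 1, 64, 64)
bg_logits = torch.randn(2, 1, 64, 64)
print(mutual_loss(fg_logits, bg_logits))
```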