Paper Group ANR 170
RoutedFusion: Learning Real-time Depth Map Fusion. Automated Anonymisation of Visual and Audio Data in Classroom Studies. VideoSSL: Semi-Supervised Learning for Video Classification. Peeking into occluded joints: A novel framework for crowd pose estimation. Deformation-aware Unpaired Image Translation for Pose Estimation on Laboratory Animals. Towa …
RoutedFusion: Learning Real-time Depth Map Fusion
Title | RoutedFusion: Learning Real-time Depth Map Fusion |
Authors | Silvan Weder, Johannes Schönberger, Marc Pollefeys, Martin R. Oswald |
Abstract | The efficient fusion of depth maps is a key part of most state-of-the-art 3D reconstruction methods. Besides requiring high accuracy, these depth fusion methods need to be scalable and real-time capable. To this end, we present a novel real-time capable machine learning-based method for depth map fusion. Similar to the seminal depth map fusion approach by Curless and Levoy, we only update a local group of voxels to ensure real-time capability. Instead of a simple linear fusion of depth information, we propose a neural network that predicts non-linear updates to better account for typical fusion errors. Our network is composed of a 2D depth routing network and a 3D depth fusion network which efficiently handle sensor-specific noise and outliers. This is especially useful for surface edges and thin objects for which the original approach suffers from thickening artifacts. Our method outperforms the traditional fusion approach and related learned approaches on both synthetic and real data. We demonstrate the performance of our method in reconstructing fine geometric details from noise- and outlier-contaminated data on various scenes. |
Tasks | 3D Reconstruction |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04388v1 |
https://arxiv.org/pdf/2001.04388v1.pdf | |
PWC | https://paperswithcode.com/paper/routedfusion-learning-real-time-depth-map |
Repo | |
Framework | |
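As context for the learned update above, here is a minimal sketch of the linear fusion baseline by Curless and Levoy that RoutedFusion replaces with network-predicted updates; the function and parameter names are illustrative, not from the paper's code.

```python
import numpy as np

def fuse_depth_linear(tsdf, weights, new_sdf, new_weight=1.0, trunc=0.05):
    """Classic running-average TSDF update (the linear baseline).

    tsdf, weights: per-voxel grids maintained across frames.
    new_sdf: signed distances derived from the current depth map,
             truncated to [-trunc, trunc]; NaN marks unobserved voxels.
    """
    sdf = np.clip(new_sdf, -trunc, trunc)
    seen = ~np.isnan(sdf)
    w_old = weights[seen]
    # weighted running average over only the locally observed voxels
    tsdf[seen] = (w_old * tsdf[seen] + new_weight * sdf[seen]) / (w_old + new_weight)
    weights[seen] = w_old + new_weight
    return tsdf, weights
```

RoutedFusion's contribution is to replace this fixed linear rule with a 2D routing network plus a 3D fusion network that predict non-linear, noise-aware updates for the same local group of voxels.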
Automated Anonymisation of Visual and Audio Data in Classroom Studies
Title | Automated Anonymisation of Visual and Audio Data in Classroom Studies |
Authors | Ömer Sümer, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci |
Abstract | Understanding students’ and teachers’ verbal and non-verbal behaviours during instruction may help infer valuable information regarding the quality of teaching. In education research, there have been many studies that aim to measure students’ attentional focus on learning-related tasks, based on audio-visual recordings and manual or automated ratings of the behaviours of teachers and students. Student data is, however, highly sensitive. Therefore, ensuring high standards of data protection and privacy is of the utmost importance in current practice. For example, in the context of teaching management studies, data collection is carried out with the consent of pupils, parents, teachers and school administrations. Nevertheless, there may often be students whose data cannot be used for research purposes. Excluding these students from the classroom is an unnatural intrusion into the organisation of the classroom. A possible solution would be to request permission to make audio-visual recordings of all students (including those who do not voluntarily participate in the study) and to anonymise their data. Yet, the manual anonymisation of audio-visual data is very demanding. In this study, we examine the use of artificial intelligence methods to automatically anonymise the visual and audio data of a particular person. |
Tasks | |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.05080v1 |
https://arxiv.org/pdf/2001.05080v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-anonymisation-of-visual-and-audio |
Repo | |
Framework | |
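The paper evaluates AI methods for anonymising a particular person; as a rough illustration of the visual side of such a pipeline, the sketch below blurs all detected faces with OpenCV (a generic baseline, not the authors' system; selecting one specific person would additionally require face recognition or re-identification).

```python
import cv2

def blur_faces(frame):
    """Blur every detected face in a BGR video frame (illustrative baseline)."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        roi = frame[y:y + h, x:x + w]
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return frame
```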
VideoSSL: Semi-Supervised Learning for Video Classification
Title | VideoSSL: Semi-Supervised Learning for Video Classification |
Authors | Longlong Jing, Toufiq Parag, Zhe Wu, Yingli Tian, Hongcheng Wang |
Abstract | We propose a semi-supervised learning approach for video classification, VideoSSL, using convolutional neural networks (CNN). As in other computer vision tasks, existing supervised video classification methods demand a large amount of labeled data to attain good performance. However, annotating a large dataset is expensive and time consuming. To minimize the dependence on a large annotated dataset, our proposed semi-supervised method trains from a small number of labeled examples and exploits two regulatory signals from unlabeled data. The first signal is the pseudo-labels of unlabeled examples, computed from the confidences of the CNN being trained. The other is the normalized probabilities, as predicted by an image classifier CNN, which capture information about the appearance of the objects of interest in the video. We show that, under the supervision of these guiding signals from unlabeled examples, a video classification CNN can achieve impressive performance using only a small fraction of annotated examples on three publicly available datasets: UCF101, HMDB51 and Kinetics. |
Tasks | Video Classification |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00197v1 |
https://arxiv.org/pdf/2003.00197v1.pdf | |
PWC | https://paperswithcode.com/paper/videossl-semi-supervised-learning-for-video |
Repo | |
Framework | |
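A hedged sketch of how the two regulatory signals described above could be combined with the supervised loss; thresholds, weights, and function names are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def videossl_loss(logits_l, y_l, logits_u, img_probs,
                  conf_thresh=0.8, alpha=1.0, beta=1.0):
    # supervised cross-entropy on the few labeled clips
    loss_sup = F.cross_entropy(logits_l, y_l)

    # signal 1: pseudo-labels from the video CNN's own confident predictions
    probs_u = logits_u.softmax(dim=1)
    conf, pseudo = probs_u.max(dim=1)
    mask = conf > conf_thresh
    loss_pseudo = (F.cross_entropy(logits_u[mask], pseudo[mask])
                   if mask.any() else logits_u.new_zeros(()))

    # signal 2: match an image classifier's appearance-based probabilities
    loss_appear = F.kl_div(F.log_softmax(logits_u, dim=1), img_probs,
                           reduction="batchmean")
    return loss_sup + alpha * loss_pseudo + beta * loss_appear
```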
Peeking into occluded joints: A novel framework for crowd pose estimation
Title | Peeking into occluded joints: A novel framework for crowd pose estimation |
Authors | Lingteng Qiu, Xuanye Zhang, Yanran Li, Guanbin Li, Xiaojun Wu, Zixiang Xiong, Xiaoguang Han, Shuguang Cui |
Abstract | Although occlusion widely exists in nature and remains a fundamental challenge for pose estimation, existing heatmap-based approaches suffer serious degradation under occlusion. Their intrinsic problem is that they localize joints directly from visual information; invisible joints, however, lack exactly that. In contrast to localization, our framework estimates invisible joints from an inference perspective through an Image-Guided Progressive GCN module that provides a comprehensive understanding of both image context and pose structure. Moreover, existing benchmarks contain only limited occlusion for evaluation. We therefore pursue this problem thoroughly and propose a novel OPEC-Net framework together with a new Occluded Pose (OCPose) dataset with 9k annotated images. Extensive quantitative and qualitative evaluations on benchmarks demonstrate that OPEC-Net achieves significant improvements over recent leading works. Notably, our OCPose is the most complex occlusion dataset with respect to average IoU between adjacent instances. Source code and OCPose will be publicly available. |
Tasks | Pose Estimation |
Published | 2020-03-23 |
URL | https://arxiv.org/abs/2003.10506v3 |
https://arxiv.org/pdf/2003.10506v3.pdf | |
PWC | https://paperswithcode.com/paper/peeking-into-occluded-joints-a-novel |
Repo | |
Framework | |
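For intuition about the GCN component, here is a minimal graph-convolution step over the skeleton graph; the real Image-Guided Progressive GCN additionally injects image features at each stage and refines estimates progressively, so treat this as a simplified sketch.

```python
import torch
import torch.nn as nn

class PoseGCNLayer(nn.Module):
    """One message-passing step over the skeleton adjacency (sketch)."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        # adj: (joints, joints) adjacency with self-loops, row-normalised here
        self.register_buffer("adj_norm", adj / adj.sum(dim=1, keepdim=True))
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):                 # x: (batch, joints, in_dim)
        # each joint aggregates its neighbours, e.g. an occluded wrist
        # borrows evidence from the visible elbow
        return torch.relu(self.lin(self.adj_norm @ x))
```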
Deformation-aware Unpaired Image Translation for Pose Estimation on Laboratory Animals
Title | Deformation-aware Unpaired Image Translation for Pose Estimation on Laboratory Animals |
Authors | Siyuan Li, Semih Günel, Mirela Ostrek, Pavan Ramdya, Pascal Fua, Helge Rhodin |
Abstract | Our goal is to capture the pose of neuroscience model organisms, without using any manual supervision, to be able to study how neural circuits orchestrate behaviour. Human pose estimation attains remarkable accuracy when trained on real or simulated datasets consisting of millions of frames. However, for many applications simulated models are unrealistic and real training datasets with comprehensive annotations do not exist. We address this problem with a new sim2real domain transfer method. Our key contribution is the explicit and independent modeling of appearance, shape and poses in an unpaired image translation framework. Our model lets us train a pose estimator on the target domain by transferring readily available body keypoint locations from the source domain to generated target images. We compare our approach with existing domain transfer methods and demonstrate improved pose estimation accuracy on Drosophila melanogaster (fruit fly), Caenorhabditis elegans (worm) and Danio rerio (zebrafish), without requiring any manual annotation on the target domain and despite using simplistic off-the-shelf animal characters for simulation, or simple geometric shapes as models. Our new datasets, code, and trained models will be published to support future neuroscientific studies. |
Tasks | Pose Estimation |
Published | 2020-01-23 |
URL | https://arxiv.org/abs/2001.08601v1 |
https://arxiv.org/pdf/2001.08601v1.pdf | |
PWC | https://paperswithcode.com/paper/deformation-aware-unpaired-image-translation |
Repo | |
Framework | |
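The core trick, transferring known simulator keypoints to translated images, can be sketched as a single training step; `G` stands for the pretrained unpaired sim-to-real translator and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def keypoint_transfer_step(G, pose_net, optimizer, sim_img, sim_heatmaps):
    """Train a target-domain pose estimator from simulator labels (sketch)."""
    with torch.no_grad():
        fake_real = G(sim_img)        # simulated image rendered in target style
    pred = pose_net(fake_real)        # keypoint heatmaps on the translated image
    # supervise with the simulator's free, exact keypoint annotations
    loss = F.mse_loss(pred, sim_heatmaps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```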
Towards Cognitive Routing based on Deep Reinforcement Learning
Title | Towards Cognitive Routing based on Deep Reinforcement Learning |
Authors | Jiawei Wu, Jianxue Li, Yang Xiao, Jun Liu |
Abstract | Routing is one of the key functions for the stable operation of network infrastructure. Nowadays, the rapid growth of network traffic volume and changing service requirements call for more intelligent routing methods than before. Towards this end, we propose a definition of cognitive routing and an implementation approach based on Deep Reinforcement Learning (DRL). To facilitate research on DRL-based cognitive routing, we introduce a simulator named RL4Net for DRL-based routing algorithm development and simulation. Then, we design and implement a DDPG-based routing algorithm. The simulation results on an example network topology show that the DDPG-based routing algorithm achieves better performance than OSPF and random weight algorithms. This demonstrates the preliminary feasibility and potential advantages of cognitive routing for future networks. |
Tasks | |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.12439v1 |
https://arxiv.org/pdf/2003.12439v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-cognitive-routing-based-on-deep |
Repo | |
Framework | |
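As a sketch of how a DDPG actor's continuous action could be turned into routing decisions (the RL4Net specifics are not given here, so edge-weight actions and Dijkstra routing are assumptions):

```python
import networkx as nx
import numpy as np

def routes_from_action(graph, action):
    """Map an actor's per-link action vector to weights, then route (sketch)."""
    for w, (u, v) in zip(action, graph.edges()):
        graph[u][v]["weight"] = float(np.clip(w, 0.01, None))  # keep weights positive
    # shortest paths under the learned weights define the routing tables;
    # the environment would then report delay/loss as the reward signal
    return dict(nx.all_pairs_dijkstra_path(graph, weight="weight"))
```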
Black Box Explanation by Learning Image Exemplars in the Latent Feature Space
Title | Black Box Explanation by Learning Image Exemplars in the Latent Feature Space |
Authors | Riccardo Guidotti, Anna Monreale, Stan Matwin, Dino Pedreschi |
Abstract | We present an approach to explain the decisions of black box models for image classification. While using the black box to label images, our explanation method exploits the latent feature space learned through an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars respecting local decision rules. Finally, it visualizes them in a manner that shows the user how the exemplars can be modified to either stay within their class, or to become counter-factuals by “morphing” into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting the areas of the image that contribute to its classification, and areas of the image that push it into another class. We present the results of an experimental evaluation on three datasets and two black box models. Besides providing the most useful and interpretable explanations, we show that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability. |
Tasks | Image Classification |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2002.03746v1 |
https://arxiv.org/pdf/2002.03746v1.pdf | |
PWC | https://paperswithcode.com/paper/black-box-explanation-by-learning-image |
Repo | |
Framework | |
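A compact sketch of the exemplar-generation step described above: perturb the image's latent code, label the decoded neighbours with the black box, and fit a local decision tree; `encode`/`decode` stand in for a pretrained adversarial autoencoder and all names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def latent_exemplars(encode, decode, black_box, image, n=500, sigma=0.5):
    z0 = encode(image)                                   # latent code of the image
    Z = z0 + sigma * np.random.randn(n, z0.shape[-1])    # latent neighbourhood
    y = np.array([black_box(decode(z)) for z in Z])      # black-box labels
    tree = DecisionTreeClassifier(max_depth=5).fit(Z, y) # local decision rules
    # exemplars/counter-exemplars are decoded points of Z that satisfy or
    # contradict the rule covering z0
    return Z, y, tree
```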
VQ-DRAW: A Sequential Discrete VAE
Title | VQ-DRAW: A Sequential Discrete VAE |
Authors | Alex Nichol |
Abstract | In this paper, I present VQ-DRAW, an algorithm for learning compact discrete representations of data. VQ-DRAW leverages a vector quantization effect to adapt the sequential generation scheme of DRAW to discrete latent variables. I show that VQ-DRAW can effectively learn to compress images from a variety of common datasets, as well as generate realistic samples from these datasets with no help from an autoregressive prior. |
Tasks | Quantization |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01599v1 |
https://arxiv.org/pdf/2003.01599v1.pdf | |
PWC | https://paperswithcode.com/paper/vq-draw-a-sequential-discrete-vae |
Repo | |
Framework | |
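A sketch of one sequential stage: a refinement network proposes K candidate canvas updates, and the discrete latent for the stage is the index of the proposal that best reconstructs the target (the vector-quantization effect); the refiner's interface is assumed.

```python
import torch

def vq_draw_stage(refiner, canvas, target):
    """One VQ-DRAW stage (sketch): pick the best of K proposed refinements."""
    # refiner(canvas) is assumed to return deltas of shape (B, K, C, H, W)
    proposals = canvas.unsqueeze(1) + refiner(canvas)
    err = ((proposals - target.unsqueeze(1)) ** 2).flatten(2).sum(-1)  # (B, K)
    idx = err.argmin(dim=1)                     # the stage's discrete latent
    canvas = proposals[torch.arange(canvas.size(0)), idx]
    return canvas, idx
```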
Gradient-Based Deep Quantization of Neural Networks through Sinusoidal Adaptive Regularization
Title | Gradient-Based Deep Quantization of Neural Networks through Sinusoidal Adaptive Regularization |
Authors | Ahmed T. Elthakeb, Prannoy Pilligundla, Fatemehsadat Mireshghallah, Tarek Elgindi, Charles-Alban Deledalle, Hadi Esmaeilzadeh |
Abstract | As deep neural networks make their way into different domains, their compute efficiency is becoming a first-order constraint. Deep quantization, which reduces the bitwidth of the operations (below 8 bits), offers a unique opportunity as it can reduce both the storage and compute requirements of the network super-linearly. However, if not employed with diligence, it can lead to significant accuracy loss. Because layers are strongly inter-dependent and exhibit different characteristics across the same network, choosing an optimal bitwidth at per-layer granularity is not straightforward. As such, deep quantization opens a large hyper-parameter space, the exploration of which is a major challenge. We propose a novel sinusoidal regularization, called SINAREQ, for deep quantized training. Leveraging sinusoidal properties, we seek to learn multiple quantization parameterizations jointly during the gradient-based training process. Specifically, we learn (i) a per-layer quantization bitwidth along with (ii) a scale factor, by learning the period of the sinusoidal function. At the same time, we exploit the periodicity, differentiability, and local convexity profile of sinusoidal functions to automatically propel (iii) network weights towards values quantized at levels that are jointly determined. We show how SINAREQ balances compute efficiency and accuracy, and provides a heterogeneous bitwidth assignment for quantization of a large variety of deep networks (AlexNet, CIFAR-10, MobileNet, ResNet-18, ResNet-20, SVHN, and VGG-11) that virtually preserves accuracy. Furthermore, we carry out experiments using fixed homogeneous bitwidths with 3- to 5-bit assignments and show the versatility of SINAREQ in enhancing quantized training algorithms (DoReFa and WRPN), yielding about 4.8% accuracy improvement on average and outperforming multiple state-of-the-art techniques. |
Tasks | Quantization |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00146v1 |
https://arxiv.org/pdf/2003.00146v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-based-deep-quantization-of-neural |
Repo | |
Framework | |
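The heart of the method is a periodic penalty that vanishes exactly on the quantization grid; a minimal sketch (per-layer learnable period and bitwidth handling omitted):

```python
import torch

def sinusoidal_quant_reg(weights, step):
    """The sin^2 penalty is zero iff a weight is an integer multiple of `step`,
    so adding it to the task loss pulls weights toward quantized levels;
    SINAREQ additionally learns `step` (the period) per layer."""
    return torch.sin(torch.pi * weights / step).pow(2).sum()

# usage sketch: total = task_loss + lam * sum(sinusoidal_quant_reg(p, step)
#                                             for p in model.parameters())
```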
On Parameter Tuning in Meta-learning for Computer Vision
Title | On Parameter Tuning in Meta-learning for Computer Vision |
Authors | Farid Ghareh Mohammadi, M. Hadi Amini, Hamid R. Arabnia |
Abstract | Learning to learn plays a pivotal role in meta-learning (MTL) to obtain an optimal learning model. In this paper, we investigate image recognition for unseen categories of a given dataset with limited training information. We deploy a zero-shot learning (ZSL) algorithm to achieve this goal. We also explore the effect of parameter tuning on the performance of the semantic auto-encoder (SAE). We further address the parameter tuning problem for meta-learning, focusing especially on zero-shot learning. By combining different embedded parameters, we improve the accuracy of the tuned SAE. Advantages and disadvantages of parameter tuning and its application in image classification are also explored. |
Tasks | Image Classification, Meta-Learning, Zero-Shot Learning |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2003.00837v1 |
https://arxiv.org/pdf/2003.00837v1.pdf | |
PWC | https://paperswithcode.com/paper/on-parameter-tuning-in-meta-learning-for |
Repo | |
Framework | |
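For reference, the semantic auto-encoder the paper tunes has a closed-form solution via a Sylvester equation (Kodirov et al., 2017); lam below is exactly the kind of parameter whose tuning is studied here.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def sae_fit(X, S, lam):
    """Solve min_W ||X - W^T S||^2 + lam * ||W X - S||^2 in closed form.

    X: (d, N) visual features; S: (k, N) semantic class embeddings.
    Stationarity gives the Sylvester equation A W + W B = C.
    """
    A = S @ S.T                 # (k, k)
    B = lam * (X @ X.T)         # (d, d)
    C = (1 + lam) * (S @ X.T)   # (k, d)
    return solve_sylvester(A, B, C)   # W: (k, d) projection
```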
Adversarial Attack on Deep Product Quantization Network for Image Retrieval
Title | Adversarial Attack on Deep Product Quantization Network for Image Retrieval |
Authors | Yan Feng, Bin Chen, Tao Dai, Shutao Xia |
Abstract | The deep product quantization network (DPQN) has recently received much attention in fast image retrieval tasks due to its efficiency in encoding high-dimensional visual features, especially when dealing with large-scale datasets. Recent studies show that deep neural networks (DNNs) are vulnerable to input with small and maliciously designed perturbations (a.k.a., adversarial examples). This phenomenon raises security concerns for DPQN in the testing/deployment stage as well. However, little effort has been devoted to investigating how adversarial examples affect DPQN. To this end, we propose product quantization adversarial generation (PQ-AG), a simple yet effective method to generate adversarial examples for product quantization based retrieval systems. PQ-AG aims to generate imperceptible adversarial perturbations for query images to form adversarial queries, whose nearest neighbors from a targeted product quantization model are not semantically related to those from the original queries. Extensive experiments show that our PQ-AG successfully creates adversarial examples that mislead targeted product quantization retrieval models. Moreover, we find that PQ-AG significantly degrades retrieval performance in both white-box and black-box settings. |
Tasks | Adversarial Attack, Image Retrieval, Quantization |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11374v1 |
https://arxiv.org/pdf/2002.11374v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-attack-on-deep-product |
Repo | |
Framework | |
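To ground the attack surface, here is the product-quantization encoding that such retrieval systems rely on; PQ-AG crafts query perturbations so that these codes (and hence nearest neighbours) change semantically. Shapes and names are illustrative.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Product quantization: split a feature vector into M sub-vectors and
    snap each to its nearest centroid. codebooks: (M, K, d // M)."""
    M = codebooks.shape[0]
    subs = x.reshape(M, -1)                                  # (M, d // M)
    dists = ((codebooks - subs[:, None, :]) ** 2).sum(-1)    # (M, K)
    return dists.argmin(axis=1)                              # M centroid indices
```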
A Simulation Model Demonstrating the Impact of Social Aspects on Social Internet of Things
Title | A Simulation Model Demonstrating the Impact of Social Aspects on Social Internet of Things |
Authors | Kashif Zia |
Abstract | In addition to seamless connectivity and smartness, the objects in the Internet of Things (IoT) are expected to have social capabilities; such objects are termed “social objects”. In this paper, an intuitive paradigm of social interactions between these objects is argued and modeled. The impact of social behavior on the interaction pattern of social objects is studied, taking Peer-to-Peer (P2P) resource sharing as an example application. The model proposed in this paper studies the implications of a competitive vs. a cooperative social paradigm while peers attempt to attain shared resources/services. The simulation results reveal that the social capabilities of the peers impart a significant increase in the quality of interactions between social objects. Through an agent-based simulation study, it is shown that the cooperative strategy is more efficient than the competitive strategy. Moreover, cooperation underpinned by a real-life networking structure and mobility does not negatively impact the efficiency of the system at all; rather, it helps. |
Tasks | |
Published | 2020-02-23 |
URL | https://arxiv.org/abs/2002.11507v1 |
https://arxiv.org/pdf/2002.11507v1.pdf | |
PWC | https://paperswithcode.com/paper/a-simulation-model-demonstrating-the-impact |
Repo | |
Framework | |
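A toy rendering of the competitive-vs-cooperative comparison (the paper's actual agent-based model is far richer, with networking structure and mobility; the agent fields here are invented for illustration):

```python
import random

def p2p_round(agents, cooperative):
    """One round of P2P resource seeking; returns how many peers were served."""
    served = 0
    for a in agents:
        if cooperative:
            for nb in a["neighbors"]:
                a["known"] |= nb["known"]          # peers pool provider knowledge
        elif a["neighbors"]:
            probe = random.choice(a["neighbors"])  # solitary, competitive probe
            a["known"] |= probe["known"]
        served += a["need"] in a["known"]
    return served
```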
Task Augmentation by Rotating for Meta-Learning
Title | Task Augmentation by Rotating for Meta-Learning |
Authors | Jialin Liu, Fei Chao, Chih-Min Lin |
Abstract | Data augmentation is one of the most effective approaches for improving the accuracy of modern machine learning models, and it is also indispensable for training a deep model for meta-learning. In this paper, we introduce a task augmentation method by rotating, which increases the number of classes by rotating the original images 90, 180 and 270 degrees, unlike traditional augmentation methods which increase the number of images. With a larger number of classes, we can sample more diverse task instances during training. Therefore, task augmentation by rotating allows us to train a deep network by meta-learning methods with little over-fitting. Experimental results show that our approach is better than using rotation to increase the number of images, and achieves state-of-the-art performance on the miniImageNet, CIFAR-FS, and FC100 few-shot learning benchmarks. The code is available at \url{www.github.com/AceChuse/TaskLevelAug}. |
Tasks | Data Augmentation, Few-Shot Learning, Meta-Learning |
Published | 2020-02-08 |
URL | https://arxiv.org/abs/2003.00804v1 |
https://arxiv.org/pdf/2003.00804v1.pdf | |
PWC | https://paperswithcode.com/paper/task-augmentation-by-rotating-for-meta |
Repo | |
Framework | |
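The augmentation itself is a one-liner per rotation: every 90-degree rotation of a class becomes a new class, quadrupling the label space from which few-shot tasks are sampled. A sketch with NumPy (array layout assumed to be (N, H, W, C)):

```python
import numpy as np

def rotate_task_augment(images, labels, n_classes):
    """Return images/labels where each rotation defines a brand-new class."""
    aug_x, aug_y = [], []
    for k in range(4):                            # 0, 90, 180, 270 degrees
        aug_x.append(np.rot90(images, k=k, axes=(1, 2)))
        aug_y.append(labels + k * n_classes)      # rotation offsets the class id
    return np.concatenate(aug_x), np.concatenate(aug_y)
```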
Neural Sign Language Translation by Learning Tokenization
Title | Neural Sign Language Translation by Learning Tokenization |
Authors | Alptekin Orbay, Lale Akarun |
Abstract | Sign Language Translation has attained considerable success recently, raising hopes for improved communication with the Deaf. A pre-processing step called tokenization improves the success of translations. Tokens can be learned from sign videos if supervised data is available. However, data annotation at the gloss level is costly, and annotated data is scarce. This paper utilizes Adversarial, Multitask, and Transfer Learning to search for semi-supervised tokenization approaches without the burden of additional labeling. It provides extensive experiments comparing all the methods in different settings to conduct a deeper analysis. In the case of no additional target annotation besides sentences, the proposed methodology attains 13.25 BLEU-4 and 36.28 ROUGE scores, improving the current state of the art by 4 points in BLEU-4 and 5 points in ROUGE. |
Tasks | Sign Language Translation, Tokenization, Transfer Learning |
Published | 2020-02-02 |
URL | https://arxiv.org/abs/2002.00479v2 |
https://arxiv.org/pdf/2002.00479v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-sign-language-translation-by-learning |
Repo | |
Framework | |
Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future
Title | Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future |
Authors | Grace W. Lindsay |
Abstract | Convolutional neural networks (CNNs) were inspired by early findings in the study of biological vision. They have since become successful tools in computer vision and state-of-the-art models of both neural activity and behavior on visual tasks. This review highlights what, in the context of CNNs, it means to be a good model in computational neuroscience and the various ways models can provide insight. Specifically, it covers the origins of CNNs and the methods by which we validate them as models of biological vision. It then goes on to elaborate on what we can learn about biological vision by understanding and experimenting on CNNs, and discusses emerging opportunities for the use of CNNs in vision research beyond basic object recognition. |
Tasks | Object Recognition |
Published | 2020-01-20 |
URL | https://arxiv.org/abs/2001.07092v2 |
https://arxiv.org/pdf/2001.07092v2.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-neural-networks-as-a-model-of |
Repo | |
Framework | |