Paper Group ANR 170
RoutedFusion: Learning Real-time Depth Map Fusion. Automated Anonymisation of Visual and Audio Data in Classroom Studies. VideoSSL: Semi-Supervised Learning for Video Classification. Peeking into occluded joints: A novel framework for crowd pose estimation. Deformation-aware Unpaired Image Translation for Pose Estimation on Laboratory Animals. Towa …
RoutedFusion: Learning Real-time Depth Map Fusion
Title | RoutedFusion: Learning Real-time Depth Map Fusion |
Authors | Silvan Weder, Johannes Schönberger, Marc Pollefeys, Martin R. Oswald |
Abstract | The efficient fusion of depth maps is a key part of most state-of-the-art 3D reconstruction methods. Besides requiring high accuracy, these depth fusion methods need to be scalable and real-time capable. To this end, we present a novel real-time capable machine learning-based method for depth map fusion. Similar to the seminal depth map fusion approach by Curless and Levoy, we only update a local group of voxels to ensure real-time capability. Instead of a simple linear fusion of depth information, we propose a neural network that predicts non-linear updates to better account for typical fusion errors. Our network is composed of a 2D depth routing network and a 3D depth fusion network which efficiently handle sensor-specific noise and outliers. This is especially useful for surface edges and thin objects for which the original approach suffers from thickening artifacts. Our method outperforms the traditional fusion approach and related learned approaches on both synthetic and real data. We demonstrate the performance of our method in reconstructing fine geometric details from noise- and outlier-contaminated data on various scenes. |
Tasks | 3D Reconstruction |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04388v1 |
https://arxiv.org/pdf/2001.04388v1.pdf | |
PWC | https://paperswithcode.com/paper/routedfusion-learning-real-time-depth-map |
Repo | |
Framework | |
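As context for the learned update above, here is a minimal sketch of the linear fusion baseline by Curless and Levoy that RoutedFusion replaces with network-predicted updates; the function and parameter names are illustrative, not from the paper's code.

```python
import numpy as np

def fuse_depth_linear(tsdf, weights, new_sdf, new_weight=1.0, trunc=0.05):
    """Classic running-average TSDF update (the linear baseline).

    tsdf, weights: per-voxel grids maintained across frames.
    new_sdf: signed distances derived from the current depth map,
             truncated to [-trunc, trunc]; NaN marks unobserved voxels.
    """
    sdf = np.clip(new_sdf, -trunc, trunc)
    seen = ~np.isnan(sdf)
    w_old = weights[seen]
    # weighted running average over only the locally observed voxels
    tsdf[seen] = (w_old * tsdf[seen] + new_weight * sdf[seen]) / (w_old + new_weight)
    weights[seen] = w_old + new_weight
    return tsdf, weights
```

RoutedFusion's contribution is to replace this fixed linear rule with a 2D routing network plus a 3D fusion network that predict non-linear, noise-aware updates for the same local group of voxels.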
Automated Anonymisation of Visual and Audio Data in Classroom Studies
Title | Automated Anonymisation of Visual and Audio Data in Classroom Studies |
Authors | Ömer Sümer, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci |
Abstract | Understanding students’ and teachers’ verbal and non-verbal behaviours during instruction may help infer valuable information regarding the quality of teaching. In education research, there have been many studies that aim to measure students’ attentional focus on learning-related tasks, based on audio-visual recordings and manual or automated ratings of the behaviours of teachers and students. Student data is, however, highly sensitive. Therefore, ensuring high standards of data protection and privacy is of the utmost importance in current practice. For example, in the context of teaching management studies, data collection is carried out with the consent of pupils, parents, teachers and school administrations. Nevertheless, there may often be students whose data cannot be used for research purposes. Excluding these students from the classroom is an unnatural intrusion into the organisation of the classroom. A possible solution would be to request permission to make audio-visual recordings of all students (including those who do not voluntarily participate in the study) and to anonymise their data. Yet, the manual anonymisation of audio-visual data is very demanding. In this study, we examine the use of artificial intelligence methods to automatically anonymise the visual and audio data of a particular person. |
Tasks | |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.05080v1 |
https://arxiv.org/pdf/2001.05080v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-anonymisation-of-visual-and-audio |
Repo | |
Framework | |
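The paper evaluates AI methods for anonymising a particular person; as a rough illustration of the visual side of such a pipeline, the sketch below blurs all detected faces with OpenCV (a generic baseline, not the authors' system; selecting one specific person would additionally require face recognition or re-identification).

```python
import cv2

def blur_faces(frame):
    """Blur every detected face in a BGR video frame (illustrative baseline)."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        roi = frame[y:y + h, x:x + w]
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return frame
```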
VideoSSL: Semi-Supervised Learning for Video Classification
Title | VideoSSL: Semi-Supervised Learning for Video Classification |
Authors | Longlong Jing, Toufiq Parag, Zhe Wu, Yingli Tian, Hongcheng Wang |
Abstract | We propose a semi-supervised learning approach for video classification, VideoSSL, using convolutional neural networks (CNN). As in other computer vision tasks, existing supervised video classification methods demand a large amount of labeled data to attain good performance. However, annotating a large dataset is expensive and time consuming. To minimize the dependence on a large annotated dataset, our proposed semi-supervised method trains from a small number of labeled examples and exploits two regulatory signals from unlabeled data. The first signal is the pseudo-labels of unlabeled examples, computed from the confidences of the CNN being trained. The other is the normalized probabilities, as predicted by an image classifier CNN, which capture information about the appearance of the objects of interest in the video. We show that, under the supervision of these guiding signals from unlabeled examples, a video classification CNN can achieve impressive performance using only a small fraction of annotated examples on three publicly available datasets: UCF101, HMDB51 and Kinetics. |
Tasks | Video Classification |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00197v1 |
https://arxiv.org/pdf/2003.00197v1.pdf | |
PWC | https://paperswithcode.com/paper/videossl-semi-supervised-learning-for-video |
Repo | |
Framework | |
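A hedged sketch of how the two regulatory signals described above could be combined with the supervised loss; thresholds, weights, and function names are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def videossl_loss(logits_l, y_l, logits_u, img_probs,
                  conf_thresh=0.8, alpha=1.0, beta=1.0):
    # supervised cross-entropy on the few labeled clips
    loss_sup = F.cross_entropy(logits_l, y_l)

    # signal 1: pseudo-labels from the video CNN's own confident predictions
    probs_u = logits_u.softmax(dim=1)
    conf, pseudo = probs_u.max(dim=1)
    mask = conf > conf_thresh
    loss_pseudo = (F.cross_entropy(logits_u[mask], pseudo[mask])
                   if mask.any() else logits_u.new_zeros(()))

    # signal 2: match an image classifier's appearance-based probabilities
    loss_appear = F.kl_div(F.log_softmax(logits_u, dim=1), img_probs,
                           reduction="batchmean")
    return loss_sup + alpha * loss_pseudo + beta * loss_appear
```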
Peeking into occluded joints: A novel framework for crowd pose estimation
Title | Peeking into occluded joints: A novel framework for crowd pose estimation |
Authors | Lingteng Qiu, Xuanye Zhang, Yanran Li, Guanbin Li, Xiaojun Wu, Zixiang Xiong, Xiaoguang Han, Shuguang Cui |
Abstract | Although occlusion widely exists in nature and remains a fundamental challenge for pose estimation, existing heatmap-based approaches suffer serious degradation under occlusion. Their intrinsic problem is that they localize joints directly from visual information; invisible joints, however, lack exactly that. In contrast to localization, our framework estimates invisible joints from an inference perspective through an Image-Guided Progressive GCN module that provides a comprehensive understanding of both image context and pose structure. Moreover, existing benchmarks contain only limited occlusion for evaluation. We therefore pursue this problem thoroughly and propose a novel OPEC-Net framework together with a new Occluded Pose (OCPose) dataset with 9k annotated images. Extensive quantitative and qualitative evaluations on benchmarks demonstrate that OPEC-Net achieves significant improvements over recent leading works. Notably, our OCPose is the most complex occlusion dataset with respect to average IoU between adjacent instances. Source code and OCPose will be publicly available. |
Tasks | Pose Estimation |
Published | 2020-03-23 |
URL | https://arxiv.org/abs/2003.10506v3 |
https://arxiv.org/pdf/2003.10506v3.pdf | |
PWC | https://paperswithcode.com/paper/peeking-into-occluded-joints-a-novel |
Repo | |
Framework | |
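For intuition about the GCN component, here is a minimal graph-convolution step over the skeleton graph; the real Image-Guided Progressive GCN additionally injects image features at each stage and refines estimates progressively, so treat this as a simplified sketch.

```python
import torch
import torch.nn as nn

class PoseGCNLayer(nn.Module):
    """One message-passing step over the skeleton adjacency (sketch)."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        # adj: (joints, joints) adjacency with self-loops, row-normalised here
        self.register_buffer("adj_norm", adj / adj.sum(dim=1, keepdim=True))
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):                 # x: (batch, joints, in_dim)
        # each joint aggregates its neighbours, e.g. an occluded wrist
        # borrows evidence from the visible elbow
        return torch.relu(self.lin(self.adj_norm @ x))
```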
Deformation-aware Unpaired Image Translation for Pose Estimation on Laboratory Animals
Title | Deformation-aware Unpaired Image Translation for Pose Estimation on Laboratory Animals |
Authors | Siyuan Li, Semih Günel, Mirela Ostrek, Pavan Ramdya, Pascal Fua, Helge Rhodin |
Abstract | Our goal is to capture the pose of neuroscience model organisms, without using any manual supervision, to be able to study how neural circuits orchestrate behaviour. Human pose estimation attains remarkable accuracy when trained on real or simulated datasets consisting of millions of frames. However, for many applications simulated models are unrealistic and real training datasets with comprehensive annotations do not exist. We address this problem with a new sim2real domain transfer method. Our key contribution is the explicit and independent modeling of appearance, shape and poses in an unpaired image translation framework. Our model lets us train a pose estimator on the target domain by transferring readily available body keypoint locations from the source domain to generated target images. We compare our approach with existing domain transfer methods and demonstrate improved pose estimation accuracy on Drosophila melanogaster (fruit fly), Caenorhabditis elegans (worm) and Danio rerio (zebrafish), without requiring any manual annotation on the target domain and despite using simplistic off-the-shelf animal characters for simulation, or simple geometric shapes as models. Our new datasets, code, and trained models will be published to support future neuroscientific studies. |
Tasks | Pose Estimation |
Published | 2020-01-23 |
URL | https://arxiv.org/abs/2001.08601v1 |
https://arxiv.org/pdf/2001.08601v1.pdf | |
PWC | https://paperswithcode.com/paper/deformation-aware-unpaired-image-translation |
Repo | |
Framework | |
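The core trick, transferring known simulator keypoints to translated images, can be sketched as a single training step; `G` stands for the pretrained unpaired sim-to-real translator and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def keypoint_transfer_step(G, pose_net, optimizer, sim_img, sim_heatmaps):
    """Train a target-domain pose estimator from simulator labels (sketch)."""
    with torch.no_grad():
        fake_real = G(sim_img)        # simulated image rendered in target style
    pred = pose_net(fake_real)        # keypoint heatmaps on the translated image
    # supervise with the simulator's free, exact keypoint annotations
    loss = F.mse_loss(pred, sim_heatmaps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```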
Towards Cognitive Routing based on Deep Reinforcement Learning
Title | Towards Cognitive Routing based on Deep Reinforcement Learning |
Authors | Jiawei Wu, Jianxue Li, Yang Xiao, Jun Liu |
Abstract | Routing is one of the key functions for the stable operation of network infrastructure. Nowadays, the rapid growth of network traffic volume and changing service requirements call for more intelligent routing methods than before. Towards this end, we propose a definition of cognitive routing and an implementation approach based on Deep Reinforcement Learning (DRL). To facilitate research on DRL-based cognitive routing, we introduce a simulator named RL4Net for DRL-based routing algorithm development and simulation. Then, we design and implement a DDPG-based routing algorithm. The simulation results on an example network topology show that the DDPG-based routing algorithm achieves better performance than OSPF and random weight algorithms. This demonstrates the preliminary feasibility and potential advantages of cognitive routing for future networks. |
Tasks | |
Published | 2020-03-19 |
URL | https://arxiv.org/abs/2003.12439v1 |
https://arxiv.org/pdf/2003.12439v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-cognitive-routing-based-on-deep |
Repo | |
Framework | |
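As a sketch of how a DDPG actor's continuous action could be turned into routing decisions (the RL4Net specifics are not given here, so edge-weight actions and Dijkstra routing are assumptions):

```python
import networkx as nx
import numpy as np

def routes_from_action(graph, action):
    """Map an actor's per-link action vector to weights, then route (sketch)."""
    for w, (u, v) in zip(action, graph.edges()):
        graph[u][v]["weight"] = float(np.clip(w, 0.01, None))  # keep weights positive
    # shortest paths under the learned weights define the routing tables;
    # the environment would then report delay/loss as the reward signal
    return dict(nx.all_pairs_dijkstra_path(graph, weight="weight"))
```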
Black Box Explanation by Learning Image Exemplars in the Latent Feature Space
Title | Black Box Explanation by Learning Image Exemplars in the Latent Feature Space |
Authors | Riccardo Guidotti, Anna Monreale, Stan Matwin, Dino Pedreschi |
Abstract | We present an approach to explain the decisions of black box models for image classification. While using the black box to label images, our explanation method exploits the latent feature space learned through an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars respecting local decision rules. Finally, it visualizes them in a manner that shows the user how the exemplars can be modified to either stay within their class, or to become counter-factuals by “morphing” into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting the areas of the image that contribute to its classification, and areas of the image that push it into another class. We present the results of an experimental evaluation on three datasets and two black box models. Besides providing the most useful and interpretable explanations, we show that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability. |
Tasks | Image Classification |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2002.03746v1 |
https://arxiv.org/pdf/2002.03746v1.pdf | |
PWC | https://paperswithcode.com/paper/black-box-explanation-by-learning-image |
Repo | |
Framework | |
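A compact sketch of the exemplar-generation step described above: perturb the image's latent code, label the decoded neighbours with the black box, and fit a local decision tree; `encode`/`decode` stand in for a pretrained adversarial autoencoder and all names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def latent_exemplars(encode, decode, black_box, image, n=500, sigma=0.5):
    z0 = encode(image)                                   # latent code of the image
    Z = z0 + sigma * np.random.randn(n, z0.shape[-1])    # latent neighbourhood
    y = np.array([black_box(decode(z)) for z in Z])      # black-box labels
    tree = DecisionTreeClassifier(max_depth=5).fit(Z, y) # local decision rules
    # exemplars/counter-exemplars are decoded points of Z that satisfy or
    # contradict the rule covering z0
    return Z, y, tree
```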
VQ-DRAW: A Sequential Discrete VAE
Title | VQ-DRAW: A Sequential Discrete VAE |
Authors | Alex Nichol |
Abstract | In this paper, I present VQ-DRAW, an algorithm for learning compact discrete representations of data. VQ-DRAW leverages a vector quantization effect to adapt the sequential generation scheme of DRAW to discrete latent variables. I show that VQ-DRAW can effectively learn to compress images from a variety of common datasets, as well as generate realistic samples from these datasets with no help from an autoregressive prior. |
Tasks | Quantization |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01599v1 |
https://arxiv.org/pdf/2003.01599v1.pdf | |
PWC | https://paperswithcode.com/paper/vq-draw-a-sequential-discrete-vae |
Repo | |
Framework | |
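A sketch of one sequential stage: a refinement network proposes K candidate canvas updates, and the discrete latent for the stage is the index of the proposal that best reconstructs the target (the vector-quantization effect); the refiner's interface is assumed.

```python
import torch

def vq_draw_stage(refiner, canvas, target):
    """One VQ-DRAW stage (sketch): pick the best of K proposed refinements."""
    # refiner(canvas) is assumed to return deltas of shape (B, K, C, H, W)
    proposals = canvas.unsqueeze(1) + refiner(canvas)
    err = ((proposals - target.unsqueeze(1)) ** 2).flatten(2).sum(-1)  # (B, K)
    idx = err.argmin(dim=1)                     # the stage's discrete latent
    canvas = proposals[torch.arange(canvas.size(0)), idx]
    return canvas, idx
```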
Gradient-Based Deep Quantization of Neural Networks through Sinusoidal Adaptive Regularization
Title | Gradient-Based Deep Quantization of Neural Networks through Sinusoidal Adaptive Regularization |
Authors | Ahmed T. Elthakeb, Prannoy Pilligundla, Fatemehsadat Mireshghallah, Tarek Elgindi, Charles-Alban Deledalle, Hadi Esmaeilzadeh |
Abstract | As deep neural networks make their way into different domains, their compute efficiency is becoming a first-order constraint. Deep quantization, which reduces the bitwidth of the operations (below 8 bits), offers a unique opportunity as it can reduce both the storage and compute requirements of the network super-linearly. However, if not employed with diligence, it can lead to significant accuracy loss. Because layers are strongly inter-dependent and exhibit different characteristics across the same network, choosing an optimal bitwidth at per-layer granularity is not straightforward. As such, deep quantization opens a large hyper-parameter space, the exploration of which is a major challenge. We propose a novel sinusoidal regularization, called SINAREQ, for deep quantized training. Leveraging sinusoidal properties, we seek to learn multiple quantization parameterizations jointly during the gradient-based training process. Specifically, we learn (i) a per-layer quantization bitwidth along with (ii) a scale factor, by learning the period of the sinusoidal function. At the same time, we exploit the periodicity, differentiability, and local convexity profile of sinusoidal functions to automatically propel (iii) network weights towards values quantized at levels that are jointly determined. We show how SINAREQ balances compute efficiency and accuracy, and provides a heterogeneous bitwidth assignment for quantization of a large variety of deep networks (AlexNet, CIFAR-10, MobileNet, ResNet-18, ResNet-20, SVHN, and VGG-11) that virtually preserves accuracy. Furthermore, we carry out experiments using fixed homogeneous bitwidths with 3- to 5-bit assignments and show the versatility of SINAREQ in enhancing quantized training algorithms (DoReFa and WRPN), yielding about 4.8% accuracy improvement on average and outperforming multiple state-of-the-art techniques. |
Tasks | Quantization |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00146v1 |
https://arxiv.org/pdf/2003.00146v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-based-deep-quantization-of-neural |
Repo | |
Framework | |
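The heart of the method is a periodic penalty that vanishes exactly on the quantization grid; a minimal sketch (per-layer learnable period and bitwidth handling omitted):

```python
import torch

def sinusoidal_quant_reg(weights, step):
    """The sin^2 penalty is zero iff a weight is an integer multiple of `step`,
    so adding it to the task loss pulls weights toward quantized levels;
    SINAREQ additionally learns `step` (the period) per layer."""
    return torch.sin(torch.pi * weights / step).pow(2).sum()

# usage sketch: total = task_loss + lam * sum(sinusoidal_quant_reg(p, step)
#                                             for p in model.parameters())
```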
On Parameter Tuning in Meta-learning for Computer Vision
Title | On Parameter Tuning in Meta-learning for Computer Vision |
Authors | Farid Ghareh Mohammadi, M. Hadi Amini, Hamid R. Arabnia |
Abstract | Learning to learn plays a pivotal role in meta-learning (MTL) to obtain an optimal learning model. In this paper, we investigate image recognition for unseen categories of a given dataset with limited training information. We deploy a zero-shot learning (ZSL) algorithm to achieve this goal. We also explore the effect of parameter tuning on the performance of the semantic auto-encoder (SAE). We further address the parameter tuning problem for meta-learning, focusing especially on zero-shot learning. By combining different embedded parameters, we improve the accuracy of the tuned SAE. Advantages and disadvantages of parameter tuning and its application in image classification are also explored. |
Tasks | Image Classification, Meta-Learning, Zero-Shot Learning |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2003.00837v1 |
https://arxiv.org/pdf/2003.00837v1.pdf | |
PWC | https://paperswithcode.com/paper/on-parameter-tuning-in-meta-learning-for |
Repo | |
Framework | |
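For reference, the semantic auto-encoder the paper tunes has a closed-form solution via a Sylvester equation (Kodirov et al., 2017); lam below is exactly the kind of parameter whose tuning is studied here.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def sae_fit(X, S, lam):
    """Solve min_W ||X - W^T S||^2 + lam * ||W X - S||^2 in closed form.

    X: (d, N) visual features; S: (k, N) semantic class embeddings.
    Stationarity gives the Sylvester equation A W + W B = C.
    """
    A = S @ S.T                 # (k, k)
    B = lam * (X @ X.T)         # (d, d)
    C = (1 + lam) * (S @ X.T)   # (k, d)
    return solve_sylvester(A, B, C)   # W: (k, d) projection
```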
Adversarial Attack on Deep Product Quantization Network for Image Retrieval
Title | Adversarial Attack on Deep Product Quantization Network for Image Retrieval |
Authors | Yan Feng, Bin Chen, Tao Dai, Shutao Xia |
Abstract | The deep product quantization network (DPQN) has recently received much attention in fast image retrieval tasks due to its efficiency in encoding high-dimensional visual features, especially when dealing with large-scale datasets. Recent studies show that deep neural networks (DNNs) are vulnerable to input with small and maliciously designed perturbations (a.k.a., adversarial examples). This phenomenon raises security concerns for DPQN in the testing/deployment stage as well. However, little effort has been devoted to investigating how adversarial examples affect DPQN. To this end, we propose product quantization adversarial generation (PQ-AG), a simple yet effective method to generate adversarial examples for product quantization based retrieval systems. PQ-AG aims to generate imperceptible adversarial perturbations for query images to form adversarial queries, whose nearest neighbors from a targeted product quantization model are not semantically related to those from the original queries. Extensive experiments show that our PQ-AG successfully creates adversarial examples that mislead targeted product quantization retrieval models. Moreover, we find that PQ-AG significantly degrades retrieval performance in both white-box and black-box settings. |
Tasks | Adversarial Attack, Image Retrieval, Quantization |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11374v1 |
https://arxiv.org/pdf/2002.11374v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-attack-on-deep-product |
Repo | |
Framework | |
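To ground the attack surface, here is the product-quantization encoding that such retrieval systems rely on; PQ-AG crafts query perturbations so that these codes (and hence nearest neighbours) change semantically. Shapes and names are illustrative.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Product quantization: split a feature vector into M sub-vectors and
    snap each to its nearest centroid. codebooks: (M, K, d // M)."""
    M = codebooks.shape[0]
    subs = x.reshape(M, -1)                                  # (M, d // M)
    dists = ((codebooks - subs[:, None, :]) ** 2).sum(-1)    # (M, K)
    return dists.argmin(axis=1)                              # M centroid indices
```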
A Simulation Model Demonstrating the Impact of Social Aspects on Social Internet of Things
Title | A Simulation Model Demonstrating the Impact of Social Aspects on Social Internet of Things |
Authors | Kashif Zia |
Abstract | In addition to seamless connectivity and smartness, the objects in the Internet of Things (IoT) are expected to have social capabilities; such objects are termed “social objects”. In this paper, an intuitive paradigm of social interactions between these objects is argued and modeled. The impact of social behavior on the interaction pattern of social objects is studied, taking Peer-to-Peer (P2P) resource sharing as an example application. The model proposed in this paper studies the implications of a competitive vs. a cooperative social paradigm while peers attempt to attain shared resources/services. The simulation results reveal that the social capabilities of the peers impart a significant increase in the quality of interactions between social objects. Through an agent-based simulation study, it is shown that the cooperative strategy is more efficient than the competitive strategy. Moreover, cooperation underpinned by a real-life networking structure and mobility does not negatively impact the efficiency of the system at all; rather, it helps. |
Tasks | |
Published | 2020-02-23 |
URL | https://arxiv.org/abs/2002.11507v1 |
https://arxiv.org/pdf/2002.11507v1.pdf | |
PWC | https://paperswithcode.com/paper/a-simulation-model-demonstrating-the-impact |
Repo | |
Framework | |
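A toy rendering of the competitive-vs-cooperative comparison (the paper's actual agent-based model is far richer, with networking structure and mobility; the agent fields here are invented for illustration):

```python
import random

def p2p_round(agents, cooperative):
    """One round of P2P resource seeking; returns how many peers were served."""
    served = 0
    for a in agents:
        if cooperative:
            for nb in a["neighbors"]:
                a["known"] |= nb["known"]          # peers pool provider knowledge
        elif a["neighbors"]:
            probe = random.choice(a["neighbors"])  # solitary, competitive probe
            a["known"] |= probe["known"]
        served += a["need"] in a["known"]
    return served
```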
Task Augmentation by Rotating for Meta-Learning
Title | Task Augmentation by Rotating for Meta-Learning |
Authors | Jialin Liu, Fei Chao, Chih-Min Lin |
Abstract | Data augmentation is one of the most effective approaches for improving the accuracy of modern machine learning models, and it is also indispensable for training a deep model for meta-learning. In this paper, we introduce a task augmentation method by rotating, which increases the number of classes by rotating the original images 90, 180 and 270 degrees, unlike traditional augmentation methods which increase the number of images. With a larger number of classes, we can sample more diverse task instances during training. Therefore, task augmentation by rotating allows us to train a deep network by meta-learning methods with little over-fitting. Experimental results show that our approach is better than using rotation to increase the number of images, and achieves state-of-the-art performance on the miniImageNet, CIFAR-FS, and FC100 few-shot learning benchmarks. The code is available at \url{www.github.com/AceChuse/TaskLevelAug}. |
Tasks | Data Augmentation, Few-Shot Learning, Meta-Learning |
Published | 2020-02-08 |
URL | https://arxiv.org/abs/2003.00804v1 |
https://arxiv.org/pdf/2003.00804v1.pdf | |
PWC | https://paperswithcode.com/paper/task-augmentation-by-rotating-for-meta |
Repo | |
Framework | |
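The augmentation itself is a one-liner per rotation: every 90-degree rotation of a class becomes a new class, quadrupling the label space from which few-shot tasks are sampled. A sketch with NumPy (array layout assumed to be (N, H, W, C)):

```python
import numpy as np

def rotate_task_augment(images, labels, n_classes):
    """Return images/labels where each rotation defines a brand-new class."""
    aug_x, aug_y = [], []
    for k in range(4):                            # 0, 90, 180, 270 degrees
        aug_x.append(np.rot90(images, k=k, axes=(1, 2)))
        aug_y.append(labels + k * n_classes)      # rotation offsets the class id
    return np.concatenate(aug_x), np.concatenate(aug_y)
```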
Neural Sign Language Translation by Learning Tokenization
Title | Neural Sign Language Translation by Learning Tokenization |
Authors | Alptekin Orbay, Lale Akarun |
Abstract | Sign Language Translation has attained considerable success recently, raising hopes for improved communication with the Deaf. A pre-processing step called tokenization improves the success of translations. Tokens can be learned from sign videos if supervised data is available. However, data annotation at the gloss level is costly, and annotated data is scarce. This paper utilizes Adversarial, Multitask, and Transfer Learning to search for semi-supervised tokenization approaches without the burden of additional labeling. It provides extensive experiments comparing all the methods in different settings to conduct a deeper analysis. In the case of no additional target annotation besides sentences, the proposed methodology attains 13.25 BLEU-4 and 36.28 ROUGE scores, improving the current state of the art by 4 points in BLEU-4 and 5 points in ROUGE. |
Tasks | Sign Language Translation, Tokenization, Transfer Learning |
Published | 2020-02-02 |
URL | https://arxiv.org/abs/2002.00479v2 |
https://arxiv.org/pdf/2002.00479v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-sign-language-translation-by-learning |
Repo | |
Framework | |
Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future
Title | Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future |
Authors | Grace W. Lindsay |
Abstract | Convolutional neural networks (CNNs) were inspired by early findings in the study of biological vision. They have since become successful tools in computer vision and state-of-the-art models of both neural activity and behavior on visual tasks. This review highlights what, in the context of CNNs, it means to be a good model in computational neuroscience and the various ways models can provide insight. Specifically, it covers the origins of CNNs and the methods by which we validate them as models of biological vision. It then goes on to elaborate on what we can learn about biological vision by understanding and experimenting on CNNs, and discusses emerging opportunities for the use of CNNs in vision research beyond basic object recognition. |
Tasks | Object Recognition |
Published | 2020-01-20 |
URL | https://arxiv.org/abs/2001.07092v2 |
https://arxiv.org/pdf/2001.07092v2.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-neural-networks-as-a-model-of |
Repo | |
Framework | |