April 2, 2020


Paper Group ANR 302



Self-Supervised Object-in-Gripper Segmentation from Robotic Motions

Title Self-Supervised Object-in-Gripper Segmentation from Robotic Motions
Authors Wout Boerdijk, Martin Sundermeyer, Maximilian Durner, Rudolph Triebel
Abstract We present a novel technique to automatically generate annotated data for important robotic perception tasks such as object segmentation and 3D object reconstruction using a robot manipulator. Our self-supervised method can segment unknown objects from a robotic gripper in RGB video sequences by exploiting motion and temporal cues. The key aspect of our approach, in contrast to existing systems, is its independence from any hardware specifics such as extrinsic and intrinsic camera calibration and a robot model. We achieve this using a two-step process: First, we learn to predict segmentation masks for our given manipulator using optical flow estimation. Then, these masks are used in combination with motion cues to automatically distinguish between the manipulator, the background, and the unknown, grasped object. We perform a thorough comparison with alternative baselines and approaches in the literature. The obtained object views and masks are suitable training data for segmentation networks that generalize to novel environments and also allow for watertight 3D object reconstruction.
Tasks 3D Object Reconstruction, Calibration, Object Reconstruction, Optical Flow Estimation, Semantic Segmentation
Published 2020-02-11
URL https://arxiv.org/abs/2002.04487v2
PDF https://arxiv.org/pdf/2002.04487v2.pdf
PWC https://paperswithcode.com/paper/self-supervised-object-in-gripper
Repo
Framework

Domain Adaptation As a Problem of Inference on Graphical Models

Title Domain Adaptation As a Problem of Inference on Graphical Models
Authors Kun Zhang, Mingming Gong, Petar Stojanov, Biwei Huang, Clark Glymour
Abstract This paper is concerned with data-driven unsupervised domain adaptation, where it is unknown in advance how the joint distribution changes across domains, i.e., what factors or modules of the data distribution remain invariant or change across domains. To develop an automated way of domain adaptation with multiple source domains, we propose to use a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and then view domain adaptation as a problem of Bayesian inference on the graphical models. Such a graphical model distinguishes between constant and varied modules of the distribution and specifies the properties of the changes across domains, which serves as prior knowledge of the changing modules for the purpose of deriving the posterior of the target variable $Y$ in the target domain. This provides an end-to-end framework of domain adaptation, in which additional knowledge about how the joint distribution changes, if available, can be directly incorporated to improve the graphical representation. We discuss how causality-based domain adaptation can be put under this umbrella. Experimental results on both synthetic and real data demonstrate the efficacy of the proposed framework for domain adaptation.
Tasks Bayesian Inference, Domain Adaptation, Unsupervised Domain Adaptation
Published 2020-02-09
URL https://arxiv.org/abs/2002.03278v2
PDF https://arxiv.org/pdf/2002.03278v2.pdf
PWC https://paperswithcode.com/paper/domain-adaptation-as-a-problem-of-inference
Repo
Framework

Dual-Triplet Metric Learning for Unsupervised Domain Adaptation in Video-Based Face Recognition

Title Dual-Triplet Metric Learning for Unsupervised Domain Adaptation in Video-Based Face Recognition
Authors George Ekladious, Hugo Lemoine, Eric Granger, Kaveh Kamali, Salim Moudache
Abstract The scalability and complexity of deep learning models remain a key issue in many visual recognition applications such as video surveillance, where fine-tuning with labeled image data from each new camera is required to reduce the domain shift between videos captured from the source domain, e.g., a laboratory setting, and the target domain, i.e., an operational environment. In many video surveillance applications, like face recognition (FR) and person re-identification, a pair-wise matcher is used to assign a query image captured using a video camera to the corresponding reference images in a gallery. The different configurations and operational conditions of video cameras can introduce significant shifts in the pair-wise distance distributions, resulting in degraded recognition performance for new cameras. In this paper, a new deep domain adaptation (DA) method is proposed to adapt the CNN embedding of a Siamese network using unlabeled tracklets captured with a new video camera. To this end, a dual-triplet loss is introduced for metric learning, where two triplets are constructed using video data from a source camera and a new target camera. In order to constitute the dual triplets, a mutually-supervised learning approach is introduced where the source camera acts as a teacher, providing the target camera with an initial embedding. Then, the student relies on the teacher to iteratively label the positive and negative pairs collected during, e.g., initial camera calibration. Both source and target embeddings continue to learn simultaneously such that their pair-wise distance distributions become aligned. For validation, the proposed metric learning technique is used to train deep Siamese networks under different training scenarios, and is compared to state-of-the-art techniques for still-to-video FR on the COX-S2V and a private video-based FR dataset.
Tasks Calibration, Domain Adaptation, Face Recognition, Metric Learning, Person Re-Identification, Unsupervised Domain Adaptation
Published 2020-02-11
URL https://arxiv.org/abs/2002.04206v1
PDF https://arxiv.org/pdf/2002.04206v1.pdf
PWC https://paperswithcode.com/paper/dual-triplet-metric-learning-for-unsupervised
Repo
Framework
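The dual-triplet loss described above can be read as the sum of two standard margin triplet terms, one built from source-camera data and one from target-camera data. The following minimal NumPy sketch is illustrative only: the margin value, the unweighted sum of the two terms, and the toy embeddings are assumptions, and in the paper the target triplet would be built from tracklets pseudo-labelled by the teacher (source) embedding.

```python
import numpy as np

def triplet_loss(anchor, pos, neg, margin=0.2):
    """Standard margin-based triplet loss on embedding vectors."""
    d_pos = np.linalg.norm(anchor - pos, axis=-1)
    d_neg = np.linalg.norm(anchor - neg, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

def dual_triplet_loss(src_triplet, tgt_triplet, margin=0.2):
    """Sum of a source-camera and a target-camera triplet term.

    In the paper the target triplet comes from tracklets pseudo-labelled by
    the teacher embedding; here both are plain arrays for illustration.
    """
    loss_src = triplet_loss(*src_triplet, margin=margin)
    loss_tgt = triplet_loss(*tgt_triplet, margin=margin)
    return (loss_src + loss_tgt).mean()

# Toy usage: a well-separated triplet yields zero loss.
a = np.array([[0.0, 0.0]])   # anchor embedding
p = np.array([[0.05, 0.0]])  # positive (same identity)
n = np.array([[1.0, 1.0]])   # negative (different identity)
print(dual_triplet_loss((a, p, n), (a, p, n)))  # 0.0: d_neg exceeds d_pos + margin
```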

Learning robotic ultrasound scanning using probabilistic temporal ranking

Title Learning robotic ultrasound scanning using probabilistic temporal ranking
Authors Michael Burke, Katie Lu, Daniel Angelov, Artūras Straižys, Craig Innes, Kartic Subr, Subramanian Ramamoorthy
Abstract This paper addresses a common class of problems where a robot learns to perform a discovery task based on example solutions, or human demonstrations. For example, consider the problem of ultrasound scanning, where the demonstration requires that an expert adaptively searches for a satisfactory view of internal organs, vessels or tissue and potential anomalies while maintaining optimal contact between the probe and surface tissue. Such problems are currently solved by inferring notional rewards that, when optimised for, result in a plan that mimics demonstrations. A pivotal assumption, that plans with higher reward should be exponentially more likely, leads to the de facto approach for reward inference in robotics. While this approach of maximum entropy inverse reinforcement learning leads to a general and elegant formulation, it struggles to cope with frequently encountered sub-optimal demonstrations. In this paper, we propose an alternative approach for the class of problems where sub-optimal demonstrations occur frequently. We hypothesise that, in tasks which require discovery, successive states of any demonstration are progressively more likely to be associated with a higher reward. We formalise this temporal ranking approach and show that it improves upon maximum entropy approaches to reward inference for autonomous ultrasound scanning, a novel application of learning from demonstration in medical imaging.
Tasks
Published 2020-02-04
URL https://arxiv.org/abs/2002.01240v1
PDF https://arxiv.org/pdf/2002.01240v1.pdf
PWC https://paperswithcode.com/paper/learning-robotic-ultrasound-scanning-using
Repo
Framework
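The temporal-ranking hypothesis (later states in a demonstration should receive higher reward) can be sketched as pairwise reward inference with a Bradley-Terry style logistic ranking loss over sampled state pairs. This is a hedged sketch, not the paper's probabilistic model: the linear reward, the learning rate, and the synthetic drifting trajectory are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstration: states drift toward a "good view", so under the
# temporal-ranking prior later states should be assigned higher reward.
T, dim = 50, 2
traj = np.cumsum(rng.normal(0.02, 0.05, size=(T, dim)), axis=0)

w = np.zeros(dim)  # linear reward model r(s) = w . s (illustrative choice)

def ranking_loss_grad(w, s_early, s_late):
    """Gradient of -log P(late > early), with P = sigmoid(r_late - r_early)."""
    diff = (s_late - s_early) @ w
    p = 1.0 / (1.0 + np.exp(-diff))
    return -(1.0 - p)[:, None] * (s_late - s_early)

for _ in range(500):
    # Sample random index pairs; order them so "late" follows "early" in time.
    i = rng.integers(0, T, size=32)
    j = rng.integers(0, T, size=32)
    early, late = np.minimum(i, j), np.maximum(i, j)
    mask = early != late
    g = ranking_loss_grad(w, traj[early[mask]], traj[late[mask]])
    w -= 0.1 * g.mean(axis=0)

# The learned reward should increase along the demonstration.
print(traj[-1] @ w > traj[0] @ w)
```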

A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision

Title A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision
Authors Hongjie Wang, Yang Zhao, Chaojian Li, Yue Wang, Yingyan Lin
Abstract The excellent performance of modern deep neural networks (DNNs) comes at an often prohibitive training cost, limiting the rapid development of DNN innovations and raising various environmental concerns. To reduce the dominant data movement cost of training, process in-memory (PIM) has emerged as a promising solution as it alleviates the need to access DNN weights. However, state-of-the-art PIM DNN training accelerators employ either analog/mixed-signal computing, which has limited precision, or digital computing based on a memory technology that supports limited logic functions and thus requires a complicated procedure to realize floating point computation. In this paper, we propose a spin-orbit torque magnetic random access memory (SOT-MRAM) based digital PIM accelerator that supports floating point precision. Specifically, this new accelerator features an innovative (1) SOT-MRAM cell, (2) full addition design, and (3) floating point computation. Experimental results show that the proposed SOT-MRAM PIM based DNN training accelerator can achieve 3.3x, 1.8x, and 2.5x improvements in terms of energy, latency, and area, respectively, compared with a state-of-the-art PIM based DNN training accelerator.
Tasks
Published 2020-03-02
URL https://arxiv.org/abs/2003.01551v1
PDF https://arxiv.org/pdf/2003.01551v1.pdf
PWC https://paperswithcode.com/paper/a-new-mram-based-process-in-memory
Repo
Framework

Temporal Probability Calibration

Title Temporal Probability Calibration
Authors Tim Leathart, Maksymilian Polaczuk
Abstract In many applications, accurate class probability estimates are required, but many types of models produce poor quality probability estimates despite achieving acceptable classification accuracy. Even though probability calibration has been a hot topic of research in recent times, the majority of this has investigated non-sequential data. In this paper, we consider calibrating models that produce class probability estimates from sequences of data, focusing on the case where predictions are obtained from incomplete sequences. We show that traditional calibration techniques are not sufficiently expressive for this task, and propose methods that adapt calibration schemes depending on the length of an input sequence. Experimental evaluation shows that the proposed methods are often substantially more effective at calibrating probability estimates from modern sequential architectures for incomplete sequences across a range of application domains.
Tasks Calibration
Published 2020-02-07
URL https://arxiv.org/abs/2002.02644v2
PDF https://arxiv.org/pdf/2002.02644v2.pdf
PWC https://paperswithcode.com/paper/temporal-probability-calibration
Repo
Framework
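One way to make a calibration scheme depend on sequence length, in the spirit of the method above, is to turn the temperature of ordinary temperature scaling into a function of how much of the sequence has been observed: early, incomplete prefixes get a higher temperature (softer probabilities). The 1/length form and the parameters a and b below are illustrative assumptions, not the paper's exact parameterisation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def temporal_temperature(length, a=1.0, b=4.0):
    """Hypothetical length-dependent temperature: short prefixes get a high
    temperature (less confident output); a and b would be fit on held-out data."""
    return a + b / max(length, 1)

def calibrate(logits, length):
    """Temperature-scale class logits according to observed sequence length."""
    return softmax(logits / temporal_temperature(length))

logits = np.array([2.0, 0.5, -1.0])
p_short = calibrate(logits, length=2)   # early in the sequence: softened
p_full = calibrate(logits, length=100)  # near-complete sequence: ~raw softmax
print(p_short.argmax() == p_full.argmax(), p_short.max() < p_full.max())
```

The predicted class is unchanged; only the confidence is tempered while the sequence is incomplete.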

Intra-Camera Supervised Person Re-Identification

Title Intra-Camera Supervised Person Re-Identification
Authors Xiangping Zhu, Xiatian Zhu, Minxian Li, Pietro Morerio, Vittorio Murino, Shaogang Gong
Abstract Existing person re-identification (re-id) methods mostly exploit a large set of cross-camera identity labelled training data. This requires a tedious data collection and annotation process, leading to poor scalability in practical re-id applications. On the other hand, unsupervised re-id methods do not need identity label information, but they usually suffer from substantially inferior model performance. To overcome these fundamental limitations, we propose a novel person re-identification paradigm based on an idea of independent per-camera identity annotation. This eliminates the most time-consuming and tedious inter-camera identity labelling process, significantly reducing the amount of human annotation effort. Consequently, it gives rise to a more scalable and more feasible setting, which we call Intra-Camera Supervised (ICS) person re-id, for which we formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method. Specifically, MATE is designed for self-discovering the cross-camera identity correspondence in a per-camera multi-task inference framework. Extensive experiments demonstrate the cost-effectiveness superiority of our method over the alternative approaches on three large person re-id datasets. For example, MATE yields an 88.7% rank-1 score on Market-1501 in the proposed ICS person re-id setting, significantly outperforming unsupervised learning models and closely approaching conventional fully supervised learning competitors.
Tasks Person Re-Identification
Published 2020-02-12
URL https://arxiv.org/abs/2002.05046v1
PDF https://arxiv.org/pdf/2002.05046v1.pdf
PWC https://paperswithcode.com/paper/intra-camera-supervised-person-re-1
Repo
Framework

Adaptive Deep Metric Embeddings for Person Re-Identification under Occlusions

Title Adaptive Deep Metric Embeddings for Person Re-Identification under Occlusions
Authors Wanxiang Yang, Yan Yan, Si Chen
Abstract Person re-identification (ReID) under occlusions is a challenging problem in video surveillance. Most existing person ReID methods take advantage of local features to deal with occlusions. However, these methods usually extract features from the local regions of an image independently, without considering the relationship among different local regions. In this paper, we propose a novel person ReID method, which learns the spatial dependencies between the local regions and extracts the discriminative feature representation of the pedestrian image based on Long Short-Term Memory (LSTM), dealing with the problem of occlusions. In particular, we propose a novel loss (termed the adaptive nearest neighbor loss) based on the classification uncertainty to effectively reduce intra-class variations while enlarging inter-class differences within the adaptive neighborhood of the sample. The proposed loss enables the deep neural network to adaptively learn discriminative metric embeddings, which significantly improve the generalization capability of recognizing unseen person identities. Extensive comparative evaluations on challenging person ReID datasets demonstrate the significantly improved performance of the proposed method compared with several state-of-the-art methods.
Tasks Person Re-Identification
Published 2020-02-07
URL https://arxiv.org/abs/2002.02603v1
PDF https://arxiv.org/pdf/2002.02603v1.pdf
PWC https://paperswithcode.com/paper/adaptive-deep-metric-embeddings-for-person-re
Repo
Framework

Social and Governance Implications of Improved Data Efficiency

Title Social and Governance Implications of Improved Data Efficiency
Authors Aaron D. Tucker, Markus Anderljung, Allan Dafoe
Abstract Many researchers work on improving the data efficiency of machine learning. What would happen if they succeed? This paper explores the social-economic impact of increased data efficiency. Specifically, we examine the intuition that data efficiency will erode the barriers to entry protecting incumbent data-rich AI firms, exposing them to more competition from data-poor firms. We find that this intuition is only partially correct: data efficiency makes it easier to create ML applications, but large AI firms may have more to gain from higher performing AI systems. Further, we find that the effects on privacy, data markets, robustness, and misuse are complex. For example, while it seems intuitive that misuse risk would increase along with data efficiency – as more actors gain access to any level of capability – the net effect crucially depends on how much defensive measures are improved. More investigation into data efficiency, as well as research into the “AI production function”, will be key to understanding the development of the AI industry and its societal impacts.
Tasks
Published 2020-01-14
URL https://arxiv.org/abs/2001.05068v1
PDF https://arxiv.org/pdf/2001.05068v1.pdf
PWC https://paperswithcode.com/paper/social-and-governance-implications-of
Repo
Framework

Plug-and-Play Rescaling Based Crowd Counting in Static Images

Title Plug-and-Play Rescaling Based Crowd Counting in Static Images
Authors Usman Sajid, Guanghui Wang
Abstract Crowd counting is a challenging problem, especially in the presence of huge crowd diversity across images and complex cluttered crowd-like background regions, where most previous approaches do not generalize well and consequently produce severe under- or overestimation of crowd counts. To address these challenges, we propose a new image patch rescaling module (PRM) and three independent crowd counting methods that employ the PRM. The proposed frameworks use the PRM module to rescale the image regions (patches) that require special treatment, whereas the classification process helps in recognizing and discarding any cluttered crowd-like background regions, which may otherwise result in overestimation. Experiments on three standard benchmarks and cross-dataset evaluation show that our approach outperforms the state-of-the-art models on the RMSE evaluation metric with an improvement of up to 10.4%, and possesses superior generalization ability to new datasets.
Tasks Crowd Counting
Published 2020-01-06
URL https://arxiv.org/abs/2001.01786v1
PDF https://arxiv.org/pdf/2001.01786v1.pdf
PWC https://paperswithcode.com/paper/plug-and-play-rescaling-based-crowd-counting
Repo
Framework

Detecting depression in dyadic conversations with multimodal narratives and visualizations

Title Detecting depression in dyadic conversations with multimodal narratives and visualizations
Authors Joshua Y. Kim, Greyson Y. Kim, Kalina Yacef
Abstract Conversations contain a wide spectrum of multimodal information that gives us hints about the emotions and moods of the speaker. In this paper, we developed a system that supports humans to analyze conversations. Our main contribution is the identification of appropriate multimodal features and the integration of such features into verbatim conversation transcripts. We demonstrate the ability of our system to take in a wide range of multimodal information and automatically generate a prediction score for the depression state of the individual. Our experiments showed that this approach yielded better performance than the baseline model. Furthermore, the multimodal narrative approach makes it easy to integrate learnings from other disciplines, such as conversational analysis and psychology. Lastly, this interdisciplinary and automated approach is a step towards emulating how practitioners record the course of treatment, as well as how conversational analysts have been analyzing conversations by hand.
Tasks
Published 2020-01-13
URL https://arxiv.org/abs/2001.04809v2
PDF https://arxiv.org/pdf/2001.04809v2.pdf
PWC https://paperswithcode.com/paper/detecting-depression-in-dyadic-conversations
Repo
Framework

CNN 101: Interactive Visual Learning for Convolutional Neural Networks

Title CNN 101: Interactive Visual Learning for Convolutional Neural Networks
Authors Zijie J. Wang, Robert Turko, Omar Shaikh, Haekyu Park, Nilaksh Das, Fred Hohman, Minsuk Kahng, Duen Horng Chau
Abstract The success of deep learning solving previously-thought hard problems has inspired many non-experts to learn and understand this exciting technology. However, it is often challenging for learners to take the first steps due to the complexity of deep learning models. We present our ongoing work, CNN 101, an interactive visualization system for explaining and teaching convolutional neural networks. Through tightly integrated interactive views, CNN 101 offers both overview and detailed descriptions of how a model works. Built using modern web technologies, CNN 101 runs locally in users’ web browsers without requiring specialized hardware, broadening the public’s education access to modern deep learning techniques.
Tasks
Published 2020-01-07
URL https://arxiv.org/abs/2001.02004v3
PDF https://arxiv.org/pdf/2001.02004v3.pdf
PWC https://paperswithcode.com/paper/cnn-101-interactive-visual-learning-for
Repo
Framework

Edge-Tailored Perception: Fast Inferencing in-the-Edge with Efficient Model Distribution

Title Edge-Tailored Perception: Fast Inferencing in-the-Edge with Efficient Model Distribution
Authors Ramyad Hadidi, Bahar Asgari, Jiashen Cao, Younmin Bae, Hyojong Kim, Michael S. Ryoo, Hyesoon Kim
Abstract The rise of deep neural networks (DNNs) is inspiring new studies in a myriad of edge use cases with robots, autonomous agents, and Internet-of-Things (IoT) devices. However, in-the-edge inferencing of DNNs is still a severe challenge, mainly because of the contradiction between the inherently intensive resource requirements of DNNs and the tight resource availability in several edge domains. Further, as communication is costly, taking advantage of other available edge devices is not an effective solution in edge domains. Therefore, to benefit from available compute resources with low communication overhead, we propose new edge-tailored perception (ETP) models that consist of several almost-independent and narrow branches. ETP models offer close-to-minimum communication overheads with better distribution opportunities while significantly reducing memory and computation footprints, all with a trivial accuracy loss for tasks that are not accuracy-critical. To show the benefits, we deploy ETP models on two real systems, Raspberry Pis and edge-level PYNQ FPGAs. Additionally, we share our insights about tailoring a systolic-based architecture for edge computing with FPGA implementations. ETP models created based on LeNet, CifarNet, VGG-S/16, AlexNet, and ResNets, and trained on MNIST, CIFAR10/100, Flower102, and ImageNet, achieve maximum and average speedups of 56x and 7x, respectively, compared to the originals. ETP is an addition to existing single-device optimizations for embedded devices, enabling the exploitation of multiple devices. As an example, we show that applying pruning and quantization on ETP models improves the average speedup to 33x.
Tasks Quantization
Published 2020-03-13
URL https://arxiv.org/abs/2003.06464v1
PDF https://arxiv.org/pdf/2003.06464v1.pdf
PWC https://paperswithcode.com/paper/edge-tailored-perception-fast-inferencing-in
Repo
Framework

Sparse Weight Activation Training

Title Sparse Weight Activation Training
Authors Md Aamir Raihan, Tor M. Aamodt
Abstract Training convolutional neural networks (CNNs) is time-consuming. Prior work has explored how to reduce the computational demands of training by eliminating gradients with relatively small magnitude. We show that eliminating small magnitude components has limited impact on the direction of high-dimensional vectors. However, in the context of training a CNN, we find that eliminating small magnitude components of weight and activation vectors allows us to train deeper networks on more complex datasets versus eliminating small magnitude components of gradients. We propose Sparse Weight Activation Training (SWAT), an algorithm that embodies these observations. SWAT reduces computations by 50% to 80% with better accuracy at a given level of sparsity versus the Dynamic Sparse Graph algorithm. SWAT also reduces memory footprint by 23% to 37% for activations and 50% to 80% for weights.
Tasks
Published 2020-01-07
URL https://arxiv.org/abs/2001.01969v1
PDF https://arxiv.org/pdf/2001.01969v1.pdf
PWC https://paperswithcode.com/paper/sparse-weight-activation-training-1
Repo
Framework
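The core operation in SWAT, keeping only the largest-magnitude components of the weight and activation tensors during training, can be sketched as a top-k magnitude mask applied before the forward pass. A minimal NumPy sketch, assuming per-tensor (rather than per-channel) sparsification and toy layer sizes chosen purely for illustration:

```python
import numpy as np

def topk_mask(x, keep_frac):
    """Zero all but the largest-magnitude keep_frac of entries (per tensor)."""
    flat = np.abs(x).ravel()
    k = max(1, int(round(keep_frac * flat.size)))
    threshold = np.partition(flat, -k)[-k]  # k-th largest magnitude
    return np.where(np.abs(x) >= threshold, x, 0.0)

# Toy linear layer with 50% weight and activation sparsity, matching the low
# end of SWAT's reported 50%-80% computation-reduction range.
rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))   # weights
x = rng.normal(size=(8,))     # input activations

W_sparse = topk_mask(W, keep_frac=0.5)
x_sparse = topk_mask(x, keep_frac=0.5)
y = W_sparse @ x_sparse       # sparse forward pass

print(np.count_nonzero(W_sparse), np.count_nonzero(x_sparse))
```

The paper's observation is that dropping small-magnitude weight and activation components, rather than small gradients, barely perturbs the direction of the high-dimensional vectors involved, which is what makes this kind of masking viable during training.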

Scalable Deployment of AI Time-series Models for IoT

Title Scalable Deployment of AI Time-series Models for IoT
Authors Bradley Eck, Francesco Fusco, Robert Gormally, Mark Purcell, Seshu Tirupathi
Abstract IBM Research Castor, a cloud-native system for managing and deploying large numbers of AI time-series models in IoT applications, is described. Modelling code templates, in Python and R, following a typical machine-learning workflow are supported. A knowledge-based approach to managing model and time-series data allows the use of general semantic concepts for expressing feature engineering tasks. Model templates can be programmatically deployed against specific instances of semantic concepts, thus supporting model reuse and automated replication as the IoT application grows. Deployed models are automatically executed in parallel leveraging a serverless cloud computing framework. The complete history of trained model versions and rolling-horizon predictions is persisted, thus enabling full model lineage and traceability. Results from deployments in real-world smart-grid live forecasting applications are reported. Scalability of executing up to tens of thousands of AI modelling tasks is also evaluated.
Tasks Feature Engineering, Time Series
Published 2020-03-24
URL https://arxiv.org/abs/2003.12141v1
PDF https://arxiv.org/pdf/2003.12141v1.pdf
PWC https://paperswithcode.com/paper/scalable-deployment-of-ai-time-series-models
Repo
Framework