January 31, 2020

3321 words 16 mins read

Paper Group ANR 190

An LP-Based Approach for Goal Recognition as Planning

Title An LP-Based Approach for Goal Recognition as Planning
Authors Felipe Meneguzzi, Giovanna Lazzari Miotto, Ramon Fraga Pereira, André Grahl Pereira
Abstract Goal recognition is the problem of inferring the correct goal towards which an agent executes a plan, given a set of goal hypotheses, a domain model, and a (possibly noisy) sample of the plan being executed. This is a key problem in both cooperative and competitive agent interactions, and recent approaches have produced fast and accurate goal recognition algorithms. In this paper, we leverage advances in operator-counting heuristics computed using linear programs over constraints derived from classical planning problems to solve goal recognition problems. Our approach uses additional operator-counting constraints derived from the observations to efficiently infer the correct goal, and serves as a basis for a number of further methods with additional constraints.
Tasks
Published 2019-05-10
URL https://arxiv.org/abs/1905.04210v2
PDF https://arxiv.org/pdf/1905.04210v2.pdf
PWC https://paperswithcode.com/paper/robust-goal-recognition-with-operator
Repo
Framework
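
The operator-counting approach above lends itself to a compact illustration. Below is a minimal sketch using `scipy.optimize.linprog`; the toy encoding, the helper name `lp_goal_score`, and the constraint format are assumptions for illustration, not the authors' exact LP formulation: each goal hypothesis yields one LP, observations force the counts of observed operators to be at least one, and the hypothesis with the smallest optimum is reported.

```python
import numpy as np
from scipy.optimize import linprog

def lp_goal_score(n_ops, goal_rows, goal_rhs, observed_ops):
    """Minimise total operator count subject to goal constraints (row @ x >= rhs)
    and observation constraints (each observed operator is used at least once)."""
    c = np.ones(n_ops)                                # objective: total operator count
    A_ub = [-np.asarray(row) for row in goal_rows]    # flip >= into <= for linprog
    b_ub = [-rhs for rhs in goal_rhs]
    bounds = [(0, None)] * n_ops
    for op in observed_ops:                           # observation constraint: x[op] >= 1
        bounds[op] = (1, None)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.fun if res.success else float("inf")

# Solve one LP per goal hypothesis; the smallest optimum indicates the goal.
```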

Human Following for Wheeled Robot with Monocular Pan-tilt Camera

Title Human Following for Wheeled Robot with Monocular Pan-tilt Camera
Authors Zheng Zhu, Hongxuan Ma, Wei Zou
Abstract Human following on mobile robots has witnessed significant advances due to its potential for real-world applications. Currently, most human following systems are equipped with depth sensors to obtain the distance between the human and the robot, which suffer from demanding perception requirements and sensor noise. In this paper, we design a wheeled mobile robot system with a monocular pan-tilt camera to follow a human, which can keep the target in the field of view while following. The system consists of a fast human detector, a real-time and accurate visual tracker, and a unified controller for the mobile robot and pan-tilt camera. In the visual tracking algorithm, both Siamese networks and optical flow information are exploited to locate and regress the human simultaneously. In order to perform following with a monocular camera, a constraint on human height is introduced to design the controller. In experiments, human following is conducted and analysed in simulation and on a real robot platform, which demonstrates the effectiveness and robustness of the overall system.
Tasks Optical Flow Estimation, Visual Tracking
Published 2019-09-13
URL https://arxiv.org/abs/1909.06087v1
PDF https://arxiv.org/pdf/1909.06087v1.pdf
PWC https://paperswithcode.com/paper/human-following-for-wheeled-robot-with
Repo
Framework
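
A back-of-the-envelope version of the monocular controller: with the human-height constraint, the bounding-box height gives a distance estimate via the pinhole model, and simple proportional terms keep the target centred while the base holds a reference distance. All gains and parameters below (`k_pan`, `f`, `H`, `d_ref`, ...) are illustrative assumptions, not values from the paper.

```python
def follow_step(bbox, f=500.0, H=1.7, d_ref=1.5, cx=320.0, cy=240.0,
                k_pan=0.002, k_tilt=0.002, k_v=0.8):
    """One control step from a tracker bounding box (x, y, w, h) in pixels."""
    x, y, w, h = bbox
    d = f * H / h                           # pinhole distance from the human-height constraint
    pan_rate = k_pan * (x + w / 2 - cx)     # keep the target centred horizontally
    tilt_rate = k_tilt * (y + h / 2 - cy)   # ... and vertically
    v = k_v * (d - d_ref)                   # drive the base to hold the reference distance
    return v, pan_rate, tilt_rate
```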

A Temporal Clustering Algorithm for Achieving the trade-off between the User Experience and the Equipment Economy in the Context of IoT

Title A Temporal Clustering Algorithm for Achieving the trade-off between the User Experience and the Equipment Economy in the Context of IoT
Authors Caio Ponte, Carlos Caminha, Rafael Bomfim, Ronaldo Moreira, Vasco Furtado
Abstract We present the Temporal Clustering Algorithm (TCA), an incremental learning algorithm applicable to problems of anticipatory computing in the context of the Internet of Things. The algorithm was tested in a prediction scenario for the consumption of an electric water dispenser typically used in tropical countries, where the ambient temperature is around 30 degrees Celsius. In this context, the user typically wants to drink iced water and therefore uses the cooler function of the dispenser. Real and synthetic water consumption data were used to evaluate how much energy can be saved by predicting the pattern of use of the equipment. The algorithm uses a small, constant amount of memory, which allows it to be implemented at very low cost on commercially available microcontrollers with less than 1 KB of memory. It can also be configured according to user preference, either prioritizing comfort by keeping the water at the desired temperature longer, or prioritizing energy savings. The main result is that TCA achieved energy savings of up to 40% compared to the conventional mode of operation of the dispenser, with an average success rate higher than 90% at its times of use.
Tasks
Published 2019-07-30
URL https://arxiv.org/abs/1907.13246v1
PDF https://arxiv.org/pdf/1907.13246v1.pdf
PWC https://paperswithcode.com/paper/a-temporal-clustering-algorithm-for-achieving
Repo
Framework
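
As a rough picture of how an incremental, constant-memory predictor of usage times might look, here is a hedged sketch: it keeps at most `k` time-of-day cluster centres updated by exponential moving averages and pre-cools shortly before a predicted use. The class `TemporalClusters` and all thresholds are hypothetical, not the published TCA.

```python
class TemporalClusters:
    """Constant-memory clusters of daily usage times (minutes since midnight)."""
    def __init__(self, k=4, radius=45, alpha=0.2):
        self.k, self.radius, self.alpha = k, radius, alpha
        self.centers, self.weights = [], []

    def update(self, minute):
        if self.centers:
            i = min(range(len(self.centers)),
                    key=lambda j: abs(self.centers[j] - minute))
            if abs(self.centers[i] - minute) <= self.radius:
                self.centers[i] += self.alpha * (minute - self.centers[i])  # EMA shift
                self.weights[i] += 1
                return
        if len(self.centers) < self.k:
            self.centers.append(float(minute)); self.weights.append(1)
        else:                                    # evict the weakest cluster
            i = self.weights.index(min(self.weights))
            self.centers[i], self.weights[i] = float(minute), 1

    def should_cool(self, minute, lead=20):      # pre-cool shortly before a predicted use
        return any(0 <= c - minute <= lead for c in self.centers)
```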

Video-based Bottleneck Detection utilizing Lagrangian Dynamics in Crowded Scenes

Title Video-based Bottleneck Detection utilizing Lagrangian Dynamics in Crowded Scenes
Authors Maik Simon, Markus Küchhold, Tobias Senst, Erik Bochinski, Thomas Sikora
Abstract Avoiding bottleneck situations in crowds is critical for the safety and comfort of people at large events or in public transportation. Building on Lagrangian motion analysis, we propose a novel video-based bottleneck detector that identifies characteristic stowage patterns in crowd movements captured by optical flow fields. The Lagrangian framework allows us to assess complex time-dependent crowd-motion dynamics at large temporal scales near the bottleneck using two-dimensional Lagrangian fields. In particular, we propose long-term temporally filtered Finite-Time Lyapunov Exponent (FTLE) fields that provide a more global segmentation of the crowd movements and capture the crowd's deformations when it passes a bottleneck. Finally, these deformations are used for automatic spatio-temporal detection of such situations. The performance of the proposed approach is shown in extensive evaluations on the existing Jülich and AGORASET datasets, which we have extended with ground-truth data for spatio-temporal bottleneck analysis.
Tasks Optical Flow Estimation
Published 2019-08-21
URL https://arxiv.org/abs/1908.07772v1
PDF https://arxiv.org/pdf/1908.07772v1.pdf
PWC https://paperswithcode.com/paper/190807772
Repo
Framework
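
The FTLE computation at the heart of the detector can be sketched in a few lines: advect a grid of particles through the optical flow to get a flow map, then take the largest eigenvalue of the Cauchy-Green tensor of its Jacobian. The function below assumes the flow map is already integrated; it is a textbook FTLE kernel, not the paper's long-term filtered variant.

```python
import numpy as np

def ftle(phi_x, phi_y, T):
    """FTLE field from a flow map (phi_x, phi_y): final particle positions
    after advecting a regular grid over a time horizon T."""
    dxdx, dxdy = np.gradient(phi_x)
    dydx, dydy = np.gradient(phi_y)
    out = np.zeros_like(phi_x)
    for i in range(phi_x.shape[0]):
        for j in range(phi_x.shape[1]):
            J = np.array([[dxdx[i, j], dxdy[i, j]],
                          [dydx[i, j], dydy[i, j]]])   # flow-map Jacobian
            C = J.T @ J                                 # Cauchy-Green tensor
            lam = np.linalg.eigvalsh(C)[-1]             # largest eigenvalue
            out[i, j] = np.log(max(lam, 1e-12)) / (2 * abs(T))
    return out
```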

Federated Collaborative Filtering for Privacy-Preserving Personalized Recommendation System

Title Federated Collaborative Filtering for Privacy-Preserving Personalized Recommendation System
Authors Muhammad Ammad-ud-din, Elena Ivannikova, Suleiman A. Khan, Were Oyomno, Qiang Fu, Kuan Eeik Tan, Adrian Flanagan
Abstract The increasing interest in user privacy is leading to new privacy-preserving machine learning paradigms. In the Federated Learning paradigm, a master machine learning model is distributed to user clients, and the clients use their locally stored data and model for both inference and calculating model updates. The model updates are sent back and aggregated on the server to update the master model, which is then redistributed to the clients. In this paradigm, the user data never leaves the client, greatly enhancing the user's privacy, in contrast to the traditional paradigm of collecting, storing and processing user data on a backend server beyond the user's control. In this paper we introduce, as far as we are aware, the first federated implementation of a Collaborative Filter. The federated updates to the model are based on a stochastic gradient approach. As a classical case study in machine learning, we explore a personalized recommendation system based on users' implicit feedback and demonstrate the method's applicability to both the MovieLens and an in-house dataset. Empirical validation confirms that a collaborative filter can be federated without a loss of accuracy compared to a standard implementation, hence enhancing the user's privacy in a widely used recommender application while maintaining recommender performance.
Tasks
Published 2019-01-29
URL http://arxiv.org/abs/1901.09888v1
PDF http://arxiv.org/pdf/1901.09888v1.pdf
PWC https://paperswithcode.com/paper/federated-collaborative-filtering-for-privacy
Repo
Framework
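
The federated update loop described above is easy to caricature with matrix factorization: each client keeps its user vector private, computes the gradient of the shared item matrix on local data, and only that gradient travels to the server. The code below is a hedged numpy sketch; the function names and plain gradient averaging are assumptions, not the paper's exact implicit-feedback updates.

```python
import numpy as np

def client_update(Y, user_ratings, lr=0.05, reg=0.1):
    """One client: fit the private user vector locally, then return only the
    gradient w.r.t. the shared item matrix Y (the raw data never leaves)."""
    x = np.zeros(Y.shape[1])
    for _ in range(10):                              # local gradient steps
        g = sum((x @ Y[i] - r) * Y[i] for i, r in user_ratings) + reg * x
        x -= lr * g
    grad_Y = np.zeros_like(Y)
    for i, r in user_ratings:                        # stochastic gradient on item factors
        grad_Y[i] += (x @ Y[i] - r) * x
    return grad_Y

def server_round(Y, client_grads, lr=0.05):
    return Y - lr * np.mean(client_grads, axis=0)    # aggregate, update master model
```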

Learning Structure-Appearance Joint Embedding for Indoor Scene Image Synthesis

Title Learning Structure-Appearance Joint Embedding for Indoor Scene Image Synthesis
Authors Yuan Xue, Zihan Zhou, Xiaolei Huang
Abstract Advanced image synthesis methods can generate photo-realistic images for faces, birds, bedrooms, and more. However, these methods do not explicitly model and preserve essential structural constraints such as junctions, parallel lines, and planar surfaces. In this paper, we study the problem of structured indoor image generation for design applications. We utilize a small-scale dataset that contains both images of various indoor scenes and their corresponding ground-truth wireframe annotations. Since existing image synthesis models trained on this dataset are insufficient at preserving structural integrity, we propose a novel model based on a structure-appearance joint embedding learned from both images and wireframes. In our model, structural constraints are explicitly enforced by learning a joint embedding in a shared encoder network that must support the generation of both images and wireframes. We demonstrate the effectiveness of the joint embedding learning scheme on the indoor scene wireframe-to-image translation task. While wireframes as input contain less semantic information than the inputs of other traditional image translation tasks, our model can generate high-fidelity indoor scene renderings that match the input wireframes well. Experiments on a wireframe-scene dataset show that our proposed translation model significantly outperforms existing state-of-the-art methods in both the visual quality and the structural integrity of generated images.
Tasks Image Generation
Published 2019-12-09
URL https://arxiv.org/abs/1912.03840v1
PDF https://arxiv.org/pdf/1912.03840v1.pdf
PWC https://paperswithcode.com/paper/learning-structure-appearance-joint-embedding
Repo
Framework
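
The core architectural idea, one encoder whose embedding must serve two decoders, fits in a small PyTorch module. The layer sizes and the module name `JointEmbedding` below are placeholders; only the shared-encoder/dual-decoder topology reflects the description above.

```python
import torch.nn as nn

class JointEmbedding(nn.Module):
    """Shared encoder forced to support both image and wireframe generation."""
    def __init__(self, ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.ReLU())
        self.image_dec = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1), nn.Tanh())
        self.wire_dec = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)             # the structure-appearance joint embedding
        return self.image_dec(z), self.wire_dec(z)
```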

Surrogate Supervision for Medical Image Analysis: Effective Deep Learning From Limited Quantities of Labeled Data

Title Surrogate Supervision for Medical Image Analysis: Effective Deep Learning From Limited Quantities of Labeled Data
Authors Nima Tajbakhsh, Yufei Hu, Junli Cao, Xingjian Yan, Yi Xiao, Yong Lu, Jianming Liang, Demetri Terzopoulos, Xiaowei Ding
Abstract We investigate the effectiveness of a simple solution to the common problem of deep learning in medical image analysis with limited quantities of labeled training data. The underlying idea is to assign artificial labels to abundantly available unlabeled medical images and, through a process known as surrogate supervision, pre-train a deep neural network model for the target medical image analysis task lacking sufficient labeled training data. In particular, we employ three surrogate supervision schemes, namely rotation, reconstruction, and colorization, in four different medical imaging applications representing classification and segmentation for both 2D and 3D medical images. Three key findings emerge from our research: 1) pre-training with surrogate supervision is effective for small training sets; 2) deep models trained from initial weights pre-trained through surrogate supervision outperform the same models trained from scratch, suggesting that pre-training with surrogate supervision should be considered prior to training any deep 3D models; 3) pre-training models in the medical domain with surrogate supervision is more effective than transfer learning from an unrelated domain (e.g., natural images), indicating the practical value of abundant unlabeled medical image data.
Tasks Colorization, Transfer Learning
Published 2019-01-25
URL http://arxiv.org/abs/1901.08707v1
PDF http://arxiv.org/pdf/1901.08707v1.pdf
PWC https://paperswithcode.com/paper/surrogate-supervision-for-medical-image
Repo
Framework
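
Of the three pretext schemes, rotation is the simplest to write down: the artificial labels are rotation indices, and the pre-trained encoder is later fine-tuned on the scarce labelled data. The helper below is a generic rotation-pretext batch builder, not the paper's code.

```python
import torch

def rotation_batch(images):
    """Self-labelled batch: rotate each (C, H, W) image by k*90 degrees;
    the rotation index k in {0, 1, 2, 3} is the surrogate label."""
    rotated, labels = [], []
    for img in images:
        k = torch.randint(0, 4, (1,)).item()
        rotated.append(torch.rot90(img, k, dims=(1, 2)))
        labels.append(k)
    return torch.stack(rotated), torch.tensor(labels)

# Pre-train an encoder plus a 4-way head on these labels, then fine-tune the
# encoder on the target medical task with its limited labelled data.
```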

Cross-Enhancement Transform Two-Stream 3D ConvNets for Action Recognition

Title Cross-Enhancement Transform Two-Stream 3D ConvNets for Action Recognition
Authors Dong Cao, Lisha Xu, Dongdong Zhang
Abstract Action recognition is an important research topic in computer vision. It is basic work for visual understanding and has been applied in many fields. Since human actions can vary across environments, it is difficult to infer actions in completely different settings with the same structural model. For this case, we propose a Cross-Enhancement Transform Two-Stream 3D ConvNets algorithm, which considers the action distribution characteristics of the specific dataset. The stream with the better performance of the two serves as a teacher model and assists in training the other stream. In this way, the enhanced-trained stream and the teacher stream are combined to infer actions. We implement experiments on the video datasets UCF-101, HMDB-51, and Kinetics-400, and the results confirm the effectiveness of our algorithm.
Tasks Autonomous Driving, Autonomous Vehicles, Optical Flow Estimation, Transfer Learning
Published 2019-08-19
URL https://arxiv.org/abs/1908.08916v2
PDF https://arxiv.org/pdf/1908.08916v2.pdf
PWC https://paperswithcode.com/paper/cross-enhancement-transform-two-stream-3d
Repo
Framework
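
One plausible reading of "the better stream teaches the other" is a distillation-style loss; the sketch below blends hard labels with the teacher stream's softened predictions. The weighting `alpha`, temperature `tau`, and the fusion-by-averaging note are assumptions, not the paper's exact cross-enhancement transform.

```python
import torch.nn.functional as F

def cross_enhancement_loss(student_logits, teacher_logits, targets,
                           tau=4.0, alpha=0.5):
    """Hard-label loss blended with softened teacher-stream predictions."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                    F.softmax(teacher_logits / tau, dim=1),
                    reduction="batchmean") * tau * tau
    return alpha * hard + (1 - alpha) * soft

# At inference, the enhanced stream and the teacher stream can be fused,
# e.g. by averaging their softmax scores.
```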

Applying Generative Adversarial Networks to Intelligent Subsurface Imaging and Identification

Title Applying Generative Adversarial Networks to Intelligent Subsurface Imaging and Identification
Authors William Rice
Abstract To augment training data for machine learning models in Ground Penetrating Radar (GPR) data classification and identification, this thesis focuses on the generation of realistic GPR data using Generative Adversarial Networks. An innovative GAN architecture is proposed for generating GPR B-scans, which is, to the author's knowledge, the first successful application of GANs to GPR B-scans. As one of the major contributions, a novel loss function is formulated by merging frequency-domain with time-domain features. To test the efficacy of the generated B-scans, a real-time object classifier is proposed to measure the performance gain derived from the augmented B-scan images. Numerical experiments illustrate that, with the augmented training data, the proposed GAN architecture yields a significant increase (from 82% to 98%) in the accuracy of the object classifier.
Tasks
Published 2019-05-30
URL https://arxiv.org/abs/1905.13321v1
PDF https://arxiv.org/pdf/1905.13321v1.pdf
PWC https://paperswithcode.com/paper/applying-generative-adversarial-networks-to
Repo
Framework
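
The merged time/frequency loss admits a compact PyTorch sketch: an L1 term on the raw B-scans plus an L1 term on their FFT magnitudes. The blend weight `lam` and the plain-magnitude choice are assumptions; the thesis's exact loss formulation may differ.

```python
import torch

def time_frequency_loss(fake, real, lam=0.5):
    """L1 in the time domain blended with L1 on 2D FFT magnitudes."""
    time_l1 = torch.mean(torch.abs(fake - real))
    freq_l1 = torch.mean(torch.abs(torch.abs(torch.fft.fft2(fake)) -
                                   torch.abs(torch.fft.fft2(real))))
    return time_l1 + lam * freq_l1
```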

Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere

Title Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere
Authors Pietro Verzelli, Cesare Alippi, Lorenzo Livi
Abstract Among the various architectures of Recurrent Neural Networks, Echo State Networks (ESNs) emerged due to their simplified and inexpensive training procedure. These networks are known to be sensitive to the setting of hyper-parameters, which critically affect their behaviour. Results show that their performance is usually maximized in a narrow region of hyper-parameter space called the edge of chaos. Finding such a region requires searching hyper-parameter space in a sensible way: hyper-parameter configurations marginally outside such a region might yield networks exhibiting fully developed chaos, hence producing unreliable computations. The performance gain due to optimizing hyper-parameters can be studied by considering the memory-nonlinearity trade-off, i.e., the fact that increasing the nonlinear behavior of the network degrades its ability to remember past inputs, and vice versa. In this paper, we propose a model of ESNs that eliminates critical dependence on hyper-parameters, resulting in networks that provably cannot enter a chaotic regime and, at the same time, exhibit nonlinear behaviour in phase space characterised by a large memory of past inputs, comparable to that of linear networks. Our contribution is supported by experiments corroborating our theoretical findings, showing that the proposed model displays dynamics that are rich enough to approximate many common nonlinear systems used for benchmarking.
Tasks
Published 2019-03-27
URL https://arxiv.org/abs/1903.11691v2
PDF https://arxiv.org/pdf/1903.11691v2.pdf
PWC https://paperswithcode.com/paper/echo-state-networks-with-self-normalizing
Repo
Framework
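
The self-normalizing idea can be previewed with a tiny reservoir whose state is projected back onto the unit hyper-sphere after every update, so the state norm stays constant and unbounded chaotic growth is excluded by construction. The weight scalings and sine input below are arbitrary demo choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, u_dim = 200, 1
W = rng.normal(0, 1 / np.sqrt(n), (n, n))     # reservoir weights
W_in = rng.normal(0, 0.5, (n, u_dim))         # input weights
x = rng.normal(size=n); x /= np.linalg.norm(x)

def esn_step(x, u):
    """Update followed by projection onto the unit hyper-sphere."""
    pre = W @ x + W_in @ u
    return pre / np.linalg.norm(pre)          # self-normalising activation

for t in range(100):
    x = esn_step(x, np.array([np.sin(0.1 * t)]))
```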

Multifaceted Analysis of Fine-Tuning in Deep Model for Visual Recognition

Title Multifaceted Analysis of Fine-Tuning in Deep Model for Visual Recognition
Authors Xiangyang Li, Luis Herranz, Shuqiang Jiang
Abstract In recent years, convolutional neural networks (CNNs) have achieved impressive performance in various visual recognition scenarios. CNNs trained on large labeled datasets not only obtain significant performance on the most challenging benchmarks but also provide powerful representations, which can be applied to a wide range of other tasks. However, the requirement for massive amounts of data to train deep neural networks is a major drawback of these models, as the available data are usually limited or imbalanced. Fine-tuning (FT) is an effective way to transfer knowledge learned on a source dataset to a target task. In this paper, we introduce and systematically investigate several factors that influence the performance of fine-tuning for visual recognition. These factors include parameters of the retraining procedure (e.g., the initial learning rate of fine-tuning) and the distribution of the source and target data (e.g., the number of categories in the source dataset, the distance between the source and target datasets), among others. We quantitatively and qualitatively analyze these factors, evaluate their influence, and present many empirical observations. The results reveal insights into how fine-tuning changes CNN parameters and provide useful, evidence-backed intuitions about how to implement fine-tuning for computer vision tasks.
Tasks
Published 2019-07-11
URL https://arxiv.org/abs/1907.05099v1
PDF https://arxiv.org/pdf/1907.05099v1.pdf
PWC https://paperswithcode.com/paper/multifaceted-analysis-of-fine-tuning-in-deep
Repo
Framework
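
One of the factors studied, the initial learning rate of fine-tuning, is commonly handled with per-layer parameter groups: a small rate for the pre-trained backbone and a larger one for the freshly initialized head. The snippet assumes a recent torchvision; the specific rates are illustrative, not the paper's recommendations.

```python
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)   # new task head

optimizer = torch.optim.SGD([
    {"params": [p for n, p in model.named_parameters()
                if not n.startswith("fc")], "lr": 1e-4},    # pre-trained layers
    {"params": model.fc.parameters(), "lr": 1e-2},          # new head
], momentum=0.9)
```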

Side-Aware Boundary Localization for More Precise Object Detection

Title Side-Aware Boundary Localization for More Precise Object Detection
Authors Jiaqi Wang, Wenwei Zhang, Yuhang Cao, Kai Chen, Jiangmiao Pang, Tao Gong, Jianping Shi, Chen Change Loy, Dahua Lin
Abstract Current object detection frameworks mainly rely on bounding box regression to localize objects. Despite the remarkable progress in recent years, the precision of bounding box regression remains unsatisfactory, hence limiting performance in object detection. We observe that precise localization requires careful placement of each side of the bounding box. However, the mainstream approach, which focuses on predicting centers and sizes, is not the most effective way to accomplish this task, especially when there are displacements with large variance between the anchors and the targets. In this paper, we propose an alternative approach, named Side-Aware Boundary Localization (SABL), in which each side of the bounding box is localized with a dedicated network branch. Moreover, to tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket. We test the proposed method on both two-stage and single-stage detection frameworks. Replacing the standard bounding box regression branch with the proposed design leads to significant improvements on Faster R-CNN, RetinaNet, and Cascade R-CNN, by 3.0%, 1.6%, and 0.9%, respectively. Code and models will be available at https://github.com/open-mmlab/mmdetection.
Tasks Object Detection
Published 2019-12-09
URL https://arxiv.org/abs/1912.04260v1
PDF https://arxiv.org/pdf/1912.04260v1.pdf
PWC https://paperswithcode.com/paper/side-aware-boundary-localization-for-more
Repo
Framework
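
The two-step scheme for a single box side is easy to sketch: pick the most likely bucket along the search range, then refine with the offset regressed for that bucket. The interface below (`localize_side`, width-normalised offsets) is an assumption for illustration, not the SABL head itself.

```python
import numpy as np

def localize_side(bucket_logits, bucket_offsets, lo, hi):
    """Coarse-to-fine localisation of one side of a box on the range [lo, hi]."""
    n = len(bucket_logits)
    width = (hi - lo) / n
    b = int(np.argmax(bucket_logits))           # step 1: bucket prediction
    centre = lo + (b + 0.5) * width
    return centre + bucket_offsets[b] * width   # step 2: within-bucket offset
```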

Privacy Preserving Image-Based Localization

Title Privacy Preserving Image-Based Localization
Authors Pablo Speciale, Johannes L. Schönberger, Sing Bing Kang, Sudipta N. Sinha, Marc Pollefeys
Abstract Image-based localization is a core component of many augmented/mixed reality (AR/MR) and autonomous robotic systems. Current localization systems rely on the persistent storage of 3D point clouds of the scene to enable camera pose estimation, but such data reveals potentially sensitive scene information. This gives rise to significant privacy risks, especially as for many applications 3D mapping is a background process that the user might not be fully aware of. We pose the following question: How can we avoid disclosing confidential information about the captured 3D scene, and yet allow reliable camera pose estimation? This paper proposes the first solution to what we call privacy preserving image-based localization. The key idea of our approach is to lift the map representation from a 3D point cloud to a 3D line cloud. This novel representation obfuscates the underlying scene geometry while providing sufficient geometric constraints to enable robust and accurate 6-DOF camera pose estimation. Extensive experiments on several datasets and localization scenarios underline the high practical relevance of our proposed approach.
Tasks Image-Based Localization, Pose Estimation
Published 2019-03-13
URL http://arxiv.org/abs/1903.05572v1
PDF http://arxiv.org/pdf/1903.05572v1.pdf
PWC https://paperswithcode.com/paper/privacy-preserving-image-based-localization
Repo
Framework
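
The point-to-line lifting is simple enough to state in code: each 3D point is replaced by a line through it with a random direction, discarding where on the line the point sits. This is only the obfuscation step; recovering a 6-DOF pose from 2D-3D line correspondences is the (omitted) hard part.

```python
import numpy as np

def lift_to_line_cloud(points, seed=0):
    """Replace each 3D point (N, 3) by a line through it with a random
    unit direction; the point's position along its line is discarded."""
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=points.shape)
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return [(p, d) for p, d in zip(points, directions)]  # (point on line, direction)
```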

Agnostic Lane Detection

Title Agnostic Lane Detection
Authors Yuenan Hou
Abstract Lane detection is an important yet challenging task in autonomous driving, affected by many factors, e.g., light conditions, occlusions caused by other vehicles, irrelevant markings on the road, and the inherently long and thin shape of lanes. Conventional methods typically treat lane detection as a semantic segmentation task, assigning a class label to each pixel of the image. This formulation depends heavily on the assumption that the number of lanes is pre-defined and fixed and that no lane changing occurs, which does not always hold. To make the lane detection model applicable to an arbitrary number of lanes and to lane-changing scenarios, we adopt an instance segmentation approach, which first differentiates lanes from background and then classifies each lane pixel into a lane instance. Besides, a multi-task learning paradigm is utilized to better exploit the structural information, and a feature pyramid architecture is used to detect extremely thin lanes. Three popular lane detection benchmarks, i.e., TuSimple, CULane and BDD100K, are used to validate the effectiveness of our proposed algorithm.
Tasks Autonomous Driving, Instance Segmentation, Lane Detection, Multi-Task Learning, Semantic Segmentation
Published 2019-05-02
URL http://arxiv.org/abs/1905.03704v1
PDF http://arxiv.org/pdf/1905.03704v1.pdf
PWC https://paperswithcode.com/paper/190503704
Repo
Framework
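
The instance-segmentation formulation can be caricatured as binary lane/background segmentation followed by clustering of per-pixel embeddings, which is what makes the lane count arbitrary. The greedy clustering below (threshold `delta`, all names) is a hedged stand-in for whatever grouping the paper actually uses.

```python
import numpy as np

def cluster_lane_pixels(mask, embeddings, delta=0.5):
    """Greedy clustering of lane pixels in embedding space: a pixel joins the
    nearest cluster seed within delta, otherwise it starts a new lane."""
    ys, xs = np.nonzero(mask)                   # lane/background already separated
    seeds, assign = [], {}
    for y, x in zip(ys, xs):
        e = embeddings[:, y, x]                 # embeddings: (D, H, W)
        dists = [np.linalg.norm(e - s) for s in seeds]
        if dists and min(dists) < delta:
            assign[(y, x)] = int(np.argmin(dists))
        else:
            seeds.append(e.copy()); assign[(y, x)] = len(seeds) - 1
    return assign                                # pixel -> lane instance id
```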

Text line Segmentation in Compressed Representation of Handwritten Document using Tunneling Algorithm

Title Text line Segmentation in Compressed Representation of Handwritten Document using Tunneling Algorithm
Authors Amarnath R, P Nagabhushan
Abstract In this work, we perform text line segmentation directly in the compressed representation of an unconstrained handwritten document image. To this end, we make use of text line terminal points, following the current state of the art. The terminal points spotted along both margins (left and right) of a document image for every text line are considered as source and target, respectively. The tunneling algorithm uses a single agent (or robot) to identify the coordinate positions in the compressed representation and perform text-line segmentation of the document. The agent starts at a source point and progressively tunnels a path routing between two adjacent text lines until it reaches the probable target. The agent's navigation path from source to target, bypassing obstacles if any, segregates the two adjacent text lines. However, the target point becomes known only when the agent reaches the destination; this applies to all source points, and hence we can analyze the correspondence between source and target nodes. Expert-system techniques, dynamic programming, and greedy strategies are employed for every search space while tunneling. Exhaustive experimentation is carried out on various benchmark datasets, including ICDAR13, and the performance is reported.
Tasks
Published 2019-01-03
URL http://arxiv.org/abs/1901.11477v1
PDF http://arxiv.org/pdf/1901.11477v1.pdf
PWC https://paperswithcode.com/paper/text-line-segmentation-in-compressed
Repo
Framework
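
A toy version of the tunnelling agent, on an uncompressed ink map for clarity: starting from a left-margin terminal point, it greedily steps to the least-inked neighbouring row in each column, routing around obstacles until it reaches the right margin. The greedy rule here stands in for the paper's dynamic-programming search, and the compressed-domain coordinates are omitted.

```python
import numpy as np

def tunnel_path(ink, start_row):
    """Trace a path between two adjacent text lines on an ink-density map."""
    h, w = ink.shape
    row, path = start_row, [start_row]
    for col in range(1, w):
        candidates = [r for r in (row - 1, row, row + 1) if 0 <= r < h]
        row = min(candidates, key=lambda r: ink[r, col])   # avoid inked pixels
        path.append(row)
    return path                # the path segregates the two adjacent lines
```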