Paper Group ANR 190
An LP-Based Approach for Goal Recognition as Planning. Human Following for Wheeled Robot with Monocular Pan-tilt Camera. A Temporal Clustering Algorithm for Achieving the trade-off between the User Experience and the Equipment Economy in the Context of IoT. Video-based Bottleneck Detection utilizing Lagrangian Dynamics in Crowded Scenes. Federated Collaborative Filtering for Privacy-Preserving Personalized Recommendation System. Learning Structure-Appearance Joint Embedding for Indoor Scene Image Synthesis. Surrogate Supervision for Medical Image Analysis: Effective Deep Learning From Limited Quantities of Labeled Data. Cross-Enhancement Transform Two-Stream 3D ConvNets for Action Recognition. Applying Generative Adversarial Networks to Intelligent Subsurface Imaging and Identification. Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere. Multifaceted Analysis of Fine-Tuning in Deep Model for Visual Recognition. Side-Aware Boundary Localization for More Precise Object Detection. Privacy Preserving Image-Based Localization. Agnostic Lane Detection. Text line Segmentation in Compressed Representation of Handwritten Document using Tunneling Algorithm.
An LP-Based Approach for Goal Recognition as Planning
Title | An LP-Based Approach for Goal Recognition as Planning |
Authors | Felipe Meneguzzi, Giovanna Lazzari Miotto, Ramon Fraga Pereira, André Grahl Pereira |
Abstract | Goal recognition is the problem of inferring the correct goal towards which an agent executes a plan, given a set of goal hypotheses, a domain model, and a (possibly noisy) sample of the plan being executed. This is a key problem in both cooperative and competitive agent interactions, and recent approaches have produced fast and accurate goal recognition algorithms. In this paper, we leverage advances in operator-counting heuristics computed using linear programs over constraints derived from classical planning problems to solve goal recognition problems. Our approach uses additional operator-counting constraints derived from the observations to efficiently infer the correct goal, and serves as the basis for a number of further methods with additional constraints. |
Tasks | |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04210v2 |
https://arxiv.org/pdf/1905.04210v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-goal-recognition-with-operator |
Repo | |
Framework | |
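The operator-counting idea above lends itself to a compact illustration. Below is a toy sketch, not the authors' implementation: operator counts are LP variables, a net-change style relaxation encodes goal achievement, and observations become lower-bound constraints; the goal whose optimal LP cost is inflated least by the observations is selected. The domain, costs, and scoring rule are all illustrative assumptions.

```python
# Toy goal recognition via operator-counting LPs (illustrative only).
import numpy as np
from scipy.optimize import linprog

# Toy domain: 3 operators, 2 numeric fluents. effects[i, j] is the net
# change operator j applies to fluent i (a state-equation style relaxation).
effects = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, -1.0]])
costs = np.ones(3)

def lp_cost(goal_demand, observed=()):
    """Min-cost operator counts meeting net-change demands; observed
    operators must be counted at least once."""
    bounds = [(1.0 if j in observed else 0.0, None) for j in range(len(costs))]
    # effects @ y >= demand  ->  -effects @ y <= -demand
    res = linprog(costs, A_ub=-effects, b_ub=-np.asarray(goal_demand),
                  bounds=bounds, method="highs")
    return res.fun if res.success else np.inf

goals = {"G1": [2.0, 0.0], "G2": [0.0, 2.0]}
observed = (0,)  # we observed operator 0 being executed

# Rank hypotheses by how little the observations inflate the optimal LP cost.
scores = {g: lp_cost(d, observed) - lp_cost(d) for g, d in goals.items()}
print(min(scores, key=scores.get), scores)
```

The real approach draws on the full family of operator-counting constraints from classical planning; the single net-change relaxation here is only a stand-in.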
Human Following for Wheeled Robot with Monocular Pan-tilt Camera
Title | Human Following for Wheeled Robot with Monocular Pan-tilt Camera |
Authors | Zheng Zhu, Hongxuan Ma, Wei Zou |
Abstract | Human following on mobile robots has witnessed significant advances due to its potential for real-world applications. Currently, most human following systems are equipped with depth sensors to obtain the distance between human and robot, which suffer from demanding perception requirements and sensor noise. In this paper, we design a wheeled mobile robot system with a monocular pan-tilt camera to follow a human, which can keep the target in the field of view while following simultaneously. The system consists of a fast human detector, a real-time and accurate visual tracker, and a unified controller for the mobile robot and pan-tilt camera. In the visual tracking algorithm, both Siamese networks and optical flow information are exploited to locate and regress the human simultaneously. In order to perform following with a monocular camera, a constraint on human height is introduced to design the controller. In experiments, human following is conducted and analysed in simulations and on a real robot platform, demonstrating the effectiveness and robustness of the overall system. |
Tasks | Optical Flow Estimation, Visual Tracking |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06087v1 |
https://arxiv.org/pdf/1909.06087v1.pdf | |
PWC | https://paperswithcode.com/paper/human-following-for-wheeled-robot-with |
Repo | |
Framework | |
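The height-constraint trick in this entry reduces to a pinhole-model range estimate, which a simple controller can consume. A minimal sketch under assumed calibration and gains (the focal length, height prior, and control law are illustrative, not the paper's):

```python
# Monocular range from a known (assumed) human height, feeding simple
# proportional controllers for following distance and pan angle.

FOCAL_PX = 600.0      # camera focal length in pixels (assumed calibration)
HUMAN_HEIGHT_M = 1.7  # height prior used as the scale constraint
IMG_W, IMG_H = 640, 480

def estimate_distance(bbox_h_px: float) -> float:
    """Pinhole model: Z = f * H_real / h_pixels."""
    return FOCAL_PX * HUMAN_HEIGHT_M / max(bbox_h_px, 1e-6)

def control(bbox_cx: float, bbox_h_px: float,
            target_dist=2.0, k_v=0.8, k_pan=0.002):
    """Return (forward velocity, pan rate) that keep the target centered
    and at the desired following distance."""
    z = estimate_distance(bbox_h_px)
    v = k_v * (z - target_dist)          # drive toward the standoff distance
    pan = k_pan * (bbox_cx - IMG_W / 2)  # rotate camera to center the target
    return v, pan

# Example: a 200-px-tall detection at image x=400 -> target ~5.1 m away,
# so drive forward (v ~ 2.48, clipped in practice) and pan right.
print(control(400.0, 200.0))
```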
A Temporal Clustering Algorithm for Achieving the trade-off between the User Experience and the Equipment Economy in the Context of IoT
Title | A Temporal Clustering Algorithm for Achieving the trade-off between the User Experience and the Equipment Economy in the Context of IoT |
Authors | Caio Ponte, Carlos Caminha, Rafael Bomfim, Ronaldo Moreira, Vasco Furtado |
Abstract | We present the Temporal Clustering Algorithm (TCA), an incremental learning algorithm applicable to problems of anticipatory computing in the context of the Internet of Things. The algorithm was tested in a prediction scenario for the consumption of an electric water dispenser typically used in tropical countries, where the ambient temperature is around 30 degrees Celsius. In this context, the user typically wants to drink iced water and therefore uses the cooling function of the dispenser. Real and synthetic water consumption data were used to test the algorithm's forecasting capacity and how much energy can be saved by predicting the pattern of use of the equipment. The algorithm uses a small, constant amount of memory, which allows it to be implemented at low cost on commercially available microcontrollers with less than 1 KB of memory. It can also be configured according to user preference, either prioritizing comfort, by keeping the water at the desired temperature longer, or prioritizing energy savings. The main result is that the TCA achieved energy savings of up to 40% compared to the conventional mode of operation of the dispenser, with an average success rate higher than 90% at its times of use. |
Tasks | |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1907.13246v1 |
https://arxiv.org/pdf/1907.13246v1.pdf | |
PWC | https://paperswithcode.com/paper/a-temporal-clustering-algorithm-for-achieving |
Repo | |
Framework | |
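A rough sketch of how such an incremental, constant-memory temporal predictor could look; the merge window, cluster cap, and update rule below are assumptions, since the abstract does not spell out TCA's exact rules:

```python
# Incremental clustering of usage times-of-day in bounded memory, driving a
# "pre-cool shortly before predicted use" policy (illustrative assumptions).

MAX_CLUSTERS = 8        # fixed memory footprint (fits small MCUs)
MERGE_WINDOW = 45.0     # minutes; events closer than this join a cluster

clusters = []           # list of [mean_minute, weight]

def observe(minute_of_day: float):
    """Fold one usage event into the nearest cluster, or open a new one."""
    if clusters:
        best = min(clusters, key=lambda c: abs(c[0] - minute_of_day))
        if abs(best[0] - minute_of_day) <= MERGE_WINDOW:
            best[1] += 1
            best[0] += (minute_of_day - best[0]) / best[1]  # running mean
            return
    if len(clusters) < MAX_CLUSTERS:
        clusters.append([minute_of_day, 1])
    # else: drop the event (bounded memory), or evict the lightest cluster

def should_precool(minute_of_day: float, lead=15.0) -> bool:
    """Cool the water only shortly before a predicted usage time."""
    return any(0 <= c[0] - minute_of_day <= lead for c in clusters)

for m in [420, 425, 430, 740, 735, 1260]:  # synthetic usage log (minutes)
    observe(m)
print(should_precool(410.0), should_precool(600.0))  # True, False
```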
Video-based Bottleneck Detection utilizing Lagrangian Dynamics in Crowded Scenes
Title | Video-based Bottleneck Detection utilizing Lagrangian Dynamics in Crowded Scenes |
Authors | Maik Simon, Markus Küchhold, Tobias Senst, Erik Bochinski, Thomas Sikora |
Abstract | Avoiding bottleneck situations in crowds is critical for the safety and comfort of people at large events or in public transportation. Based on Lagrangian motion analysis, we propose a novel video-based bottleneck detector that identifies characteristic stowage patterns in crowd movements captured by optical flow fields. The Lagrangian framework allows us to assess complex time-dependent crowd-motion dynamics at large temporal scales near the bottleneck using two-dimensional Lagrangian fields. In particular, we propose long-term temporally filtered Finite Time Lyapunov Exponent (FTLE) fields that provide a more global segmentation of the crowd movements and allow capturing the deformations a crowd undergoes when passing a bottleneck. Finally, these deformations are used for automatic spatio-temporal detection of such situations. The performance of the proposed approach is shown in extensive evaluations on the existing Jülich and AGORASET datasets, which we have extended with ground-truth data for spatio-temporal bottleneck analysis. |
Tasks | Optical Flow Estimation |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.07772v1 |
https://arxiv.org/pdf/1908.07772v1.pdf | |
PWC | https://paperswithcode.com/paper/190807772 |
Repo | |
Framework | |
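The FTLE field at the heart of this detector has a standard closed form: given the flow map of particles advected through the optical flow for a horizon T, the FTLE is the logarithmic stretching rate of the Cauchy-Green deformation tensor. A minimal sketch (grid seeding and advection details are assumed, not taken from the paper):

```python
# FTLE from a flow map: largest eigenvalue of the Cauchy-Green tensor,
# FTLE = ln(sqrt(lambda_max)) / |T|.
import numpy as np

def ftle(phi_x, phi_y, T):
    """phi_x, phi_y: (H, W) end positions of particles seeded at each pixel."""
    dpx_dy, dpx_dx = np.gradient(phi_x)   # axis 0 = rows (y), axis 1 = cols (x)
    dpy_dy, dpy_dx = np.gradient(phi_y)
    out = np.zeros_like(phi_x)
    for i in range(phi_x.shape[0]):
        for j in range(phi_x.shape[1]):
            F = np.array([[dpx_dx[i, j], dpx_dy[i, j]],
                          [dpy_dx[i, j], dpy_dy[i, j]]])  # flow-map Jacobian
            C = F.T @ F                                   # Cauchy-Green tensor
            lam = np.linalg.eigvalsh(C)[-1]               # largest eigenvalue
            out[i, j] = np.log(max(lam, 1e-12)) / (2.0 * abs(T))
    return out

# Identity flow map (no motion) -> FTLE ~ 0 everywhere.
ys, xs = np.mgrid[0:32, 0:32].astype(float)
print(np.abs(ftle(xs, ys, T=10.0)).max())
```

The paper's long-term temporal filtering of these fields is a separate post-processing step not sketched here.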
Federated Collaborative Filtering for Privacy-Preserving Personalized Recommendation System
Title | Federated Collaborative Filtering for Privacy-Preserving Personalized Recommendation System |
Authors | Muhammad Ammad-ud-din, Elena Ivannikova, Suleiman A. Khan, Were Oyomno, Qiang Fu, Kuan Eeik Tan, Adrian Flanagan |
Abstract | The increasing interest in user privacy is leading to new privacy-preserving machine learning paradigms. In the Federated Learning paradigm, a master machine learning model is distributed to user clients; the clients use their locally stored data and model for both inference and calculating model updates. The model updates are sent back and aggregated on the server to update the master model, which is then redistributed to the clients. In this paradigm, the user data never leaves the client, greatly enhancing the user's privacy, in contrast to the traditional paradigm of collecting, storing and processing user data on a backend server beyond the user's control. In this paper we introduce, as far as we are aware, the first federated implementation of a Collaborative Filter. The federated updates to the model are based on a stochastic gradient approach. As a classical case study in machine learning, we explore a personalized recommendation system based on users' implicit feedback and demonstrate the method's applicability to both the MovieLens and an in-house dataset. Empirical validation confirms a collaborative filter can be federated without a loss of accuracy compared to a standard implementation, hence enhancing the user's privacy in a widely used recommender application while maintaining recommender performance. |
Tasks | |
Published | 2019-01-29 |
URL | http://arxiv.org/abs/1901.09888v1 |
http://arxiv.org/pdf/1901.09888v1.pdf | |
PWC | https://paperswithcode.com/paper/federated-collaborative-filtering-for-privacy |
Repo | |
Framework | |
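The federated update scheme described above can be sketched as federated matrix factorization: item factors live on the server, each user's factor and feedback stay on the client, and only item-factor gradients travel. A bare-bones sketch with a plain squared loss on binary implicit feedback (the paper's exact implicit-feedback weighting is not reproduced here):

```python
# Federated matrix factorization sketch: user factors never leave the client.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 6, 3
R = (rng.random((n_users, n_items)) < 0.3).astype(float)  # implicit feedback
Y = 0.1 * rng.standard_normal((n_items, k))               # server: item factors
X = 0.1 * rng.standard_normal((n_users, k))               # clients: user factors

def client_round(u, Y, lr=0.05, reg=0.01):
    """Runs on the client: update the private user factor, return the
    item-factor gradient computed only from this user's data."""
    err = R[u] - X[u] @ Y.T                  # (n_items,) residuals
    X[u] += lr * (err @ Y - reg * X[u])      # local user update (never sent)
    grad_Y = -np.outer(err, X[u]) + reg * Y  # gradient w.r.t. item factors
    return grad_Y

for _ in range(200):                         # federated training rounds
    grads = [client_round(u, Y) for u in range(n_users)]
    Y -= 0.05 * np.mean(grads, axis=0)       # server aggregates and steps

print(np.round(X @ Y.T, 2))                  # reconstructed preference scores
```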
Learning Structure-Appearance Joint Embedding for Indoor Scene Image Synthesis
Title | Learning Structure-Appearance Joint Embedding for Indoor Scene Image Synthesis |
Authors | Yuan Xue, Zihan Zhou, Xiaolei Huang |
Abstract | Advanced image synthesis methods can generate photo-realistic images for faces, birds, bedrooms, and more. However, these methods do not explicitly model and preserve essential structural constraints such as junctions, parallel lines, and planar surfaces. In this paper, we study the problem of structured indoor image generation for design applications. We utilize a small-scale dataset that contains both images of various indoor scenes and their corresponding ground-truth wireframe annotations. While existing image synthesis models trained on the dataset are insufficient in preserving structural integrity, we propose a novel model based on a structure-appearance joint embedding learned from both images and wireframes. In our model, structural constraints are explicitly enforced by learning a joint embedding in a shared encoder network that must support the generation of both images and wireframes. We demonstrate the effectiveness of the joint embedding learning scheme on the indoor-scene wireframe-to-image translation task. While wireframes as input contain less semantic information than the inputs of other traditional image translation tasks, our model can generate high-fidelity indoor scene renderings that match the input wireframes well. Experiments on a wireframe-scene dataset show that our proposed translation model significantly outperforms existing state-of-the-art methods in both visual quality and structural integrity of generated images. |
Tasks | Image Generation |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.03840v1 |
https://arxiv.org/pdf/1912.03840v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-structure-appearance-joint-embedding |
Repo | |
Framework | |
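A purely architectural sketch of the shared-encoder idea, assuming arbitrary layer sizes (this is not the authors' network): one encoder embedding must support two decoders, one for the image and one for the wireframe, so structure is baked into the embedding by construction.

```python
# Shared encoder + dual decoders: the joint embedding must explain both the
# appearance (image) and the structure (wireframe). Sizes are placeholders.
import torch
import torch.nn as nn

class JointEmbeddingNet(nn.Module):
    def __init__(self, z=64):
        super().__init__()
        self.encoder = nn.Sequential(            # shared: wireframe -> embedding
            nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, z, 4, 2, 1), nn.ReLU(),
        )
        def decoder(out_ch):                     # two heads off one embedding
            return nn.Sequential(
                nn.ConvTranspose2d(z, 32, 4, 2, 1), nn.ReLU(),
                nn.ConvTranspose2d(32, out_ch, 4, 2, 1), nn.Sigmoid(),
            )
        self.to_image = decoder(3)               # appearance head
        self.to_wireframe = decoder(1)           # structure head

    def forward(self, wf):
        e = self.encoder(wf)
        return self.to_image(e), self.to_wireframe(e)

net = JointEmbeddingNet()
wf = torch.rand(2, 1, 64, 64)                    # dummy wireframe rasters
img, wf_rec = net(wf)
img_target = torch.rand(2, 3, 64, 64)            # paired image (dummy here)
loss = (nn.functional.l1_loss(img, img_target)   # joint reconstruction loss;
        + nn.functional.l1_loss(wf_rec, wf))     # real training adds GAN terms
loss.backward()
print(img.shape, wf_rec.shape)
```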
Surrogate Supervision for Medical Image Analysis: Effective Deep Learning From Limited Quantities of Labeled Data
Title | Surrogate Supervision for Medical Image Analysis: Effective Deep Learning From Limited Quantities of Labeled Data |
Authors | Nima Tajbakhsh, Yufei Hu, Junli Cao, Xingjian Yan, Yi Xiao, Yong Lu, Jianming Liang, Demetri Terzopoulos, Xiaowei Ding |
Abstract | We investigate the effectiveness of a simple solution to the common problem of deep learning in medical image analysis with limited quantities of labeled training data. The underlying idea is to assign artificial labels to abundantly available unlabeled medical images and, through a process known as surrogate supervision, pre-train a deep neural network model for the target medical image analysis task lacking sufficient labeled training data. In particular, we employ three surrogate supervision schemes, namely rotation, reconstruction, and colorization, in four different medical imaging applications representing classification and segmentation for both 2D and 3D medical images. Three key findings emerge from our research: 1) pre-training with surrogate supervision is effective for small training sets; 2) deep models trained from initial weights pre-trained through surrogate supervision outperform the same models when trained from scratch, suggesting that pre-training with surrogate supervision should be considered prior to training any deep 3D models; 3) pre-training models in the medical domain with surrogate supervision is more effective than transfer learning from an unrelated domain (e.g., natural images), indicating the practical value of abundant unlabeled medical image data. |
Tasks | Colorization, Transfer Learning |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.08707v1 |
http://arxiv.org/pdf/1901.08707v1.pdf | |
PWC | https://paperswithcode.com/paper/surrogate-supervision-for-medical-image |
Repo | |
Framework | |
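Of the three schemes named above, rotation prediction is the easiest to sketch: unlabeled images are rotated by 0/90/180/270 degrees, a network is pre-trained to predict the rotation, and its weights then initialize the target-task model. The tiny backbone here is a stand-in assumption.

```python
# Rotation surrogate supervision: pre-train on unlabeled images by
# predicting which of four rotations was applied.
import torch
import torch.nn as nn

def rotation_batch(x):
    """Return (rotated images, rotation labels) for a batch of images."""
    xs, ys = [], []
    for k in range(4):                       # 0, 90, 180, 270 degrees
        xs.append(torch.rot90(x, k, dims=(2, 3)))
        ys.append(torch.full((x.size(0),), k, dtype=torch.long))
    return torch.cat(xs), torch.cat(ys)

backbone = nn.Sequential(                    # placeholder encoder
    nn.Conv2d(1, 16, 3, 2, 1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
rot_head = nn.Linear(32, 4)                  # discarded after pre-training

opt = torch.optim.Adam(list(backbone.parameters()) + list(rot_head.parameters()))
unlabeled = torch.rand(8, 1, 64, 64)         # e.g. unlabeled medical slices
for _ in range(5):                           # surrogate pre-training loop
    xb, yb = rotation_batch(unlabeled)
    loss = nn.functional.cross_entropy(rot_head(backbone(xb)), yb)
    opt.zero_grad(); loss.backward(); opt.step()

# The pre-trained backbone now initializes the labeled target-task model.
print(loss.item())
```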
Cross-Enhancement Transform Two-Stream 3D ConvNets for Action Recognition
Title | Cross-Enhancement Transform Two-Stream 3D ConvNets for Action Recognition |
Authors | Dong Cao, Lisha Xu, Dongdong Zhang |
Abstract | Action recognition is an important research topic in computer vision. It is fundamental to visual understanding and has been applied in many fields. Since human actions can vary across environments, it is difficult to infer actions in completely different settings with the same structural model. To address this, we propose a Cross-Enhancement Transform Two-Stream 3D ConvNets algorithm, which considers the action distribution characteristics of the specific dataset. The stream with the better performance of the two serves as a teacher model and assists in training the other stream. In this way, the enhanced student stream and the teacher stream are combined to infer actions. We conduct experiments on the video datasets UCF-101, HMDB-51, and Kinetics-400, and the results confirm the effectiveness of our algorithm. |
Tasks | Autonomous Driving, Autonomous Vehicles, Optical Flow Estimation, Transfer Learning |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.08916v2 |
https://arxiv.org/pdf/1908.08916v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-enhancement-transform-two-stream-3d |
Repo | |
Framework | |
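Reading the teacher/student description above through the lens of standard distillation gives a sketch like the following; the KL-on-softened-logits form and the fusion rule are stand-in assumptions, since the paper's exact cross-enhancement transform is not specified in the abstract.

```python
# Teacher stream softens the training signal for the weaker stream; the two
# streams are fused at inference (standard distillation used as a stand-in).
import torch
import torch.nn.functional as F

def cross_enhance_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Cross-entropy on labels plus KL toward the teacher's soft targets."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits.detach() / T, dim=1),
                    reduction="batchmean") * T * T
    return alpha * hard + (1 - alpha) * soft

def fused_prediction(rgb_logits, flow_logits):
    """Inference: combine the enhanced stream with the teacher stream."""
    return (F.softmax(rgb_logits, 1) + F.softmax(flow_logits, 1)).argmax(1)

logits_s = torch.randn(4, 400)   # student stream, e.g. Kinetics-400 classes
logits_t = torch.randn(4, 400)   # teacher stream
labels = torch.randint(0, 400, (4,))
print(cross_enhance_loss(logits_s, logits_t, labels).item())
print(fused_prediction(logits_s, logits_t))
```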
Applying Generative Adversarial Networks to Intelligent Subsurface Imaging and Identification
Title | Applying Generative Adversarial Networks to Intelligent Subsurface Imaging and Identification |
Authors | William Rice |
Abstract | To augment training data for machine learning models in Ground Penetrating Radar (GPR) data classification and identification, this thesis focuses on the generation of realistic GPR data using Generative Adversarial Networks. An innovative GAN architecture is proposed for generating GPR B-scans, which is, to the author's knowledge, the first successful application of GANs to GPR B-scans. As one of the major contributions, a novel loss function is formulated by merging frequency-domain with time-domain features. To test the efficacy of the generated B-scans, a real-time object classifier is proposed to measure the performance gain derived from the augmented B-scan images. Numerical experiments illustrate that, based on the augmented training data, the proposed GAN architecture yields a significant increase (from 82% to 98%) in the accuracy of the object classifier. |
Tasks | |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13321v1 |
https://arxiv.org/pdf/1905.13321v1.pdf | |
PWC | https://paperswithcode.com/paper/applying-generative-adversarial-networks-to |
Repo | |
Framework | |
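The merged-domain loss can be sketched directly: compare generated and real B-scans both as raw traces and as magnitude spectra along the time/depth axis. The weighting and FFT axis are assumptions, not the thesis's exact formulation.

```python
# Combined time-domain + frequency-domain reconstruction loss for B-scans.
import torch

def time_freq_loss(fake, real, lam=0.1):
    """fake/real: (B, 1, depth_samples, traces) B-scan tensors."""
    time_term = torch.mean(torch.abs(fake - real))
    # FFT along the depth/time axis of each A-scan trace.
    fake_spec = torch.fft.rfft(fake, dim=2).abs()
    real_spec = torch.fft.rfft(real, dim=2).abs()
    freq_term = torch.mean(torch.abs(fake_spec - real_spec))
    return time_term + lam * freq_term

fake = torch.rand(2, 1, 256, 64, requires_grad=True)  # generator output
real = torch.rand(2, 1, 256, 64)                      # real B-scans
loss = time_freq_loss(fake, real)   # added to the usual adversarial loss
loss.backward()
print(loss.item())
```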
Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere
Title | Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere |
Authors | Pietro Verzelli, Cesare Alippi, Lorenzo Livi |
Abstract | Among the various architectures of Recurrent Neural Networks, Echo State Networks (ESNs) emerged due to their simplified and inexpensive training procedure. These networks are known to be sensitive to the setting of hyper-parameters, which critically affect their behaviour. Results show that their performance is usually maximized in a narrow region of hyper-parameter space called the edge of chaos. Finding such a region requires searching in hyper-parameter space in a sensible way: hyper-parameter configurations marginally outside such a region might yield networks exhibiting fully developed chaos, hence producing unreliable computations. The performance gain due to optimizing hyper-parameters can be studied by considering the memory–nonlinearity trade-off, i.e., the fact that increasing the nonlinear behavior of the network degrades its ability to remember past inputs, and vice-versa. In this paper, we propose a model of ESNs that eliminates critical dependence on hyper-parameters, resulting in networks that provably cannot enter a chaotic regime and, at the same time, exhibit nonlinear behaviour in phase space characterised by a large memory of past inputs, comparable to that of linear networks. Our contribution is supported by experiments corroborating our theoretical findings, showing that the proposed model displays dynamics that are rich enough to approximate many common nonlinear systems used for benchmarking. |
Tasks | |
Published | 2019-03-27 |
URL | https://arxiv.org/abs/1903.11691v2 |
https://arxiv.org/pdf/1903.11691v2.pdf | |
PWC | https://paperswithcode.com/paper/echo-state-networks-with-self-normalizing |
Repo | |
Framework | |
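The self-normalizing construction is simple to sketch: after each standard ESN update, the state is projected back onto the unit hyper-sphere, so the dynamics cannot diverge regardless of the recurrent spectral radius. Sizes and scalings below are illustrative assumptions.

```python
# ESN state update followed by projection onto the unit hyper-sphere.
import numpy as np

rng = np.random.default_rng(1)
n, rho = 200, 1.5                         # deliberately "unstable" radius
W = rng.standard_normal((n, n)) / np.sqrt(n) * rho
W_in = rng.standard_normal(n)

def step(x, u):
    """ESN update followed by self-normalization onto the hyper-sphere."""
    pre = W @ x + W_in * u
    return pre / np.linalg.norm(pre)      # state has unit norm by construction

x = np.full(n, 1.0 / np.sqrt(n))          # start on the sphere
for t in range(100):
    x = step(x, np.sin(0.1 * t))          # drive with a simple input
print(np.linalg.norm(x))                  # stays at 1.0: no chaotic blow-up
```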
Multifaceted Analysis of Fine-Tuning in Deep Model for Visual Recognition
Title | Multifaceted Analysis of Fine-Tuning in Deep Model for Visual Recognition |
Authors | Xiangyang Li, Luis Herranz, Shuqiang Jiang |
Abstract | In recent years, convolutional neural networks (CNNs) have achieved impressive performance in various visual recognition scenarios. CNNs trained on large labeled datasets not only obtain significant performance on most challenging benchmarks but also provide powerful representations, which can be applied to a wide range of other tasks. However, the requirement of massive amounts of data to train deep neural networks is a major drawback of these models, as the data available is usually limited or imbalanced. Fine-tuning (FT) is an effective way to transfer knowledge learned in a source dataset to a target task. In this paper, we introduce and systematically investigate several factors that influence the performance of fine-tuning for visual recognition. These factors include parameters of the retraining procedure (e.g., the initial learning rate of fine-tuning) and the distribution of the source and target data (e.g., the number of categories in the source dataset, the distance between the source and target datasets), among others. We quantitatively and qualitatively analyze these factors, evaluate their influence, and present many empirical observations. The results reveal insights into how fine-tuning changes CNN parameters and provide useful, evidence-backed intuitions about how to implement fine-tuning for computer vision tasks. |
Tasks | |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05099v1 |
https://arxiv.org/pdf/1907.05099v1.pdf | |
PWC | https://paperswithcode.com/paper/multifaceted-analysis-of-fine-tuning-in-deep |
Repo | |
Framework | |
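One of the factors studied above, the initial learning rate of fine-tuning, has a standard PyTorch expression: pretrained backbone parameters get a small rate while the freshly initialized head gets a larger one. The 10x ratio below is a common heuristic assumption, not a result from the paper.

```python
# Fine-tuning with per-group learning rates: gentle on transferred weights,
# faster on the new task head (fetches ImageNet weights on first run).
import torch
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)   # new target-task head

opt = torch.optim.SGD([
    {"params": [p for n, p in model.named_parameters() if not n.startswith("fc")],
     "lr": 1e-4},                                      # pretrained backbone
    {"params": model.fc.parameters(), "lr": 1e-3},     # fresh head, 10x rate
], momentum=0.9)

x = torch.rand(4, 3, 224, 224)                         # dummy target batch
loss = torch.nn.functional.cross_entropy(model(x), torch.randint(0, 10, (4,)))
loss.backward(); opt.step()
print(loss.item())
```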
Side-Aware Boundary Localization for More Precise Object Detection
Title | Side-Aware Boundary Localization for More Precise Object Detection |
Authors | Jiaqi Wang, Wenwei Zhang, Yuhang Cao, Kai Chen, Jiangmiao Pang, Tao Gong, Jianping Shi, Chen Change Loy, Dahua Lin |
Abstract | Current object detection frameworks mainly rely on bounding box regression to localize objects. Despite the remarkable progress in recent years, the precision of bounding box regression remains unsatisfactory, hence limiting performance in object detection. We observe that precise localization requires careful placement of each side of the bounding box. However, the mainstream approach, which focuses on predicting centers and sizes, is not the most effective way to accomplish this task, especially when there exist displacements with large variance between the anchors and the targets. In this paper, we propose an alternative approach, named Side-Aware Boundary Localization (SABL), where each side of the bounding box is localized with a dedicated network branch. Moreover, to tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket. We test the proposed method on both two-stage and single-stage detection frameworks. Replacing the standard bounding box regression branch with the proposed design leads to significant improvements on Faster R-CNN, RetinaNet, and Cascade R-CNN, by 3.0%, 1.6%, and 0.9%, respectively. Code and models will be available at https://github.com/open-mmlab/mmdetection. |
Tasks | Object Detection |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.04260v1 |
https://arxiv.org/pdf/1912.04260v1.pdf | |
PWC | https://paperswithcode.com/paper/side-aware-boundary-localization-for-more |
Repo | |
Framework | |
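The two-step localization scheme can be sketched at the decoding step: per box side, a classifier picks a bucket within the side's movement range and a regressor refines the position inside it. Bucket count and range below are illustrative assumptions about the decoding only.

```python
# Bucket-then-refine decoding for one side of a bounding box.
import numpy as np

def decode_side(anchor_x, range_px, bucket_logits, offsets):
    """bucket_logits: score per bucket; offsets: fractional refinement per
    bucket. Returns the localized x coordinate of this box side."""
    k = len(bucket_logits)
    width = 2.0 * range_px / k                      # bucket width
    centers = anchor_x - range_px + width * (np.arange(k) + 0.5)
    b = int(np.argmax(bucket_logits))               # step 1: coarse bucket
    return centers[b] + offsets[b] * width          # step 2: fine offset

logits = np.array([0.1, 0.2, 3.0, 0.4])             # bucket 2 wins
offs = np.array([0.0, 0.0, -0.25, 0.0])             # shift left a quarter bucket
print(decode_side(anchor_x=100.0, range_px=20.0,
                  bucket_logits=logits, offsets=offs))  # -> 102.5
```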
Privacy Preserving Image-Based Localization
Title | Privacy Preserving Image-Based Localization |
Authors | Pablo Speciale, Johannes L. Schönberger, Sing Bing Kang, Sudipta N. Sinha, Marc Pollefeys |
Abstract | Image-based localization is a core component of many augmented/mixed reality (AR/MR) and autonomous robotic systems. Current localization systems rely on the persistent storage of 3D point clouds of the scene to enable camera pose estimation, but such data reveals potentially sensitive scene information. This gives rise to significant privacy risks, especially as for many applications 3D mapping is a background process that the user might not be fully aware of. We pose the following question: How can we avoid disclosing confidential information about the captured 3D scene, and yet allow reliable camera pose estimation? This paper proposes the first solution to what we call privacy preserving image-based localization. The key idea of our approach is to lift the map representation from a 3D point cloud to a 3D line cloud. This novel representation obfuscates the underlying scene geometry while providing sufficient geometric constraints to enable robust and accurate 6-DOF camera pose estimation. Extensive experiments on several datasets and localization scenarios underline the high practical relevance of our proposed approach. |
Tasks | Image-Based Localization, Pose Estimation |
Published | 2019-03-13 |
URL | http://arxiv.org/abs/1903.05572v1 |
http://arxiv.org/pdf/1903.05572v1.pdf | |
PWC | https://paperswithcode.com/paper/privacy-preserving-image-based-localization |
Repo | |
Framework | |
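The key obfuscation step, lifting each 3D point to a 3D line, is easy to sketch: every confidential map point is replaced by a line with a random direction through it, and the stored origin is slid along the line so the original point is no longer recorded. The full 6-DOF pose solver from point-to-line correspondences is omitted here.

```python
# Lifting a 3D point cloud to a 3D line cloud for privacy preservation.
import numpy as np

rng = np.random.default_rng(42)

def lift_to_line(point):
    """Return (origin, direction): a random line through the map point."""
    d = rng.standard_normal(3)
    d /= np.linalg.norm(d)
    # Slide the stored origin along the line so the true point is not kept.
    return point + rng.uniform(-5, 5) * d, d

cloud = rng.uniform(-10, 10, size=(5, 3))           # secret scene geometry
line_cloud = [lift_to_line(p) for p in cloud]

# Sanity check: each original point still lies on its line (distance ~ 0),
# but its position along the line is no longer stored anywhere.
for p, (o, d) in zip(cloud, line_cloud):
    perp = (p - o) - np.dot(p - o, d) * d
    print(np.linalg.norm(perp))
```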
Agnostic Lane Detection
Title | Agnostic Lane Detection |
Authors | Yuenan Hou |
Abstract | Lane detection is an important yet challenging task in autonomous driving, which is affected by many factors, e.g., light conditions, occlusions caused by other vehicles, irrelevant markings on the road, and the inherent long and thin shape of lanes. Conventional methods typically treat lane detection as a semantic segmentation task, which assigns a class label to each pixel of the image. This formulation heavily depends on the assumption that the number of lanes is pre-defined and fixed and that no lane changing occurs, which does not always hold. To make the lane detection model applicable to an arbitrary number of lanes and to lane-changing scenarios, we adopt an instance segmentation approach, which first differentiates lanes from the background and then classifies each lane pixel into a lane instance. In addition, a multi-task learning paradigm is utilized to better exploit the structural information, and a feature pyramid architecture is used to detect extremely thin lanes. Three popular lane detection benchmarks, i.e., TuSimple, CULane and BDD100K, are used to validate the effectiveness of our proposed algorithm. |
Tasks | Autonomous Driving, Instance Segmentation, Lane Detection, Multi-Task Learning, Semantic Segmentation |
Published | 2019-05-02 |
URL | http://arxiv.org/abs/1905.03704v1 |
http://arxiv.org/pdf/1905.03704v1.pdf | |
PWC | https://paperswithcode.com/paper/190503704 |
Repo | |
Framework | |
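The instance-segmentation formulation can be sketched at the clustering step: given the binary lane mask and per-pixel embeddings, lane pixels are greedily grouped so each group becomes one lane instance, with no fixed lane count. The embedding dimension and distance threshold are assumptions; the paper's exact grouping may differ.

```python
# Greedy clustering of lane-pixel embeddings into lane instances.
import numpy as np

def cluster_lane_pixels(mask, emb, delta=0.5):
    """mask: (H, W) bool lane/background; emb: (H, W, D) pixel embeddings.
    Returns (H, W) int labels, 0 = background, 1..K = lane instances."""
    labels = np.zeros(mask.shape, dtype=int)
    centers = []                                   # one center per lane so far
    for i, j in zip(*np.nonzero(mask)):
        e = emb[i, j]
        dists = [np.linalg.norm(e - c) for c in centers]
        if dists and min(dists) < delta:
            labels[i, j] = int(np.argmin(dists)) + 1
        else:
            centers.append(e)                      # open a new lane instance
            labels[i, j] = len(centers)
    return labels

H, W, D = 4, 6, 2
mask = np.zeros((H, W), bool); mask[:, 1] = mask[:, 4] = True
emb = np.zeros((H, W, D)); emb[:, 4] = 1.0         # two well-separated lanes
print(cluster_lane_pixels(mask, emb))              # columns labeled 1 and 2
```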
Text line Segmentation in Compressed Representation of Handwritten Document using Tunneling Algorithm
Title | Text line Segmentation in Compressed Representation of Handwritten Document using Tunneling Algorithm |
Authors | Amarnath R, P Nagabhushan |
Abstract | In this work, we perform text line segmentation directly in the compressed representation of an unconstrained handwritten document image. We make use of text line terminal points, the current state of the art. The terminal points spotted along both margins (left and right) of a document image for every text line are considered as source and target, respectively. The tunneling algorithm uses a single agent (or robot) to identify the coordinate positions in the compressed representation that segment the text lines of the document. The agent starts at a source point and progressively tunnels a path routed between two adjacent text lines until it reaches the probable target. The agent's navigation path from source to target, bypassing obstacles if any, segregates the two adjacent text lines. However, the target point becomes known only when the agent reaches the destination; this applies to all source points, and it henceforth allows us to analyze the correspondence between source and target nodes. Expert-system heuristics, dynamic programming, and greedy strategies are employed in every search space while tunneling. Exhaustive experimentation is carried out on various benchmark datasets including ICDAR13, and the performances are reported. |
Tasks | |
Published | 2019-01-03 |
URL | http://arxiv.org/abs/1901.11477v1 |
http://arxiv.org/pdf/1901.11477v1.pdf | |
PWC | https://paperswithcode.com/paper/text-line-segmentation-in-compressed |
Repo | |
Framework | |
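Casting the tunneling agent as a dynamic program gives a compact sketch: starting from a left-margin terminal point, the path advances column by column, preferring background pixels, and carves a minimum-ink route between two adjacent text lines. The move set and cost are assumptions about the search, not the paper's exact agent, and this sketch runs on a raster rather than the compressed representation.

```python
# Minimum-ink tunneling path between two text lines via dynamic programming.
import numpy as np

def tunnel(ink, start_row):
    """ink: (H, W) array, 1 = text pixel (obstacle), 0 = background.
    Returns the row index of the path in every column."""
    H, W = ink.shape
    cost = np.full((H, W), np.inf)
    back = np.zeros((H, W), dtype=int)
    cost[start_row, 0] = ink[start_row, 0]
    for j in range(1, W):
        for i in range(H):
            for di in (-1, 0, 1):                 # straight or diagonal step
                k = i + di
                if 0 <= k < H and cost[k, j-1] + ink[i, j] < cost[i, j]:
                    cost[i, j] = cost[k, j-1] + ink[i, j]
                    back[i, j] = k
    path = [int(np.argmin(cost[:, -1]))]          # cheapest arrival row
    for j in range(W - 1, 0, -1):
        path.append(back[path[-1], j])
    return path[::-1]

doc = np.zeros((7, 12)); doc[1, :] = 1; doc[5, :] = 1   # two "text lines"
print(tunnel(doc, start_row=3))                   # stays in the gap rows
```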