January 26, 2020

3298 words 16 mins read

Paper Group ANR 1574

OntoScene, A Logic-based Scene Interpreter: Implementation and Application in the Rock Art Domain. Efficient Deep Neural Networks. Wasserstein total variation filtering. Audio-Visual Embodied Navigation. Non-Monotone Submodular Maximization with Multiple Knapsacks in Static and Dynamic Settings. AnimalWeb: A Large-Scale Hierarchical Dataset of Anno …

OntoScene, A Logic-based Scene Interpreter: Implementation and Application in the Rock Art Domain

Title OntoScene, A Logic-based Scene Interpreter: Implementation and Application in the Rock Art Domain
Authors Daniela Briola, Viviana Mascardi, Massimiliano Gioseffi
Abstract We present OntoScene, a framework aimed at understanding the semantics of visual scenes starting from the semantics of their elements and the spatial relations holding between them. OntoScene exploits ontologies for representing knowledge and Prolog for specifying the interpretation rules that domain experts may adopt, and for implementing the SceneInterpreter engine. Ontologies allow the designer to formalize the domain in a reusable way, and make the system modular and interoperable with existing multiagent systems, while Prolog provides a solid basis for defining complex interpretation rules in a way that is affordable even for people with no background in Computational Logics. The domain selected for experimenting with OntoScene is that of prehistoric rock art, which provides us with a fascinating and challenging testbed. Under consideration in Theory and Practice of Logic Programming (TPLP).
Tasks
Published 2019-11-05
URL https://arxiv.org/abs/1911.04863v1
PDF https://arxiv.org/pdf/1911.04863v1.pdf
PWC https://paperswithcode.com/paper/ontoscene-a-logic-based-scene-interpreter
Repo
Framework
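
To give a concrete flavour of logic-based scene interpretation, below is a minimal Python sketch of a single interpretation rule firing over typed elements and spatial relations. OntoScene itself uses OWL ontologies and Prolog rules; all element types, relations, and the hunting-scene rule here are invented for illustration.

```python
# Hypothetical facts: typed scene elements and spatial relations.
elements = {"e1": "HumanFigure", "e2": "AnimalFigure", "e3": "Weapon"}
relations = {("e1", "e2"): "near", ("e1", "e3"): "holds"}

def holds(rel, a, b):
    """Check whether a spatial relation holds between two elements."""
    return relations.get((a, b)) == rel

def interpret(elements):
    """Fire a simple, invented interpretation rule over the facts."""
    scenes = []
    for a, ta in elements.items():
        for b, tb in elements.items():
            # Rule: a human figure near an animal, while holding a
            # weapon, suggests a hunting scene.
            if ta == "HumanFigure" and tb == "AnimalFigure" and holds("near", a, b):
                if any(tc == "Weapon" and holds("holds", a, c)
                       for c, tc in elements.items()):
                    scenes.append(("HuntingScene", a, b))
    return scenes

print(interpret(elements))  # [('HuntingScene', 'e1', 'e2')]
```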

Efficient Deep Neural Networks

Title Efficient Deep Neural Networks
Authors Bichen Wu
Abstract The success of deep neural networks (DNNs) is attributable to three factors: increased compute capacity, more complex models, and more data. These factors, however, are not always present, especially for edge applications such as autonomous driving, augmented reality, and internet-of-things. Training DNNs requires a large amount of data, which is difficult to obtain. Edge devices such as mobile phones have limited compute capacity and therefore require specialized and efficient DNNs. However, due to the enormous design space and prohibitive training costs, designing efficient DNNs for different target devices is challenging. So the question is: with limited data, compute capacity, and model complexity, can we still successfully apply deep neural networks? This dissertation focuses on these problems, improving the efficiency of deep neural networks at four levels. Model efficiency: we designed neural networks for various computer vision tasks and achieved more than 10x speedups with lower energy consumption. Data efficiency: we developed an advanced tool that enables 6.2x faster annotation of LiDAR point clouds. We also leveraged domain adaptation to utilize simulated data, bypassing the need for real data. Hardware efficiency: we co-designed neural networks and hardware accelerators and achieved 11.6x faster inference. Design efficiency: the process of finding the optimal neural networks is time-consuming, so our automated neural architecture search algorithms discovered models with state-of-the-art accuracy and efficiency at 421x lower computational cost than previous search methods.
Tasks Autonomous Driving, Domain Adaptation, Neural Architecture Search
Published 2019-08-20
URL https://arxiv.org/abs/1908.08926v1
PDF https://arxiv.org/pdf/1908.08926v1.pdf
PWC https://paperswithcode.com/paper/efficient-deep-neural-networks
Repo
Framework

Wasserstein total variation filtering

Title Wasserstein total variation filtering
Authors Erdem Varol, Amin Nejatbakhsh
Abstract In this paper, we expand upon the theory of trend filtering by introducing the use of the Wasserstein metric as a means to control the amount of spatiotemporal variation in filtered time series data. While trend filtering utilizes regularization to produce signal estimates that are piecewise linear, in the case of $\ell_1$ regularization, or temporally smooth, in the case of $\ell_2$ regularization, it ignores the topology of the spatial distribution of the signal. By incorporating information about the underlying metric space of the pixel layout, the Wasserstein metric is an attractive choice of regularizer to uncover spatiotemporal trends in time series data. We introduce a globally optimal algorithm for efficiently estimating the filtered signal under a Wasserstein finite-differences operator. The efficacy of the proposed algorithm in preserving spatiotemporal trends in time series video is demonstrated on both simulated videos and fluorescence microscopy videos of the nematode Caenorhabditis elegans, and compared against standard trend filtering algorithms.
Tasks Time Series
Published 2019-10-23
URL https://arxiv.org/abs/1910.10822v1
PDF https://arxiv.org/pdf/1910.10822v1.pdf
PWC https://paperswithcode.com/paper/wasserstein-total-variation-filtering
Repo
Framework
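
As a rough illustration of the Wasserstein finite-differences penalty, the sketch below evaluates (but does not optimize) a trend-filtering-style objective over a sequence of 1D spatial signals, with an arbitrarily chosen regularization weight. The paper's actual contribution, a globally optimal solver, is not reproduced here.

```python
# Evaluate 0.5 * ||Y - X||_F^2 + lam * sum_t W1(x_t, x_{t+1})
# for a (T, n) stack of nonnegative spatial signals over time.
import numpy as np
from scipy.stats import wasserstein_distance

def wtv_objective(Y, X, lam=0.1):
    """Data fidelity plus Wasserstein finite-differences penalty.

    Each row of X is treated as an (unnormalized) histogram over
    pixel positions; lam is an illustrative choice.
    """
    positions = np.arange(Y.shape[1])
    fidelity = 0.5 * np.sum((Y - X) ** 2)
    penalty = sum(
        wasserstein_distance(positions, positions, X[t], X[t + 1])
        for t in range(X.shape[0] - 1)
    )
    return fidelity + lam * penalty

T, n = 20, 64
rng = np.random.default_rng(0)
Y = np.abs(rng.normal(size=(T, n)))  # noisy observed frames
print(wtv_objective(Y, Y))           # penalty-only cost at X = Y
```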

Audio-Visual Embodied Navigation

Title Audio-Visual Embodied Navigation
Authors Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
Abstract Moving around in the world is naturally a multisensory experience, but today’s embodied agents are deaf: restricted to solely their visual perception of the environment. We introduce audio-visual navigation for complex, acoustically and visually realistic 3D environments. By both seeing and hearing, the agent must learn to navigate to an audio-based target. We develop a multi-modal deep reinforcement learning pipeline to train navigation policies end-to-end from a stream of egocentric audio-visual observations, allowing the agent to (1) discover elements of the geometry of the physical space indicated by the reverberating audio and (2) detect and follow sound-emitting targets. We further introduce audio renderings based on geometrical acoustic simulations for a set of publicly available 3D assets and instrument AI-Habitat to support the new sensor, making it possible to insert arbitrary sound sources in an array of apartment, office, and hotel environments. Our results show that audio greatly benefits embodied visual navigation in 3D spaces.
Tasks Visual Navigation
Published 2019-12-24
URL https://arxiv.org/abs/1912.11474v1
PDF https://arxiv.org/pdf/1912.11474v1.pdf
PWC https://paperswithcode.com/paper/audio-visual-embodied-navigation
Repo
Framework
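
A hedged sketch of what a multi-modal navigation policy might look like in PyTorch: two modality-specific encoders whose fused features feed actor and critic heads. All layer sizes, the binaural two-channel spectrogram input, and the four-action space are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AudioVisualPolicy(nn.Module):
    def __init__(self, n_actions=4):
        super().__init__()
        # Separate encoders for each modality (sizes are invented).
        self.visual = nn.Sequential(
            nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(128), nn.ReLU())
        self.audio = nn.Sequential(
            nn.Conv2d(2, 16, 5, stride=2), nn.ReLU(),  # binaural channels
            nn.Flatten(), nn.LazyLinear(128), nn.ReLU())
        # Fused features feed policy (actor) and value (critic) heads.
        self.actor = nn.Linear(256, n_actions)
        self.critic = nn.Linear(256, 1)

    def forward(self, rgb, spec):
        h = torch.cat([self.visual(rgb), self.audio(spec)], dim=-1)
        return self.actor(h), self.critic(h)

policy = AudioVisualPolicy()
logits, value = policy(torch.randn(1, 3, 84, 84), torch.randn(1, 2, 65, 26))
```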

Non-Monotone Submodular Maximization with Multiple Knapsacks in Static and Dynamic Settings

Title Non-Monotone Submodular Maximization with Multiple Knapsacks in Static and Dynamic Settings
Authors Vanja Doskoč, Tobias Friedrich, Andreas Göbel, Frank Neumann, Aneta Neumann, Francesco Quinzan
Abstract We study the problem of maximizing a non-monotone submodular function under multiple knapsack constraints. We propose a simple discrete greedy algorithm to approach this problem, and prove that it yields strong approximation guarantees for functions with bounded curvature. In contrast to other heuristics, it requires no relaxation to continuous domains and maintains a constant-factor approximation guarantee in the problem size. In the case of a single knapsack, our analysis suggests that the standard greedy can be used in non-monotone settings. Additionally, we study this problem in a dynamic setting, in which the knapsacks change during the optimization process. We modify our greedy algorithm to avoid a complete restart at each constraint update; this modification retains the approximation guarantees of the static case. We evaluate our results experimentally on video summarization and sensor placement tasks. We show that our proposed algorithm competes with the state of the art in static settings. Furthermore, we show that in dynamic settings with a tight computational time budget, our modified greedy yields significant improvements over restarting the greedy from scratch, in terms of the solution quality achieved.
Tasks Video Summarization
Published 2019-11-15
URL https://arxiv.org/abs/1911.06791v3
PDF https://arxiv.org/pdf/1911.06791v3.pdf
PWC https://paperswithcode.com/paper/non-monotone-submodular-maximization-with
Repo
Framework
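
A minimal sketch of the discrete greedy idea: repeatedly add the feasible element with the best marginal-gain-to-weight ratio until no element fits. This is a generic illustration of greedy submodular maximization under knapsack constraints, not the paper's exact algorithm or its curvature-based analysis.

```python
def greedy_knapsack(items, f, weights, budgets):
    """items: candidate set; f(S): submodular objective;
    weights[i]: cost vector of item i (one entry per knapsack);
    budgets: capacity of each knapsack."""
    S, used = set(), [0.0] * len(budgets)

    def feasible(i):
        return all(u + w <= b for u, w, b in zip(used, weights[i], budgets))

    while True:
        base = f(S)
        best, best_ratio = None, 0.0
        for i in items - S:
            if feasible(i):
                gain = f(S | {i}) - base
                ratio = gain / max(sum(weights[i]), 1e-12)
                if ratio > best_ratio:
                    best, best_ratio = i, ratio
        if best is None:
            return S
        S.add(best)
        used = [u + w for u, w in zip(used, weights[best])]

# Toy coverage objective: f(S) = number of distinct elements covered.
cover = {1: {"a", "b"}, 2: {"b"}, 3: {"c"}}
f = lambda S: len(set().union(*(cover[i] for i in S)) if S else set())
print(greedy_knapsack({1, 2, 3}, f, {1: [2], 2: [1], 3: [1]}, [3]))
```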

AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces

Title AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces
Authors Muhammad Haris Khan, John McDonagh, Salman Khan, Muhammad Shahabuddin, Aditya Arora, Fahad Shahbaz Khan, Ling Shao, Georgios Tzimiropoulos
Abstract Because we rely heavily on animals, it is our ethical obligation to improve their well-being by understanding their needs. Several studies show that animal needs are often expressed through their faces. Though remarkable progress has been made towards the automatic understanding of human faces, this has regrettably not been the case with animal faces. There is both significant room and a clear need to develop automatic systems capable of interpreting animal faces. Among many transformative impacts, such a technology will foster better and cheaper animal healthcare, and further advance our understanding of animal psychology. We believe the underlying research progress is mainly obstructed by the lack of an adequately annotated dataset of animal faces covering a wide spectrum of animal species. To this end, we introduce a large-scale, hierarchically annotated dataset of animal faces, featuring 21.9K faces from 334 diverse species and 21 animal orders across biological taxonomy. These faces are captured under `in-the-wild’ conditions and are consistently annotated with 9 landmarks on key facial features. The proposed dataset is structured and scalable by design; its development underwent four systematic stages involving rigorous, manual annotation effort of over 6K man-hours. We benchmark it for face alignment using existing methods under novel problem settings. Results showcase its challenging nature and unique attributes, and point to clear prospects for novel, adaptive, and generalized face-oriented CV algorithms. We further benchmark the dataset for face detection and fine-grained recognition tasks, to demonstrate multi-task applications and room for improvement. Experiments indicate that this dataset will push algorithmic advancements across many related CV tasks and encourage the development of novel systems for animal facial behaviour monitoring. We will make the dataset publicly available.
Tasks Face Alignment, Face Detection
Published 2019-09-11
URL https://arxiv.org/abs/1909.04951v1
PDF https://arxiv.org/pdf/1909.04951v1.pdf
PWC https://paperswithcode.com/paper/animalweb-a-large-scale-hierarchical-dataset
Repo
Framework

System Misuse Detection via Informed Behavior Clustering and Modeling

Title System Misuse Detection via Informed Behavior Clustering and Modeling
Authors Linara Adilova, Livin Natious, Siming Chen, Olivier Thonnard, Michael Kamp
Abstract One of the main tasks of cybersecurity is recognizing malicious interactions with an arbitrary system. Currently, the logging information from each interaction can be collected in almost unrestricted amounts, but identifying attacks requires considerable time and effort from security experts. We propose an approach for identifying fraudulent activity by modeling normal behavior in interactions with a system via machine learning methods, in particular LSTM neural networks. In order to enrich the modeling with system-specific knowledge, we propose to use an interactive visual interface that allows security experts to identify semantically meaningful clusters of interactions. These clusters incorporate domain knowledge and lead to more precise behavior modeling via informed machine learning. We evaluate the proposed approach on a dataset containing logs of interactions with the administrative interface of a login and security server. Our empirical results indicate that the informed modeling is capable of capturing normal behavior, which can then be used to detect abnormal behavior.
Tasks
Published 2019-07-01
URL https://arxiv.org/abs/1907.00874v1
PDF https://arxiv.org/pdf/1907.00874v1.pdf
PWC https://paperswithcode.com/paper/system-misuse-detection-via-informed-behavior
Repo
Framework
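
One plausible realization of the normal-behavior model is an LSTM that predicts the next event in an interaction sequence, with high per-event surprise flagging potential misuse. The vocabulary size, architecture, and scoring below are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class BehaviorModel(nn.Module):
    def __init__(self, n_events=50, dim=32):
        super().__init__()
        self.embed = nn.Embedding(n_events, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, n_events)

    def forward(self, seq):                 # seq: (B, T) event ids
        h, _ = self.lstm(self.embed(seq))
        return self.head(h)                 # (B, T, n_events) logits

def anomaly_score(model, seq):
    """Mean negative log-likelihood of each next event given its prefix."""
    logits = model(seq[:, :-1])
    nll = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), seq[:, 1:].reshape(-1))
    return nll.item()

model = BehaviorModel()
normal = torch.randint(0, 50, (1, 20))      # a toy interaction sequence
print(anomaly_score(model, normal))         # compare against a threshold
```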

Foveated image processing for faster object detection and recognition in embedded systems using deep convolutional neural networks

Title Foveated image processing for faster object detection and recognition in embedded systems using deep convolutional neural networks
Authors Uziel Jaramillo-Avila, Sean R. Anderson
Abstract Object detection and recognition algorithms using deep convolutional neural networks (CNNs) tend to be computationally intensive to implement. This presents a particular challenge for embedded systems, such as mobile robots, where the computational resources tend to be far less than for workstations. As an alternative to standard, uniformly sampled images, we propose the use of foveated image sampling to reduce the size of images, which are then faster to process in a CNN due to the reduced number of convolution operations. We evaluate object detection and recognition on the Microsoft COCO database, using foveated image sampling at different image sizes, ranging from 416x416 down to 96x96 pixels, on an embedded GPU – an NVIDIA Jetson TX2 with 256 CUDA cores. The results show that it is possible to achieve a 4x speed-up in frame rates, from 3.59 FPS to 15.24 FPS, using 416x416 and 128x128 pixel images respectively. For foveated sampling, this image size reduction led to only a small decrease in recall performance in the foveal region, to 92.0% of the baseline performance with full-sized images, compared to a significant decrease to 50.1% of baseline recall performance for uniformly sampled images, demonstrating the advantage of foveated sampling.
Tasks Object Detection
Published 2019-08-15
URL https://arxiv.org/abs/1908.09000v1
PDF https://arxiv.org/pdf/1908.09000v1.pdf
PWC https://paperswithcode.com/paper/foveated-image-processing-for-faster-object
Repo
Framework
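
A rough sketch of the foveated sampling idea: keep a full-resolution crop at the fovea and a coarsely subsampled periphery, so the detector processes far fewer pixels. The two-level scheme and sizes below are simplifying assumptions; the paper evaluates a range of resolutions on embedded hardware.

```python
import numpy as np

def foveate(img, center, fovea=128, out=128):
    """Return (foveal crop, downsampled periphery) for an HxWx3 image."""
    cy, cx = center
    half = fovea // 2
    crop = img[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
    # Coarse periphery: naive strided subsampling to out x out pixels.
    ys = np.linspace(0, img.shape[0] - 1, out).astype(int)
    xs = np.linspace(0, img.shape[1] - 1, out).astype(int)
    periphery = img[np.ix_(ys, xs)]
    return crop, periphery

img = np.zeros((416, 416, 3), dtype=np.uint8)
crop, peri = foveate(img, center=(208, 208))
print(crop.shape, peri.shape)  # (128, 128, 3) (128, 128, 3)
```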

Quadratic Suffices for Over-parametrization via Matrix Chernoff Bound

Title Quadratic Suffices for Over-parametrization via Matrix Chernoff Bound
Authors Zhao Song, Xin Yang
Abstract We improve the over-parametrization size over two beautiful results [Li and Liang, 2018] and [Du, Zhai, Poczos and Singh, 2019] in deep learning theory.
Tasks
Published 2019-06-09
URL https://arxiv.org/abs/1906.03593v2
PDF https://arxiv.org/pdf/1906.03593v2.pdf
PWC https://paperswithcode.com/paper/quadratic-suffices-for-over-parametrization
Repo
Framework

Learning High-Level Planning Symbols from Intrinsically Motivated Experience

Title Learning High-Level Planning Symbols from Intrinsically Motivated Experience
Authors Angelo Oddi, Riccardo Rasconi, Emilio Cartoni, Gabriele Sartor, Gianluca Baldassarre, Vieri Giuliano Santucci
Abstract In symbolic planning systems, the knowledge of the domain is commonly provided by an expert. Recently, an automatic abstraction procedure has been proposed in the literature to create a Planning Domain Definition Language (PDDL) representation, the most widely used input format for off-the-shelf automated planners, starting from `options’, a data structure used to represent actions within the hierarchical reinforcement learning framework. We propose an architecture that potentially removes the need for human intervention. In particular, the architecture first acquires options in a fully autonomous fashion on the basis of open-ended learning, then builds a PDDL domain based on symbols and operators that can be used to accomplish user-defined goals through a standard PDDL planner. We start from an implementation of the above-mentioned procedure tested on a set of benchmark domains in which a humanoid robot can change the state of some objects through direct interaction with the environment. We then investigate some critical aspects of the information abstraction process that have been observed, and propose an extension that mitigates these criticalities, in particular by analysing the type of classifiers that allow a suitable grounding of symbols.
Tasks Hierarchical Reinforcement Learning
Published 2019-07-18
URL https://arxiv.org/abs/1907.08313v1
PDF https://arxiv.org/pdf/1907.08313v1.pdf
PWC https://paperswithcode.com/paper/learning-high-level-planning-symbols-from
Repo
Framework
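
To make the options-to-PDDL step concrete, here is an illustrative sketch that serializes an option's symbolic preconditions and effects into a PDDL action. The option and symbol names are invented; in the paper these symbols are learned autonomously from interaction data.

```python
def option_to_pddl(name, preconds, add_effects, del_effects):
    """Render a learned option's symbol sets as a PDDL action string."""
    pre = " ".join(f"({p})" for p in preconds)
    eff = " ".join([f"({e})" for e in add_effects] +
                   [f"(not ({e}))" for e in del_effects])
    return (f"(:action {name}\n"
            f"  :precondition (and {pre})\n"
            f"  :effect (and {eff}))")

# Hypothetical option for a humanoid grasping task.
print(option_to_pddl("grasp-ball",
                     preconds=["hand-empty", "ball-reachable"],
                     add_effects=["holding-ball"],
                     del_effects=["hand-empty"]))
```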

Indoor Localization for IoT Using Adaptive Feature Selection: A Cascaded Machine Learning Approach

Title Indoor Localization for IoT Using Adaptive Feature Selection: A Cascaded Machine Learning Approach
Authors Mohamed I. AlHajri, Nazar T. Ali, Raed M. Shubair
Abstract Evolving Internet-of-Things (IoT) applications often require the use of sensor-based indoor tracking and positioning, for which the performance is significantly improved by identifying the type of the surrounding indoor environment. This identification is of high importance since it leads to higher localization accuracy. This paper presents a novel method based on a cascaded two-stage machine learning approach for highly accurate and robust localization in indoor environments using adaptive selection and combination of RF features. In the proposed method, machine learning is first used to identify the type of the surrounding indoor environment. Then, in the second stage, machine learning is employed to identify the most appropriate selection and combination of RF features that yield the highest localization accuracy. The analysis is based on the k-Nearest Neighbor (k-NN) machine learning algorithm applied to a real dataset generated from practical measurements of the RF signal in realistic indoor environments. Received Signal Strength, Channel Transfer Function, and Frequency Coherence Function are the primary RF features explored and combined. Numerical investigations demonstrate that prediction based on the concatenation of primary RF features is significantly enhanced, improving localization accuracy by at least 50% and up to more than 70%.
Tasks Feature Selection
Published 2019-05-03
URL https://arxiv.org/abs/1905.01000v1
PDF https://arxiv.org/pdf/1905.01000v1.pdf
PWC https://paperswithcode.com/paper/indoor-localization-for-iot-using-adaptive
Repo
Framework
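
A minimal sketch of the cascaded two-stage approach with scikit-learn's k-NN: stage one predicts the environment type, and stage two applies a per-environment localizer trained on a feature combination chosen for that environment. All data shapes, labels, and feature subsets below are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))          # concatenated RF features (toy)
env = rng.integers(0, 3, size=300)     # environment type labels
loc = rng.integers(0, 10, size=300)    # location labels

# Stage 1: identify the surrounding environment type.
stage1 = KNeighborsClassifier(n_neighbors=5).fit(X, env)

# Stage 2: one localizer per environment, each trained on a
# hypothetical per-environment feature subset.
feature_subsets = {0: [0, 1, 2], 1: [2, 3, 4, 5], 2: [0, 4, 5, 6, 7]}
stage2 = {e: KNeighborsClassifier(n_neighbors=5)
              .fit(X[env == e][:, feature_subsets[e]], loc[env == e])
          for e in feature_subsets}

x = X[:1]                              # one query measurement
e = stage1.predict(x)[0]
print(stage2[e].predict(x[:, feature_subsets[e]]))
```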

Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory

Title Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory
Authors Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool
Abstract The role of robots in society keeps expanding, bringing with it the necessity of interacting and communicating with humans. In order to keep such interaction intuitive, we provide automatic wayfinding based on verbal navigational instructions. Our first contribution is the creation of a large-scale dataset with verbal navigation instructions. To this end, we have developed an interactive visual navigation environment based on Google Street View; we further design an annotation method to highlight mined anchor landmarks and local directions between them, in order to help annotators formulate typical, human references to them. The annotation task was crowdsourced on the AMT platform to construct a new Talk2Nav dataset with $10,714$ routes. Our second contribution is a new learning method. Inspired by spatial cognition research on the mental conceptualization of navigational instructions, we introduce a soft dual attention mechanism defined over the segmented language instructions to jointly extract two partial instructions – one for matching the next upcoming visual landmark and the other for matching the local directions to that landmark. Along similar lines, we also introduce a spatial memory scheme to encode the local directional transitions. Our work takes advantage of advances in two lines of research: mental formalization of verbal navigational instructions and training neural network agents for automatic wayfinding. Extensive experiments show that our method significantly outperforms previous navigation methods. For the demo video, dataset and code, please refer to our project page: https://www.trace.ethz.ch/publications/2019/talk2nav/index.html
Tasks Autonomous Driving, Visual Navigation
Published 2019-10-04
URL https://arxiv.org/abs/1910.02029v2
PDF https://arxiv.org/pdf/1910.02029v2.pdf
PWC https://paperswithcode.com/paper/talk2nav-long-range-vision-and-language
Repo
Framework
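
A hedged sketch of soft dual attention: two learned query vectors softly split an encoded instruction into a landmark-matching context and a direction-matching context. Dimensions and the two-query formulation are assumptions for illustration, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # One learned query per partial instruction.
        self.landmark_q = nn.Parameter(torch.randn(dim))
        self.direction_q = nn.Parameter(torch.randn(dim))

    def forward(self, tokens):                        # tokens: (B, T, dim)
        def attend(q):
            w = torch.softmax(tokens @ q, dim=1)      # (B, T) soft weights
            return (w.unsqueeze(-1) * tokens).sum(1)  # (B, dim) context
        return attend(self.landmark_q), attend(self.direction_q)

att = DualAttention()
landmark_ctx, direction_ctx = att(torch.randn(2, 12, 64))
print(landmark_ctx.shape, direction_ctx.shape)  # two (2, 64) contexts
```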

PhoneBit: Efficient GPU-Accelerated Binary Neural Network Inference Engine for Mobile Phones

Title PhoneBit: Efficient GPU-Accelerated Binary Neural Network Inference Engine for Mobile Phones
Authors Gang Chen, Shengyu He, Haitao Meng, Kai Huang
Abstract Over the last years, deep neural networks (DNNs) have achieved great success in computer vision and other fields. However, performance and power constraints make it still challenging to deploy DNNs on mobile devices due to their high computational complexity. Binary neural networks (BNNs) have been demonstrated as a promising solution to this problem, using bit-wise operations to replace most arithmetic operations. Currently, existing GPU-accelerated implementations of BNNs are tailored only for desktop platforms. Due to architecture differences, merely porting such implementations to mobile devices yields suboptimal performance or is impossible in some cases. In this paper, we propose PhoneBit, a GPU-accelerated BNN inference engine for Android-based mobile devices that fully exploits the computing power of BNNs on mobile GPUs. PhoneBit provides a set of operator-level optimizations, including a locality-friendly data layout, bit packing with vectorization, and layer integration for efficient binary convolution. We also provide detailed implementation and parallelization optimizations for PhoneBit to optimally utilize the memory bandwidth and computing power of mobile GPUs. We evaluate PhoneBit with AlexNet, YOLOv2 Tiny and VGG16 in their binary versions. Our experimental results show that PhoneBit achieves significant speedups and energy efficiency compared with state-of-the-art frameworks for mobile devices.
Tasks
Published 2019-12-05
URL https://arxiv.org/abs/1912.04050v1
PDF https://arxiv.org/pdf/1912.04050v1.pdf
PWC https://paperswithcode.com/paper/phonebit-efficient-gpu-accelerated-binary
Repo
Framework
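
The core BNN primitive PhoneBit builds on can be illustrated in a few lines: pack {-1,+1} values into machine words, then replace a dot product with XNOR plus popcount. The NumPy sketch below is a generic CPU illustration, not PhoneBit's mobile-GPU implementation.

```python
import numpy as np

def pack(signs):
    """Pack a {-1,+1} vector into a uint8 bit array (1 encodes +1)."""
    return np.packbits(signs > 0)

def binary_dot(pa, pb, n):
    """Dot product of two packed sign vectors of original length n."""
    # XNOR = NOT XOR; count matching bits among the first n positions.
    matches = np.unpackbits(~(pa ^ pb))[:n].sum()
    return 2 * int(matches) - n   # matches minus mismatches

a = np.sign(np.random.randn(64)).astype(np.int8)
b = np.sign(np.random.randn(64)).astype(np.int8)
print(binary_dot(pack(a), pack(b), 64), int(a @ b))  # equal values
```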

Atypical Facial Landmark Localisation with Stacked Hourglass Networks: A Study on 3D Facial Modelling for Medical Diagnosis

Title Atypical Facial Landmark Localisation with Stacked Hourglass Networks: A Study on 3D Facial Modelling for Medical Diagnosis
Authors Gary Storey, Ahmed Bouridane, Richard Jiang, Chang-tsun Li
Abstract While facial biometrics has been widely used for identification purposes, it has recently been researched as a medical biometric for a range of diseases. In this chapter, we investigate facial landmark detection for atypical 3D facial modelling in facial palsy cases; potentially, such modelling can assist medical diagnosis using atypical facial features. In our work, a study of landmark localisation methods such as stacked hourglass networks is conducted and evaluated to ascertain their accuracy when presented with unseen atypical faces. The evaluation highlights that the state-of-the-art stacked hourglass architecture outperforms other traditional methods.
Tasks Face Alignment, Facial Landmark Detection, Medical Diagnosis
Published 2019-09-05
URL https://arxiv.org/abs/1909.02157v1
PDF https://arxiv.org/pdf/1909.02157v1.pdf
PWC https://paperswithcode.com/paper/atypical-facial-landmark-localisation-with
Repo
Framework
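
For context, hourglass-style landmark networks regress one heatmap per landmark, and coordinates are read off by an argmax over each map. The sketch below shows just that decoding step; the heatmap sizes are illustrative.

```python
import numpy as np

def decode_landmarks(heatmaps):
    """heatmaps: (K, H, W) -> (K, 2) array of (x, y) coordinates."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1).argmax(axis=1)
    return np.stack([flat % W, flat // W], axis=1)

hm = np.zeros((9, 64, 64))   # 9 landmarks, as in facial landmark setups
hm[0, 10, 20] = 1.0          # a peak at row 10, column 20
print(decode_landmarks(hm)[0])  # [20 10]
```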

D3S – A Discriminative Single Shot Segmentation Tracker

Title D3S – A Discriminative Single Shot Segmentation Tracker
Authors Alan Lukežič, Jiří Matas, Matej Kristan
Abstract Template-based discriminative trackers are currently the dominant tracking paradigm due to their robustness, but they are restricted to bounding-box tracking and a limited range of transformation models, which reduces their localization accuracy. We propose a discriminative single-shot segmentation tracker, D3S, which narrows the gap between visual object tracking and video object segmentation. A single-shot network applies two target models with complementary geometric properties, one invariant to a broad range of transformations, including non-rigid deformations, the other assuming a rigid object, to simultaneously achieve high robustness and online target segmentation. Without per-dataset finetuning, and trained only for segmentation as the primary output, D3S outperforms all trackers on the VOT2016, VOT2018 and GOT-10k benchmarks and performs close to the state-of-the-art trackers on TrackingNet. D3S outperforms the leading segmentation tracker SiamMask on the video object segmentation benchmark and performs on par with top video object segmentation algorithms, while running an order of magnitude faster, close to real time.
Tasks Object Tracking, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Visual Object Tracking
Published 2019-11-20
URL https://arxiv.org/abs/1911.08862v1
PDF https://arxiv.org/pdf/1911.08862v1.pdf
PWC https://paperswithcode.com/paper/d3s-a-discriminative-single-shot-segmentation
Repo
Framework
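
As background on the single-shot matching primitive such trackers build on, the sketch below cross-correlates template features with search-region features to obtain a target similarity map. Feature sizes are invented, and D3S's segmentation branch and its two complementary target models are omitted.

```python
import torch
import torch.nn.functional as F

template = torch.randn(1, 32, 8, 8)    # target exemplar features (toy)
search = torch.randn(1, 32, 32, 32)    # search region features (toy)

# Treat the template as a convolution kernel over the search features.
sim = F.conv2d(search, template)       # (1, 1, 25, 25) similarity map
print(sim.shape)
```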