April 2, 2020

Paper Group ANR 273

Modeling Engagement in Long-Term, In-Home Socially Assistive Robot Interventions for Children with Autism Spectrum Disorders. The value of text for small business default prediction: A deep learning approach. Counting dense objects in remote sensing images. Spatio-Temporal Action Detection with Multi-Object Interaction. Table Structure Extraction w …

Modeling Engagement in Long-Term, In-Home Socially Assistive Robot Interventions for Children with Autism Spectrum Disorders

Title Modeling Engagement in Long-Term, In-Home Socially Assistive Robot Interventions for Children with Autism Spectrum Disorders
Authors Shomik Jain, Balasubramanian Thiagarajan, Zhonghao Shi, Caitlyn Clabaugh, Maja J. Matarić
Abstract Socially assistive robotics (SAR) has great potential to provide accessible, affordable, and personalized therapeutic interventions for children with autism spectrum disorders (ASD). However, human-robot interaction (HRI) methods are still limited in their ability to autonomously recognize and respond to behavioral cues, especially in atypical users and everyday settings. This work applies supervised machine learning algorithms to model user engagement in the context of long-term, in-home SAR interventions for children with ASD. Specifically, two types of engagement models are presented for each user: 1) generalized models trained on data from different users; and 2) individualized models trained on an early subset of the user’s data. The models achieved approximately 90% accuracy (AUROC) for post hoc binary classification of engagement, despite the high variance in data observed across users, sessions, and engagement states. Moreover, temporal patterns in model predictions could be used to reliably initiate re-engagement actions at appropriate times. These results validate the feasibility and challenges of recognition and response to user disengagement in long-term, real-world HRI settings. The contributions of this work also inform the design of engaging and personalized HRI, especially for the ASD community.
Published 2020-02-06
URL https://arxiv.org/abs/2002.02453v1
PDF https://arxiv.org/pdf/2002.02453v1.pdf
PWC https://paperswithcode.com/paper/modeling-engagement-in-long-term-in-home

The value of text for small business default prediction: A deep learning approach

Title The value of text for small business default prediction: A deep learning approach
Authors Matthew Stevenson, Christophe Mues, Cristián Bravo
Abstract Compared to consumer lending, Micro, Small and Medium Enterprise (mSME) credit risk modelling is particularly challenging, as, often, the same sources of information are not available. To mitigate limited data availability, it is standard policy for a loan officer to provide a textual loan assessment. In turn, this statement is analysed by a credit expert alongside any available standard credit data. In our paper, we exploit recent advances from the field of Deep Learning and Natural Language Processing (NLP), including the BERT (Bidirectional Encoder Representations from Transformers) model, to extract information from 60000+ textual assessments. We consider the performance in terms of AUC (Area Under the Curve) and Balanced Accuracy and find that the text alone is surprisingly effective for predicting default. Yet, when combined with traditional data, it yields no additional predictive capability. We do find, however, that deep learning with categorical embeddings is capable of producing a modest performance improvement when compared to alternative machine learning models. We explore how the loan assessments influence predictions, explaining why despite the text being predictive, no additional performance is gained. This exploration leads us to a series of recommendations on a new strategy for the collection of future mSME loan assessments.
Published 2020-03-19
URL https://arxiv.org/abs/2003.08964v1
PDF https://arxiv.org/pdf/2003.08964v1.pdf
PWC https://paperswithcode.com/paper/the-value-of-text-for-small-business-default
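
The abstract above reports results in terms of AUC and Balanced Accuracy. As a minimal sketch of what those two metrics measure on an imbalanced default-prediction problem, the following pure-Python functions compute them from labels and scores. The function names and the toy data are illustrative assumptions and are not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the positive class and the recall on the negative class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 non-defaults (0) and 2 defaults (1).
labels = [0] * 8 + [1] * 2
always_no_default = [0] * 10
# Plain accuracy would be 0.8 here, but balanced accuracy exposes the useless classifier.
print(balanced_accuracy(labels, always_no_default))
```

Balanced accuracy is a natural choice when defaults are rare, because a classifier that never predicts default scores only 0.5 on it regardless of how skewed the classes are.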

Counting dense objects in remote sensing images

Title Counting dense objects in remote sensing images
Authors Guangshuai Gao, Qingjie Liu, Yunhong Wang
Abstract Estimating the accurate number of objects of interest in a given image is a challenging yet important task. Significant efforts have been made to address this problem and great progress has been achieved, yet counting ground objects in remote sensing images is barely studied. In this paper, we are interested in counting dense objects in remote sensing images. Compared with object counting in natural scenes, this task is challenging because of the following factors: large scale variation, complex cluttered backgrounds and arbitrary orientation. More importantly, the scarcity of data severely limits the development of research in this field. To address these issues, we first construct a large-scale object counting dataset based on remote sensing images, which contains four kinds of objects: buildings, crowded ships in harbors, and large vehicles and small vehicles in parking lots. We then benchmark the dataset by designing a novel neural network which generates a density map of an input image. The proposed network consists of three parts, namely a convolution block attention module (CBAM), a scale pyramid module (SPM) and a deformable convolution module (DCM). Experiments on the proposed dataset and comparisons with state-of-the-art methods demonstrate the difficulty of the proposed dataset and the superiority and effectiveness of our method.
Tasks Object Counting
Published 2020-02-14
URL https://arxiv.org/abs/2002.05928v1
PDF https://arxiv.org/pdf/2002.05928v1.pdf
PWC https://paperswithcode.com/paper/counting-dense-objects-in-remote-sensing
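
The abstract above describes a network that outputs a density map. In density-map counting, the predicted count is commonly read off by summing the map. The sketch below illustrates that idea on the ground-truth side only: each annotated point is spread with a normalized Gaussian kernel, so the map sums to the number of objects. The kernel size, sigma, and the border renormalization are assumptions made for illustration; the abstract does not say how the authors build their maps.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    kernel = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def density_map(height, width, points, size=5, sigma=1.0):
    """Spread one unit of mass around each (row, col) point inside the image."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in points:
        # Collect the kernel cells that fall inside the image.
        cells = []
        for dy in range(size):
            for dx in range(size):
                rr, cc = r + dy - half, c + dx - half
                if 0 <= rr < height and 0 <= cc < width:
                    cells.append((rr, cc, kernel[dy][dx]))
        # Renormalize so objects near the border still contribute exactly 1.
        mass = sum(w for _, _, w in cells)
        for rr, cc, w in cells:
            dmap[rr][cc] += w / mass
    return dmap


def count_from_density(dmap):
    """The object count is the sum of the density map."""
    return sum(map(sum, dmap))


# Hypothetical annotations: three objects in an 8 x 8 image.
dmap = density_map(8, 8, [(0, 0), (3, 4), (7, 7)])
print(round(count_from_density(dmap)))
```

At inference time the same summation would be applied to the map predicted by the network instead of one built from annotations.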

Spatio-Temporal Action Detection with Multi-Object Interaction

Title Spatio-Temporal Action Detection with Multi-Object Interaction
Authors Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell
Abstract Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an “action tube”. Nowadays, most spatio-temporal action detection datasets (e.g. UCF101-24, AVA, DALY) are annotated with action tubes that contain a single person performing the action, thus the predominant action detection models simply employ a person detection and tracking pipeline for localization. However, when the action is defined as an interaction between multiple objects, such methods may fail since each bounding box in the action tube contains multiple objects instead of one person. In this paper, we study the spatio-temporal action detection problem with multi-object interaction. We introduce a new dataset that is annotated with action tubes containing multi-object interactions. Moreover, we propose an end-to-end spatio-temporal action detection model that performs both spatial and temporal regression simultaneously. Our spatial regression may enclose multiple objects participating in the action. During test time, we simply connect the regressed bounding boxes within the predicted temporal duration using a simple heuristic. We report the baseline results of our proposed model on this new dataset, and also show competitive results on the standard benchmark UCF101-24 using only RGB input.
Tasks Action Detection, Human Detection
Published 2020-04-01
URL https://arxiv.org/abs/2004.00180v1
PDF https://arxiv.org/pdf/2004.00180v1.pdf
PWC https://paperswithcode.com/paper/spatio-temporal-action-detection-with-multi

Table Structure Extraction with Bi-directional Gated Recurrent Unit Networks

Title Table Structure Extraction with Bi-directional Gated Recurrent Unit Networks
Authors Saqib Ali Khan, Syed Muhammad Daniyal Khalid, Muhammad Ali Shahzad, Faisal Shafait
Abstract Tables present summarized and structured information to the reader, which makes table structure extraction an important part of document understanding applications. However, table structure identification is a hard problem not only because of the large variation in table layouts and styles, but also owing to the variations in page layouts and noise contamination levels. A lot of research has been done to identify table structure, most of which is based on applying heuristics with the aid of optical character recognition (OCR) to hand-pick layout features of the tables. These methods fail to generalize well because of the variations in table layouts and the errors generated by OCR. In this paper, we propose a robust deep learning based approach to extract rows and columns from a detected table in document images with high precision. In the proposed solution, the table images are first pre-processed and then fed to a bi-directional Recurrent Neural Network with Gated Recurrent Units (GRU) followed by a fully-connected layer with softmax activation. The network scans the images from top to bottom as well as left to right and classifies each input as either a row-separator or a column-separator. We have benchmarked our system on the publicly available UNLV and ICDAR 2013 datasets, on which it outperformed the state-of-the-art table structure extraction systems by a significant margin.
Tasks Optical Character Recognition
Published 2020-01-08
URL https://arxiv.org/abs/2001.02501v1
PDF https://arxiv.org/pdf/2001.02501v1.pdf
PWC https://paperswithcode.com/paper/table-structure-extraction-with-bi

Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data

Title Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data
Authors Xi Yan, David Acuna, Sanja Fidler
Abstract Transfer learning has proven to be a successful technique to train deep learning models in the domains where little training data is available. The dominant approach is to pretrain a model on a large generic dataset such as ImageNet and finetune its weights on the target domain. However, in the new era of an ever-increasing number of massive datasets, selecting the relevant data for pretraining is a critical issue. We introduce Neural Data Server (NDS), a large-scale search engine for finding the most useful transfer learning data to the target domain. NDS consists of a dataserver which indexes several large popular image datasets, and aims to recommend data to a client, an end-user with a target application with its own small labeled dataset. The dataserver represents large datasets with a much more compact mixture-of-experts model, and employs it to perform data search in a series of dataserver-client transactions at a low computational cost. We show the effectiveness of NDS in various transfer learning scenarios, demonstrating state-of-the-art performance on several target datasets and tasks such as image classification, object detection and instance segmentation. Neural Data Server is available as a web-service at http://aidemos.cs.toronto.edu/nds/.
Tasks Image Classification, Instance Segmentation, Object Detection, Semantic Segmentation, Transfer Learning
Published 2020-01-09
URL https://arxiv.org/abs/2001.02799v3
PDF https://arxiv.org/pdf/2001.02799v3.pdf
PWC https://paperswithcode.com/paper/neural-data-server-a-large-scale-search

Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR)

Title Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR)
Authors Jamshed Memon, Maira Sami, Rizwan Ahmed Khan
Abstract Given the ubiquity of handwritten documents in human transactions, Optical Character Recognition (OCR) of documents has invaluable practical worth. Optical character recognition is a science that enables the translation of various types of documents or images into analyzable, editable and searchable data. During the last decade, researchers have used artificial intelligence / machine learning tools to automatically analyze handwritten and printed documents in order to convert them into electronic format. The objective of this review paper is to summarize research that has been conducted on character recognition of handwritten documents and to provide research directions. In this Systematic Literature Review (SLR) we collected, synthesized and analyzed research articles on the topic of handwritten OCR (and closely related topics) which were published between 2000 and 2018. We searched widely used electronic databases following a pre-defined review protocol. Articles were searched using keywords, forward reference searching and backward reference searching in order to find all the articles related to the topic. After carefully following the study selection process, 142 articles were selected for this SLR. This review article serves the purpose of presenting state-of-the-art results and techniques on OCR and also provides research directions by highlighting research gaps.
Tasks Optical Character Recognition
Published 2020-01-01
URL https://arxiv.org/abs/2001.00139v1
PDF https://arxiv.org/pdf/2001.00139v1.pdf
PWC https://paperswithcode.com/paper/handwritten-optical-character-recognition-ocr

Fast Fair Regression via Efficient Approximations of Mutual Information

Title Fast Fair Regression via Efficient Approximations of Mutual Information
Authors Daniel Steinberg, Alistair Reid, Simon O’Callaghan, Finnian Lattimore, Lachlan McCalman, Tiberio Caetano
Abstract Most work in algorithmic fairness to date has focused on discrete outcomes, such as deciding whether to grant someone a loan or not. In these classification settings, group fairness criteria such as independence, separation and sufficiency can be measured directly by comparing rates of outcomes between subpopulations. Many important problems however require the prediction of a real-valued outcome, such as a risk score or insurance premium. In such regression settings, measuring group fairness criteria is computationally challenging, as it requires estimating information-theoretic divergences between conditional probability density functions. This paper introduces fast approximations of the independence, separation and sufficiency group fairness criteria for regression models from their (conditional) mutual information definitions, and uses such approximations as regularisers to enforce fairness within a regularised risk minimisation framework. Experiments in real-world datasets indicate that in spite of its superior computational efficiency our algorithm still displays state-of-the-art accuracy/fairness tradeoffs.
Published 2020-02-14
URL https://arxiv.org/abs/2002.06200v1
PDF https://arxiv.org/pdf/2002.06200v1.pdf
PWC https://paperswithcode.com/paper/fast-fair-regression-via-efficient
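
The three group fairness criteria named in the abstract above have standard statements as (conditional) independence conditions, each equivalent to a vanishing mutual information. The block below writes them out, followed by a generic regularised risk of the kind the abstract describes. The symbols are assumptions chosen for illustration: $S = f_\theta(X)$ is the model score, $A$ the protected attribute, $Y$ the target, $\ell$ a loss, $\hat{I}$ an approximation of the chosen mutual information term, and $\gamma \ge 0$ a regularisation weight. The paper's specific approximations are not reproduced here.

```latex
\begin{align*}
\text{Independence:} \quad & S \perp A          &&\Longleftrightarrow\quad I(S; A) = 0 \\
\text{Separation:}   \quad & S \perp A \mid Y   &&\Longleftrightarrow\quad I(S; A \mid Y) = 0 \\
\text{Sufficiency:}  \quad & Y \perp A \mid S   &&\Longleftrightarrow\quad I(Y; A \mid S) = 0 \\[6pt]
\text{Regularised risk:} \quad &
  \min_{\theta}\; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(y_i, f_\theta(x_i)\bigr)
  \;+\; \gamma\, \hat{I}(\cdot)
\end{align*}
```

Setting $\gamma = 0$ recovers ordinary risk minimisation, while larger values trade predictive accuracy for a smaller estimated dependence on $A$.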

Optimization by Hybridization of a Genetic Algorithm with the PROMOTHEE Method: Management of Multicriteria Localization

Title Optimization by Hybridization of a Genetic Algorithm with the PROMOTHEE Method: Management of Multicriteria Localization
Authors Myriem Alijo, Otman Abdoun, Mostafa Bachran, Amal Bergam
Abstract The decision to locate an economic activity in one or several countries is made taking into account numerous parameters and criteria. Several studies have been carried out in this field, but they generally use information in a reduced context. The majority are based solely on parameters, using traditional methods which often lead to unsatisfactory solutions. This work consists in hybridizing, through genetic algorithms, economic intelligence (EI) and multicriteria analysis (MCA) methods to improve territorial localization decisions. The purpose is to lead the company to locate its activity in the place that would give it a competitive advantage. This work also consists of identifying all the parameters that can influence the decision of the economic actors and equipping them with tools that use all the available national and international data to produce a mapping of countries, regions or departments favorable to the location. Throughout our research, our goal is the realization of a hybrid conceptual model of economic intelligence based on multicriteria analysis with genetic algorithms in order to optimize localization decisions. In this perspective we opted for the PROMETHEE method (Preference Ranking Organization Method for Enrichment Evaluation), which made it possible to obtain the best compromise between the various visions and points of view.
Published 2020-01-10
URL https://arxiv.org/abs/2002.04068v1
PDF https://arxiv.org/pdf/2002.04068v1.pdf
PWC https://paperswithcode.com/paper/optimization-by-hybridization-of-a-genetic
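
The abstract above relies on PROMETHEE to rank candidate locations. As a rough illustration of the ranking step alone, the sketch below computes PROMETHEE II net outranking flows with the simplest ("usual") preference function, where any strict advantage on a criterion counts as full preference. The candidate locations, criteria, and weights are hypothetical, and the genetic-algorithm hybridization described in the paper is not modelled.

```python
def promethee_ii(scores, weights, maximize):
    """Return the net outranking flow of each alternative.

    scores[a][j] is the value of alternative a on criterion j,
    weights[j] is the importance of criterion j,
    maximize[j] is True when larger values of criterion j are better.
    """
    n = len(scores)
    total = sum(weights)
    w = [x / total for x in weights]  # normalize weights to sum to 1

    def preference(a, b):
        # Aggregated preference of a over b using the "usual" criterion:
        # a criterion contributes its full weight when a is strictly better.
        pi = 0.0
        for j, wj in enumerate(w):
            diff = scores[a][j] - scores[b][j]
            if not maximize[j]:
                diff = -diff
            if diff > 0:
                pi += wj
        return pi

    net = []
    for a in range(n):
        others = [b for b in range(n) if b != a]
        positive = sum(preference(a, b) for b in others) / (n - 1)
        negative = sum(preference(b, a) for b in others) / (n - 1)
        net.append(positive - negative)
    return net


# Hypothetical data: three locations scored on cost (lower is better),
# market size and infrastructure quality (higher is better).
scores = [[100, 50, 7], [120, 80, 6], [90, 40, 9]]
flows = promethee_ii(scores, weights=[0.5, 0.3, 0.2], maximize=[False, True, True])
best = max(range(len(flows)), key=flows.__getitem__)
print(best)  # index of the location with the highest net flow
```

Net flows always sum to zero, so a positive flow means an alternative outranks the others more than it is outranked.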

Julia Language in Machine Learning: Algorithms, Applications, and Open Issues

Title Julia Language in Machine Learning: Algorithms, Applications, and Open Issues
Authors Kaifeng Gao, Jingzhi Tu, Zenan Huo, Gang Mei, Francesco Piccialli, Salvatore Cuomo
Abstract Machine learning is driving development across many fields in science and engineering. A simple and efficient programming language could accelerate applications of machine learning in various fields. Currently, the programming languages most commonly used to develop machine learning algorithms include Python, MATLAB, and C/C++. However, none of these languages balances efficiency and simplicity well. The Julia language is a fast, easy-to-use, and open-source programming language that was originally designed for high-performance computing and that balances efficiency and simplicity well. This paper summarizes the related research work and developments in the application of the Julia language in machine learning. It first surveys the popular machine learning algorithms that are developed in the Julia language. Then, it investigates applications of the machine learning algorithms implemented with the Julia language. Finally, it discusses the open issues and the potential future directions that arise in the use of the Julia language in machine learning.
Published 2020-03-23
URL https://arxiv.org/abs/2003.10146v1
PDF https://arxiv.org/pdf/2003.10146v1.pdf
PWC https://paperswithcode.com/paper/julia-language-in-machine-learning-algorithms

Single Image Dehazing Using Ranking Convolutional Neural Network

Title Single Image Dehazing Using Ranking Convolutional Neural Network
Authors Yafei Song, Jia Li, Xiaogang Wang, Xiaowu Chen
Abstract Single image dehazing, which aims to recover the clear image solely from an input hazy or foggy image, is a challenging ill-posed problem. In existing approaches, the common key step is to estimate the haze density of each pixel. To this end, various approaches rely on heuristically designed haze-relevant features. Several recent works also automatically learn the features by directly exploiting Convolutional Neural Networks (CNN). However, this may be insufficient to fully capture the intrinsic attributes of hazy images. To obtain effective features for single image dehazing, this paper presents a novel Ranking Convolutional Neural Network (Ranking-CNN). In Ranking-CNN, a novel ranking layer is proposed to extend the structure of CNN so that the statistical and structural attributes of hazy images can be simultaneously captured. By training Ranking-CNN in a well-designed manner, powerful haze-relevant features can be automatically learned from massive hazy image patches. Based on these features, haze can be effectively removed by using a haze density prediction model trained through random forest regression. Experimental results show that our approach outperforms several previous dehazing approaches on synthetic and real-world benchmark images. Comprehensive analyses are also conducted to interpret the proposed Ranking-CNN from both the theoretical and experimental aspects.
Tasks Image Dehazing, Single Image Dehazing
Published 2020-01-15
URL https://arxiv.org/abs/2001.05246v1
PDF https://arxiv.org/pdf/2001.05246v1.pdf
PWC https://paperswithcode.com/paper/single-image-dehazing-using-ranking

A General Framework for Consistent Structured Prediction with Implicit Loss Embeddings

Title A General Framework for Consistent Structured Prediction with Implicit Loss Embeddings
Authors Carlo Ciliberto, Lorenzo Rosasco, Alessandro Rudi
Abstract We propose and analyze a novel theoretical and algorithmic framework for structured prediction. While so far the term has referred to discrete output spaces, here we consider more general settings, such as manifolds or spaces of probability measures. We define structured prediction as a problem where the output space lacks a vectorial structure. We identify and study a large class of loss functions that implicitly defines a suitable geometry on the problem. The latter is the key to develop an algorithmic framework amenable to a sharp statistical analysis and yielding efficient computations. When dealing with output spaces with infinite cardinality, a suitable implicit formulation of the estimator is shown to be crucial.
Tasks Structured Prediction
Published 2020-02-13
URL https://arxiv.org/abs/2002.05424v1
PDF https://arxiv.org/pdf/2002.05424v1.pdf
PWC https://paperswithcode.com/paper/a-general-framework-for-consistent-structured

Driver Drowsiness Detection Model Using Convolutional Neural Networks Techniques for Android Application

Title Driver Drowsiness Detection Model Using Convolutional Neural Networks Techniques for Android Application
Authors Rateb Jabbar, Mohammed Shinoy, Mohamed Kharbeche, Khalifa Al-Khalifa, Moez Krichen, Kamel Barkaoui
Abstract A sleepy driver is arguably much more dangerous on the road than one who is speeding, as he is a victim of microsleeps. Automotive researchers and manufacturers are trying to curb this problem with several technological solutions that will avert such a crisis. This article focuses on the detection of such microsleeps and drowsiness using neural network based methodologies. Our previous work in this field used machine learning with a multi-layer perceptron for the same detection task. In this paper, accuracy was increased by utilizing facial landmarks, which are detected by the camera and passed to a Convolutional Neural Network (CNN) to classify drowsiness. The achievement of this work is the capability to provide a lightweight alternative to heavier classification models, with more than 88% accuracy for the category without glasses and more than 85% for the category night without glasses. On average, more than 83% accuracy was achieved in all categories. Moreover, as for model size, complexity and storage, there is a marked reduction in the new proposed model in comparison to the benchmark model, where the maximum size is 75 KB. The proposed CNN based model can be used to build a real-time driver drowsiness detection system for embedded systems and Android devices with high accuracy and ease of use.
Published 2020-01-17
URL https://arxiv.org/abs/2002.03728v1
PDF https://arxiv.org/pdf/2002.03728v1.pdf
PWC https://paperswithcode.com/paper/driver-drowsiness-detection-model-using

LIMITS: Lightweight Machine Learning for IoT Systems with Resource Limitations

Title LIMITS: Lightweight Machine Learning for IoT Systems with Resource Limitations
Authors Benjamin Sliwa, Nico Piatkowski, Christian Wietfeld
Abstract Exploiting big data knowledge on small devices will pave the way for building truly cognitive Internet of Things (IoT) systems. Although machine learning has led to great advancements for IoT-based data analytics, there remains a huge methodological gap for the deployment phase of trained machine learning models. For given resource-constrained platforms such as Microcontroller Units (MCUs), model choice and parametrization are typically performed based on heuristics or analytical models. However, these approaches are only able to provide rough estimates of the required system resources as they do not consider the interplay of hardware, compiler specific optimizations, and code dependencies. In this paper, we present the novel open source framework LIghtweight Machine learning for IoT Systems (LIMITS), which applies a platform-in-the-loop approach explicitly considering the actual compilation toolchain of the target IoT platform. LIMITS focuses on high level tasks such as experiment automation, platform-specific code generation, and sweet spot determination. The solid foundations of validated low-level model implementations are provided by the coupled well-established data analysis framework Waikato Environment for Knowledge Analysis (WEKA). We apply and validate LIMITS in two case studies focusing on cellular data rate prediction and radio-based vehicle classification, where we compare different learning models and real world IoT platforms with memory constraints from 16 kB to 4 MB and demonstrate its potential to catalyze the development of machine learning enabled IoT systems.
Tasks Code Generation
Published 2020-01-28
URL https://arxiv.org/abs/2001.10189v1
PDF https://arxiv.org/pdf/2001.10189v1.pdf
PWC https://paperswithcode.com/paper/limits-lightweight-machine-learning-for-iot

Region adaptive graph fourier transform for 3d point clouds

Title Region adaptive graph fourier transform for 3d point clouds
Authors Eduardo Pavez, Benjamin Girault, Antonio Ortega, Philip A. Chou
Abstract We introduce the Region Adaptive Graph Fourier Transform (RA-GFT) for compression of 3D point cloud attributes. We assume the points are organized by a family of nested partitions represented by a tree. The RA-GFT is a multiresolution transform, formed by combining spatially localized block transforms. At each resolution level, attributes are processed in clusters by a set of block transforms. Each block transform produces a single approximation (DC) coefficient and various detail (AC) coefficients. The DC coefficients are promoted up the tree to the next (lower resolution) level, where the process can be repeated until reaching the root. Since clusters may have different numbers of points, each block transform must incorporate the relative importance of each coefficient. For this, we introduce the $\mathbf{Q}$-normalized graph Laplacian, and propose using its eigenvectors as the block transform. The RA-GFT outperforms the Region Adaptive Haar Transform (RAHT) by up to 2.5 dB, with a small complexity overhead.
Published 2020-03-04
URL https://arxiv.org/abs/2003.01866v1
PDF https://arxiv.org/pdf/2003.01866v1.pdf
PWC https://paperswithcode.com/paper/region-adaptive-graph-fourier-transform-for
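
To make the DC/AC terminology in the abstract above concrete, the sketch below implements the two-point weighted butterfly used by RAHT, the baseline the abstract compares against. It is not the RA-GFT itself, which the abstract builds from eigenvectors of the $\mathbf{Q}$-normalized graph Laplacian. Each input is a DC coefficient together with a weight equal to the number of points it summarizes (for a single point, the coefficient is the attribute and the weight is 1). Function names are illustrative assumptions.

```python
import math


def raht_pair(c1, w1, c2, w2):
    """Merge two weighted DC coefficients into one DC and one AC coefficient.

    Returns (dc, ac, merged_weight). The 2 x 2 transform is orthonormal,
    so the energy c1**2 + c2**2 is preserved.
    """
    s = math.sqrt(w1 + w2)
    a, b = math.sqrt(w1) / s, math.sqrt(w2) / s
    dc = a * c1 + b * c2
    ac = -b * c1 + a * c2
    return dc, ac, w1 + w2


def raht_pair_inverse(dc, ac, w1, w2):
    """Invert raht_pair by applying the transposed (inverse) rotation."""
    s = math.sqrt(w1 + w2)
    a, b = math.sqrt(w1) / s, math.sqrt(w2) / s
    return a * dc - b * ac, b * dc + a * ac


# Three points that all carry the same attribute value 10.0.
dc12, ac12, w12 = raht_pair(10.0, 1, 10.0, 1)      # merge points 1 and 2
dc123, ac123, w123 = raht_pair(dc12, w12, 10.0, 1)  # promote the DC and merge point 3
# For a constant attribute both AC coefficients vanish (up to rounding),
# even though the second merge combines clusters of unequal size.
print(abs(ac12) < 1e-9, abs(ac123) < 1e-9)
```

The weights are what let clusters of unequal size be combined without leaking a constant signal into the detail coefficients, which is the same concern the abstract raises for block transforms over clusters with different numbers of points.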