April 2, 2020

Paper Group ANR 273

Modeling Engagement in Long-Term, In-Home Socially Assistive Robot Interventions for Children with Autism Spectrum Disorders


Title	Modeling Engagement in Long-Term, In-Home Socially Assistive Robot Interventions for Children with Autism Spectrum Disorders
Authors	Shomik Jain, Balasubramanian Thiagarajan, Zhonghao Shi, Caitlyn Clabaugh, Maja J. Matarić
Abstract	Socially assistive robotics (SAR) has great potential to provide accessible, affordable, and personalized therapeutic interventions for children with autism spectrum disorders (ASD). However, human-robot interaction (HRI) methods are still limited in their ability to autonomously recognize and respond to behavioral cues, especially in atypical users and everyday settings. This work applies supervised machine learning algorithms to model user engagement in the context of long-term, in-home SAR interventions for children with ASD. Specifically, two types of engagement models are presented for each user: 1) generalized models trained on data from different users; and 2) individualized models trained on an early subset of the user’s data. The models achieved approximately 90% accuracy (AUROC) for post hoc binary classification of engagement, despite the high variance in data observed across users, sessions, and engagement states. Moreover, temporal patterns in model predictions could be used to reliably initiate re-engagement actions at appropriate times. These results validate the feasibility and challenges of recognition and response to user disengagement in long-term, real-world HRI settings. The contributions of this work also inform the design of engaging and personalized HRI, especially for the ASD community.
Tasks
Published	2020-02-06
URL	https://arxiv.org/abs/2002.02453v1
PDF	https://arxiv.org/pdf/2002.02453v1.pdf
PWC	https://paperswithcode.com/paper/modeling-engagement-in-long-term-in-home
Repo
Framework
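The two training regimes named in the abstract (generalized models trained on other users, individualized models trained on an early subset of one user's data) can be illustrated with a small data-splitting sketch. The record layout (`user`, `t`, `engaged`), the function names, and the `early_fraction` value are assumptions made for illustration; the abstract does not specify them.

```python
def generalized_split(records, target_user):
    """Train on every other user's data, test on the target user's data."""
    train = [r for r in records if r["user"] != target_user]
    test = [r for r in records if r["user"] == target_user]
    return train, test


def individualized_split(records, target_user, early_fraction=0.2):
    """Train on the earliest part of the target user's data, test on the rest.

    early_fraction is a hypothetical parameter, not a value from the paper.
    """
    own = sorted((r for r in records if r["user"] == target_user),
                 key=lambda r: r["t"])
    cut = int(len(own) * early_fraction)
    return own[:cut], own[cut:]


# Toy data: three users, ten time-ordered observations each.
records = [{"user": u, "t": t, "engaged": (t + len(u)) % 2}
           for u in ("a", "b", "c") for t in range(10)]

gen_train, gen_test = generalized_split(records, "a")
ind_train, ind_test = individualized_split(records, "a", early_fraction=0.2)
```

Either training set can then be passed to any supervised binary classifier; the sketch only shows which observations each regime is allowed to see.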

The value of text for small business default prediction: A deep learning approach


Title	The value of text for small business default prediction: A deep learning approach
Authors	Matthew Stevenson, Christophe Mues, Cristián Bravo
Abstract	Compared to consumer lending, Micro, Small and Medium Enterprise (mSME) credit risk modelling is particularly challenging, as, often, the same sources of information are not available. To mitigate limited data availability, it is standard policy for a loan officer to provide a textual loan assessment. In turn, this statement is analysed by a credit expert alongside any available standard credit data. In our paper, we exploit recent advances from the field of Deep Learning and Natural Language Processing (NLP), including the BERT (Bidirectional Encoder Representations from Transformers) model, to extract information from 60000+ textual assessments. We consider the performance in terms of AUC (Area Under the Curve) and Balanced Accuracy and find that the text alone is surprisingly effective for predicting default. Yet, when combined with traditional data, it yields no additional predictive capability. We do find, however, that deep learning with categorical embeddings is capable of producing a modest performance improvement when compared to alternative machine learning models. We explore how the loan assessments influence predictions, explaining why despite the text being predictive, no additional performance is gained. This exploration leads us to a series of recommendations on a new strategy for the collection of future mSME loan assessments.
Tasks
Published	2020-03-19
URL	https://arxiv.org/abs/2003.08964v1
PDF	https://arxiv.org/pdf/2003.08964v1.pdf
PWC	https://paperswithcode.com/paper/the-value-of-text-for-small-business-default
Repo
Framework
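The two metrics named in the abstract, AUC and Balanced Accuracy, can be computed from first principles as in the sketch below. The labels, scores, and the 0.5 decision threshold are made-up illustration values, not data from the paper.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, predictions):
    """Mean of the true positive rate and the true negative rate."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Toy example: 1 = default, 0 = no default (imbalanced on purpose).
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.6, 0.3, 0.7, 0.4]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

auc_value = auc(labels, scores)
bal_acc = balanced_accuracy(labels, predictions)
```

Balanced accuracy weights both classes equally, which matters for default prediction because defaults are usually the minority class.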

Counting dense objects in remote sensing images


Title	Counting dense objects in remote sensing images
Authors	Guangshuai Gao, Qingjie Liu, Yunhong Wang
Abstract	Estimating the number of objects of interest in a given image accurately is a challenging yet important task. Significant efforts have been made to address this problem and great progress has been achieved, yet counting ground objects in remote sensing images is barely studied. In this paper, we are interested in counting dense objects in remote sensing images. Compared with object counting in natural scenes, this task is challenging because of the following factors: large scale variation, complex cluttered backgrounds, and arbitrary orientation. More importantly, the scarcity of data severely limits the development of research in this field. To address these issues, we first construct a large-scale object counting dataset based on remote sensing images, which contains four kinds of objects: buildings, crowded ships in harbors, and large vehicles and small vehicles in parking lots. We then benchmark the dataset by designing a novel neural network that generates a density map of an input image. The proposed network consists of three parts, namely a convolution block attention module (CBAM), a scale pyramid module (SPM), and a deformable convolution module (DCM). Experiments on the proposed dataset and comparisons with state-of-the-art methods demonstrate the challenging nature of the proposed dataset and the superiority and effectiveness of our method.
Tasks	Object Counting
Published	2020-02-14
URL	https://arxiv.org/abs/2002.05928v1
PDF	https://arxiv.org/pdf/2002.05928v1.pdf
PWC	https://paperswithcode.com/paper/counting-dense-objects-in-remote-sensing
Repo
Framework
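The abstract describes counting through a density map. A common way to build a target density map, assumed here and not confirmed as the authors' procedure, is to place one normalized Gaussian per annotated object so that the map sums to the object count. The `sigma` and `radius` values and the point coordinates below are illustrative.

```python
import math


def density_map(points, height, width, sigma=1.5, radius=4):
    """Build a density map with one unit-mass Gaussian per annotated point.

    Each kernel is normalized over the cells that fall inside the image,
    so every object contributes exactly 1 to the total, even at borders.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for py, px in points:
        cells = []
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                dist_sq = (y - py) ** 2 + (x - px) ** 2
                cells.append((y, x, math.exp(-dist_sq / (2 * sigma ** 2))))
        total = sum(w for _, _, w in cells)
        for y, x, w in cells:
            dmap[y][x] += w / total
    return dmap


# Four hypothetical object centres as (row, column), one on the border.
points = [(2, 3), (10, 10), (10, 12), (19, 0)]
dmap = density_map(points, height=20, width=20)

# The predicted count is the integral (sum) of the density map.
estimated_count = sum(map(sum, dmap))
```

A counting network is then trained to regress such a map, and its output is summed to obtain the count.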

Spatio-Temporal Action Detection with Multi-Object Interaction


Title	Spatio-Temporal Action Detection with Multi-Object Interaction
Authors	Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell
Abstract	Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an “action tube”. Nowadays, most spatio-temporal action detection datasets (e.g. UCF101-24, AVA, DALY) are annotated with action tubes that contain a single person performing the action, thus the predominant action detection models simply employ a person detection and tracking pipeline for localization. However, when the action is defined as an interaction between multiple objects, such methods may fail since each bounding box in the action tube contains multiple objects instead of one person. In this paper, we study the spatio-temporal action detection problem with multi-object interaction. We introduce a new dataset that is annotated with action tubes containing multi-object interactions. Moreover, we propose an end-to-end spatio-temporal action detection model that performs both spatial and temporal regression simultaneously. Our spatial regression may enclose multiple objects participating in the action. During test time, we simply connect the regressed bounding boxes within the predicted temporal duration using a simple heuristic. We report the baseline results of our proposed model on this new dataset, and also show competitive results on the standard benchmark UCF101-24 using only RGB input.
Tasks	Action Detection, Human Detection
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00180v1
PDF	https://arxiv.org/pdf/2004.00180v1.pdf
PWC	https://paperswithcode.com/paper/spatio-temporal-action-detection-with-multi
Repo
Framework
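The abstract says the regressed boxes are connected within the predicted temporal duration "using a simple heuristic" but does not name it. The sketch below assumes one plausible choice, greedy linking by maximum intersection over union (IoU) with the previous frame's box; the function names and the rule for picking the first box are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tube(frames, start, end):
    """Greedily link one box per frame in [start, end) into an action tube.

    Assumption: the tube starts from the first candidate box in the
    start frame, then follows the box with the highest IoU each frame.
    """
    tube = [frames[start][0]]
    for t in range(start + 1, end):
        tube.append(max(frames[t], key=lambda box: iou(tube[-1], box)))
    return tube


# Candidate boxes per frame; the predicted temporal extent is frames 0..2.
frames = [
    [(0, 0, 10, 10)],
    [(50, 50, 60, 60), (1, 1, 11, 11)],
    [(2, 2, 12, 12), (52, 52, 62, 62)],
    [(3, 3, 13, 13)],
]
tube = link_tube(frames, start=0, end=3)
```

Because each regressed box may enclose several interacting objects, the linking step operates on whole-interaction boxes and not on per-person tracks.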

Table Structure Extraction with Bi-directional Gated Recurrent Unit Networks


Title	Table Structure Extraction with Bi-directional Gated Recurrent Unit Networks
Authors	Saqib Ali Khan, Syed Muhammad Daniyal Khalid, Muhammad Ali Shahzad, Faisal Shafait
Abstract	Tables present summarized and structured information to the reader, which makes table structure extraction an important part of document understanding applications. However, table structure identification is a hard problem not only because of the large variation in table layouts and styles, but also owing to variations in page layouts and noise contamination levels. A lot of research has been done to identify table structure, most of which is based on applying heuristics with the aid of optical character recognition (OCR) to hand-pick layout features of the tables. These methods fail to generalize well because of the variations in table layouts and the errors generated by OCR. In this paper, we have proposed a robust deep learning based approach to extract rows and columns from a detected table in document images with high precision. In the proposed solution, the table images are first pre-processed and then fed to a bi-directional Recurrent Neural Network with Gated Recurrent Units (GRU) followed by a fully-connected layer with softmax activation. The network scans the images from top to bottom as well as from left to right and classifies each input as either a row-separator or a column-separator. We have benchmarked our system on the publicly available UNLV and ICDAR 2013 datasets, on which it outperformed the state-of-the-art table structure extraction systems by a significant margin.
Tasks	Optical Character Recognition
Published	2020-01-08
URL	https://arxiv.org/abs/2001.02501v1
PDF	https://arxiv.org/pdf/2001.02501v1.pdf
PWC	https://paperswithcode.com/paper/table-structure-extraction-with-bi
Repo
Framework

Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data


Title	Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data
Authors	Xi Yan, David Acuna, Sanja Fidler
Abstract	Transfer learning has proven to be a successful technique to train deep learning models in the domains where little training data is available. The dominant approach is to pretrain a model on a large generic dataset such as ImageNet and finetune its weights on the target domain. However, in the new era of an ever-increasing number of massive datasets, selecting the relevant data for pretraining is a critical issue. We introduce Neural Data Server (NDS), a large-scale search engine for finding the most useful transfer learning data to the target domain. NDS consists of a dataserver which indexes several large popular image datasets, and aims to recommend data to a client, an end-user with a target application with its own small labeled dataset. The dataserver represents large datasets with a much more compact mixture-of-experts model, and employs it to perform data search in a series of dataserver-client transactions at a low computational cost. We show the effectiveness of NDS in various transfer learning scenarios, demonstrating state-of-the-art performance on several target datasets and tasks such as image classification, object detection and instance segmentation. Neural Data Server is available as a web-service at http://aidemos.cs.toronto.edu/nds/.
Tasks	Image Classification, Instance Segmentation, Object Detection, Semantic Segmentation, Transfer Learning
Published	2020-01-09
URL	https://arxiv.org/abs/2001.02799v3
PDF	https://arxiv.org/pdf/2001.02799v3.pdf
PWC	https://paperswithcode.com/paper/neural-data-server-a-large-scale-search
Repo
Framework

Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR)


Title	Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR)
Authors	Jamshed Memon, Maira Sami, Rizwan Ahmed Khan
Abstract	Given the ubiquity of handwritten documents in human transactions, Optical Character Recognition (OCR) of documents has invaluable practical worth. Optical character recognition is a science that enables the translation of various types of documents or images into analyzable, editable and searchable data. During the last decade, researchers have used artificial intelligence / machine learning tools to automatically analyze handwritten and printed documents in order to convert them into electronic format. The objective of this review paper is to summarize research that has been conducted on character recognition of handwritten documents and to provide research directions. In this Systematic Literature Review (SLR) we collected, synthesized and analyzed research articles on the topic of handwritten OCR (and closely related topics) which were published between 2000 and 2018. We searched widely used electronic databases following a pre-defined review protocol. Articles were searched using keywords, forward reference searching and backward reference searching in order to find all the articles related to the topic. After carefully following the study selection process, 142 articles were selected for this SLR. This review article serves the purpose of presenting state-of-the-art results and techniques on OCR and also provides research directions by highlighting research gaps.
Tasks	Optical Character Recognition
Published	2020-01-01
URL	https://arxiv.org/abs/2001.00139v1
PDF	https://arxiv.org/pdf/2001.00139v1.pdf
PWC	https://paperswithcode.com/paper/handwritten-optical-character-recognition-ocr
Repo
Framework

Fast Fair Regression via Efficient Approximations of Mutual Information


Title	Fast Fair Regression via Efficient Approximations of Mutual Information
Authors	Daniel Steinberg, Alistair Reid, Simon O’Callaghan, Finnian Lattimore, Lachlan McCalman, Tiberio Caetano
Abstract	Most work in algorithmic fairness to date has focused on discrete outcomes, such as deciding whether to grant someone a loan or not. In these classification settings, group fairness criteria such as independence, separation and sufficiency can be measured directly by comparing rates of outcomes between subpopulations. Many important problems however require the prediction of a real-valued outcome, such as a risk score or insurance premium. In such regression settings, measuring group fairness criteria is computationally challenging, as it requires estimating information-theoretic divergences between conditional probability density functions. This paper introduces fast approximations of the independence, separation and sufficiency group fairness criteria for regression models from their (conditional) mutual information definitions, and uses such approximations as regularisers to enforce fairness within a regularised risk minimisation framework. Experiments in real-world datasets indicate that in spite of its superior computational efficiency our algorithm still displays state-of-the-art accuracy/fairness tradeoffs.
Tasks
Published	2020-02-14
URL	https://arxiv.org/abs/2002.06200v1
PDF	https://arxiv.org/pdf/2002.06200v1.pdf
PWC	https://paperswithcode.com/paper/fast-fair-regression-via-efficient
Repo
Framework
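The abstract notes that in classification settings a criterion such as independence "can be measured directly by comparing rates of outcomes between subpopulations". The sketch below shows that discrete case only; it is not the paper's regression approximation, and the decisions and group labels are made-up illustration data.

```python
def positive_rate(decisions, groups, group):
    """Fraction of positive decisions within one subpopulation."""
    selected = [d for d, g in zip(decisions, groups) if g == group]
    return sum(selected) / len(selected)


def independence_gap(decisions, groups):
    """Largest difference in positive-decision rate between any two groups.

    A gap of 0 means the decision rate is the same for every group,
    which is what the independence criterion asks for.
    """
    rates = {g: positive_rate(decisions, groups, g) for g in set(groups)}
    return max(rates.values()) - min(rates.values()), rates


# 1 = loan granted, 0 = loan refused; "a" and "b" are protected groups.
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = independence_gap(decisions, groups)
```

With a real-valued prediction there are no such rates to compare, which is why the regression setting needs the mutual information formulation the abstract describes.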

Optimization by Hybridization of a Genetic Algorithm with the PROMOTHEE Method: Management of Multicriteria Localization


Title	Optimization by Hybridization of a Genetic Algorithm with the PROMOTHEE Method: Management of Multicriteria Localization
Authors	Myriem Alijo, Otman Abdoun, Mostafa Bachran, Amal Bergam
Abstract	The decision to locate an economic activity in one or several countries is made taking into account numerous parameters and criteria. Several studies have been carried out in this field, but they generally use information in a reduced context. The majority are based solely on parameters, using traditional methods which often lead to unsatisfactory solutions. This work consists in hybridizing, through genetic algorithms, economic intelligence (EI) and multicriteria analysis (MCA) methods to improve decisions of territorial localization. The purpose is to lead the company to locate its activity in the place that would give it a competitive advantage. This work also consists of identifying all the parameters that can influence the decision of the economic actors and equipping them with tools that use all the available national and international data to produce a mapping of the countries, regions or departments favorable to the location. Throughout our research, our goal is the realization of a hybrid conceptual model of economic intelligence based on multicriteria analysis with genetic algorithms in order to optimize localization decisions. In this perspective, we opted for the PROMETHEE method (Preference Ranking Organization Method for Enrichment Evaluation), which has made it possible to obtain the best compromise between the various visions and points of view.
Tasks
Published	2020-01-10
URL	https://arxiv.org/abs/2002.04068v1
PDF	https://arxiv.org/pdf/2002.04068v1.pdf
PWC	https://paperswithcode.com/paper/optimization-by-hybridization-of-a-genetic
Repo
Framework
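The ranking step of PROMETHEE can be sketched as follows. This is a minimal PROMETHEE II net-flow computation with the "usual" preference function (full preference as soon as one alternative is strictly better on a criterion); it leaves out the genetic algorithm hybridization, and the locations, criteria, scores, and weights are invented for illustration.

```python
def promethee_net_flows(scores, weights):
    """Return the PROMETHEE II net outranking flow of each alternative.

    scores[i][j] is the performance of alternative i on criterion j
    (higher is better); weights[j] are criterion weights summing to 1.
    """
    n = len(scores)

    def preference(a, b):
        # Usual criterion: weight counts when a strictly beats b.
        return sum(w for w, ga, gb in zip(weights, scores[a], scores[b])
                   if ga > gb)

    flows = []
    for a in range(n):
        others = [b for b in range(n) if b != a]
        leaving = sum(preference(a, b) for b in others) / (n - 1)
        entering = sum(preference(b, a) for b in others) / (n - 1)
        flows.append(leaving - entering)
    return flows


# Three candidate locations scored on three hypothetical criteria.
scores = [[8, 3, 5], [6, 7, 5], [4, 9, 2]]
weights = [0.5, 0.3, 0.2]
flows = promethee_net_flows(scores, weights)

# Rank alternatives from best to worst by net flow.
ranking = sorted(range(len(scores)), key=lambda i: flows[i], reverse=True)
```

In a hybrid scheme such a net flow could serve as the fitness value that a genetic algorithm optimizes, though the abstract does not detail how the two are coupled.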

Julia Language in Machine Learning: Algorithms, Applications, and Open Issues


Title	Julia Language in Machine Learning: Algorithms, Applications, and Open Issues
Authors	Kaifeng Gao, Jingzhi Tu, Zenan Huo, Gang Mei, Francesco Piccialli, Salvatore Cuomo
Abstract	Machine learning is driving development across many fields in science and engineering. A simple and efficient programming language could accelerate applications of machine learning in various fields. Currently, the programming languages most commonly used to develop machine learning algorithms include Python, MATLAB, and C/C++. However, none of these languages balances efficiency and simplicity well. The Julia language is a fast, easy-to-use, and open-source programming language that was originally designed for high-performance computing and that balances efficiency and simplicity well. This paper summarizes the related research work and developments in the application of the Julia language in machine learning. It first surveys the popular machine learning algorithms that are developed in the Julia language. Then, it investigates applications of the machine learning algorithms implemented with the Julia language. Finally, it discusses the open issues and the potential future directions that arise in the use of the Julia language in machine learning.
Tasks
Published	2020-03-23
URL	https://arxiv.org/abs/2003.10146v1
PDF	https://arxiv.org/pdf/2003.10146v1.pdf
PWC	https://paperswithcode.com/paper/julia-language-in-machine-learning-algorithms
Repo
Framework

Single Image Dehazing Using Ranking Convolutional Neural Network


Title	Single Image Dehazing Using Ranking Convolutional Neural Network
Authors	Yafei Song, Jia Li, Xiaogang Wang, Xiaowu Chen
Abstract	Single image dehazing, which aims to recover the clear image solely from an input hazy or foggy image, is a challenging ill-posed problem. Analysing existing approaches, the common key step is to estimate the haze density of each pixel. To this end, various approaches often heuristically designed haze-relevant features. Several recent works also automatically learn the features via directly exploiting Convolutional Neural Networks (CNN). However, it may be insufficient to fully capture the intrinsic attributes of hazy images. To obtain effective features for single image dehazing, this paper presents a novel Ranking Convolutional Neural Network (Ranking-CNN). In Ranking-CNN, a novel ranking layer is proposed to extend the structure of CNN so that the statistical and structural attributes of hazy images can be simultaneously captured. By training Ranking-CNN in a well-designed manner, powerful haze-relevant features can be automatically learned from massive hazy image patches. Based on these features, haze can be effectively removed by using a haze density prediction model trained through the random forest regression. Experimental results show that our approach outperforms several previous dehazing approaches on synthetic and real-world benchmark images. Comprehensive analyses are also conducted to interpret the proposed Ranking-CNN from both the theoretical and experimental aspects.
Tasks	Image Dehazing, Single Image Dehazing
Published	2020-01-15
URL	https://arxiv.org/abs/2001.05246v1
PDF	https://arxiv.org/pdf/2001.05246v1.pdf
PWC	https://paperswithcode.com/paper/single-image-dehazing-using-ranking
Repo
Framework

A General Framework for Consistent Structured Prediction with Implicit Loss Embeddings


Title	A General Framework for Consistent Structured Prediction with Implicit Loss Embeddings
Authors	Carlo Ciliberto, Lorenzo Rosasco, Alessandro Rudi
Abstract	We propose and analyze a novel theoretical and algorithmic framework for structured prediction. While so far the term has referred to discrete output spaces, here we consider more general settings, such as manifolds or spaces of probability measures. We define structured prediction as a problem where the output space lacks a vectorial structure. We identify and study a large class of loss functions that implicitly defines a suitable geometry on the problem. The latter is the key to develop an algorithmic framework amenable to a sharp statistical analysis and yielding efficient computations. When dealing with output spaces with infinite cardinality, a suitable implicit formulation of the estimator is shown to be crucial.
Tasks	Structured Prediction
Published	2020-02-13
URL	https://arxiv.org/abs/2002.05424v1
PDF	https://arxiv.org/pdf/2002.05424v1.pdf
PWC	https://paperswithcode.com/paper/a-general-framework-for-consistent-structured
Repo
Framework

Driver Drowsiness Detection Model Using Convolutional Neural Networks Techniques for Android Application


Title	Driver Drowsiness Detection Model Using Convolutional Neural Networks Techniques for Android Application
Authors	Rateb Jabbar, Mohammed Shinoy, Mohamed Kharbeche, Khalifa Al-Khalifa, Moez Krichen, Kamel Barkaoui
Abstract	A sleepy driver is arguably much more dangerous on the road than one who is speeding, as the sleepy driver is a victim of microsleeps. Automotive researchers and manufacturers are trying to curb this problem with several technological solutions that will avert such a crisis. This article focuses on the detection of such microsleep and drowsiness using neural network based methodologies. Our previous work in this field involved using machine learning with a multi-layer perceptron to detect the same. In this paper, accuracy was increased by utilizing facial landmarks which are detected by the camera and passed to a Convolutional Neural Network (CNN) to classify drowsiness. The achievement of this work is the capability to provide a lightweight alternative to heavier classification models, with more than 88% accuracy for the category without glasses and more than 85% for the category night without glasses. On average, more than 83% accuracy was achieved in all categories. Moreover, as for model size, complexity and storage, there is a marked reduction in the new proposed model in comparison to the benchmark model, where the maximum size is 75 KB. The proposed CNN based model can be used to build a real-time driver drowsiness detection system for embedded systems and Android devices with high accuracy and ease of use.
Tasks
Published	2020-01-17
URL	https://arxiv.org/abs/2002.03728v1
PDF	https://arxiv.org/pdf/2002.03728v1.pdf
PWC	https://paperswithcode.com/paper/driver-drowsiness-detection-model-using
Repo
Framework

LIMITS: Lightweight Machine Learning for IoT Systems with Resource Limitations


Title	LIMITS: Lightweight Machine Learning for IoT Systems with Resource Limitations
Authors	Benjamin Sliwa, Nico Piatkowski, Christian Wietfeld
Abstract	Exploiting big data knowledge on small devices will pave the way for building truly cognitive Internet of Things (IoT) systems. Although machine learning has led to great advancements for IoT-based data analytics, there remains a huge methodological gap for the deployment phase of trained machine learning models. For given resource-constrained platforms such as Microcontroller Units (MCUs), model choice and parametrization are typically performed based on heuristics or analytical models. However, these approaches are only able to provide rough estimates of the required system resources as they do not consider the interplay of hardware, compiler specific optimizations, and code dependencies. In this paper, we present the novel open source framework LIghtweight Machine learning for IoT Systems (LIMITS), which applies a platform-in-the-loop approach explicitly considering the actual compilation toolchain of the target IoT platform. LIMITS focuses on high level tasks such as experiment automation, platform-specific code generation, and sweet spot determination. The solid foundations of validated low-level model implementations are provided by the coupled well-established data analysis framework Waikato Environment for Knowledge Analysis (WEKA). We apply and validate LIMITS in two case studies focusing on cellular data rate prediction and radio-based vehicle classification, where we compare different learning models and real world IoT platforms with memory constraints from 16 kB to 4 MB and demonstrate its potential to catalyze the development of machine learning enabled IoT systems.
Tasks	Code Generation
Published	2020-01-28
URL	https://arxiv.org/abs/2001.10189v1
PDF	https://arxiv.org/pdf/2001.10189v1.pdf
PWC	https://paperswithcode.com/paper/limits-lightweight-machine-learning-for-iot
Repo
Framework

Region adaptive graph fourier transform for 3d point clouds


Title	Region adaptive graph fourier transform for 3d point clouds
Authors	Eduardo Pavez, Benjamin Girault, Antonio Ortega, Philip A. Chou
Abstract	We introduce the Region Adaptive Graph Fourier Transform (RA-GFT) for compression of 3D point cloud attributes. We assume the points are organized by a family of nested partitions represented by a tree. The RA-GFT is a multiresolution transform, formed by combining spatially localized block transforms. At each resolution level, attributes are processed in clusters by a set of block transforms. Each block transform produces a single approximation (DC) coefficient and various detail (AC) coefficients. The DC coefficients are promoted up the tree to the next (lower resolution) level, where the process can be repeated until reaching the root. Since clusters may have different numbers of points, each block transform must incorporate the relative importance of each coefficient. For this, we introduce the $\mathbf{Q}$-normalized graph Laplacian, and propose using its eigenvectors as the block transform. The RA-GFT outperforms the Region Adaptive Haar Transform (RAHT) by up to 2.5 dB, with a small complexity overhead.
Tasks
Published	2020-03-04
URL	https://arxiv.org/abs/2003.01866v1
PDF	https://arxiv.org/pdf/2003.01866v1.pdf
PWC	https://paperswithcode.com/paper/region-adaptive-graph-fourier-transform-for
Repo
Framework
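The idea of a block transform that yields one DC and some AC coefficients while accounting for how many points each input represents can be seen in the two-point butterfly of RAHT, the baseline named in the abstract. The sketch below implements that RAHT butterfly, not the RA-GFT itself; the attribute values and weights are illustrative.

```python
import math


def raht_butterfly(a1, w1, a2, w2):
    """Merge two coefficients whose weights are the point counts w1, w2.

    Returns (dc, ac, merged_weight). The 2x2 transform is orthonormal,
    so dc**2 + ac**2 equals a1**2 + a2**2.
    """
    norm = math.sqrt(w1 + w2)
    dc = (math.sqrt(w1) * a1 + math.sqrt(w2) * a2) / norm
    ac = (-math.sqrt(w2) * a1 + math.sqrt(w1) * a2) / norm
    return dc, ac, w1 + w2


# One node representing 1 point and one representing 3 points.
dc, ac, merged_weight = raht_butterfly(10.0, 1, 20.0, 3)
```

The DC output and the merged weight are what get promoted to the next, lower-resolution level of the tree, while the AC output is kept as a detail coefficient.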