January 31, 2020

3305 words 16 mins read

Paper Group ANR 37

From Google Maps to a Fine-Grained Catalog of Street trees. Automated Activity Recognition of Construction Equipment Using a Data Fusion Approach. TASTE: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records. AutoML @ NeurIPS 2018 challenge: Design and Results. Parameter Estimation with the Ordered $\ell_{2}$ Regulariza …

From Google Maps to a Fine-Grained Catalog of Street trees

Title From Google Maps to a Fine-Grained Catalog of Street trees
Authors Steve Branson, Jan Dirk Wegner, David Hall, Nico Lang, Konrad Schindler, Pietro Perona
Abstract Up-to-date catalogs of the urban tree population are important for municipalities to monitor and improve quality of life in cities. Despite much research on automation of tree mapping, mainly relying on dedicated airborne LiDAR or hyperspectral campaigns, trees are still mostly mapped manually in practice. We present a fully automated tree detection and species recognition pipeline to process thousands of trees within a few hours using publicly available aerial and street view images of Google Maps™. These data provide rich information (viewpoints, scales) from global tree shapes to bark textures. Our workflow is built around a supervised classification that automatically learns the most discriminative features from thousands of trees and corresponding public tree inventory data. In addition, we introduce a change tracker to keep urban tree inventories up-to-date. Changes to individual trees are recognized at city scale by comparing street-level images of the same tree location at two different times. Drawing on recent advances in computer vision and machine learning, we apply convolutional neural networks (CNNs) for all classification tasks. We propose the following pipeline: download all available panoramas and overhead images of an area of interest; detect trees per image and combine multi-view detections in a probabilistic framework, adding prior knowledge; and recognize the fine-grained species of detected trees. In a later, separate module, we track trees over time and identify the type of change. We believe this is the first work to exploit publicly available image data for fine-grained tree mapping at city scale, i.e., over many thousands of trees. Experiments in the city of Pasadena, California, USA show that we can detect > 70% of the street trees, assign the correct species to > 80% of them across 40 different species, and correctly detect and classify changes in > 90% of the cases.
Tasks
Published 2019-10-07
URL https://arxiv.org/abs/1910.02675v1
PDF https://arxiv.org/pdf/1910.02675v1.pdf
PWC https://paperswithcode.com/paper/from-google-maps-to-a-fine-grained-catalog-of
Repo
Framework
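The multi-view detection step combines per-image CNN scores for a candidate tree location into one probability. Below is a minimal sketch of such a fusion, assuming conditionally independent views and a naive-Bayes style combination; this is an illustration, not the authors' exact probabilistic framework:

```python
import numpy as np

def fuse_multiview(view_probs, prior=0.1):
    """Fuse per-view detection probabilities for one candidate location.

    view_probs : P(tree | view_i) from the per-image detector, one per view
    prior      : prior probability that a tree stands at this location
    Assumes views are conditionally independent (a simplification).
    """
    p = np.asarray(view_probs, dtype=float)
    log_odds = np.log(prior / (1 - prior)) + np.sum(np.log(p / (1 - p)))
    return 1.0 / (1.0 + np.exp(-log_odds))

# Three street-view panoramas and one aerial view vote on the same spot.
print(fuse_multiview([0.9, 0.8, 0.6, 0.7]))
```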

Automated Activity Recognition of Construction Equipment Using a Data Fusion Approach

Title Automated Activity Recognition of Construction Equipment Using a Data Fusion Approach
Authors Behnam Sherafat, Abbas Rashidi, Yong-Cheol Lee, Changbum R. Ahn
Abstract Automated monitoring of construction operations, especially operations of equipment and machines, is an essential step toward cost estimation and planning of construction projects. In recent years, a number of methods have been suggested for recognizing the activities of construction equipment. These methods are based on processing a single type of data (audio, visual, or kinematic). Considering the complexity of construction jobsites, one source of data is not reliable enough to cover all conditions and scenarios. To address this issue, we utilized a data fusion approach based on collecting audio and kinematic data, which includes the following steps: 1) recording audio and kinematic data generated by machines, 2) preprocessing the data, 3) extracting time- and frequency-domain features, 4) fusing the features, and 5) categorizing activities using a machine learning algorithm. The proposed approach was implemented on multiple machines, and the experiments show that it can yield up to 25% more accurate results than approaches using a single data source.
Tasks Activity Recognition
Published 2019-06-05
URL https://arxiv.org/abs/1906.02070v1
PDF https://arxiv.org/pdf/1906.02070v1.pdf
PWC https://paperswithcode.com/paper/automated-activity-recognition-of
Repo
Framework
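Steps 3 and 4 of the pipeline lend themselves to a compact illustration. The sketch below extracts simple time- and frequency-domain features from synchronized audio and kinematic windows and fuses them by concatenation; the feature set and classifier are placeholders, not the paper's exact choices:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(signal, fs):
    """Time- and frequency-domain features for one fixed-length window."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return np.array([
        signal.mean(), signal.std(),    # time domain
        np.sqrt(np.mean(signal ** 2)),  # RMS energy
        freqs[np.argmax(spectrum)],     # dominant frequency
        spectrum.mean(), spectrum.std(),# spectral shape
    ])

def fused_features(audio_win, kin_win, fs_audio, fs_kin):
    # Step 4: simple feature-level fusion by concatenation.
    return np.concatenate([window_features(audio_win, fs_audio),
                           window_features(kin_win, fs_kin)])

# Toy data: 200 windows, two activity classes (e.g., digging vs. idling).
rng = np.random.default_rng(0)
X = np.stack([fused_features(rng.normal(size=4410), rng.normal(size=100),
                             44100, 100) for _ in range(200)])
y = rng.integers(0, 2, size=200)
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```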

TASTE: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records

Title TASTE: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records
Authors Ardavan Afshar, Ioakeim Perros, Haesun Park, Christopher deFilippi, Xiaowei Yan, Walter Stewart, Joyce Ho, Jimeng Sun
Abstract Phenotyping electronic health records (EHR) focuses on defining meaningful patient groups (e.g., heart failure group and diabetes group) and identifying the temporal evolution of patients in those groups. Tensor factorization has been an effective tool for phenotyping. Most existing works assume either a static patient representation with aggregate data or model only temporal data. However, real EHR data contain both temporal (e.g., longitudinal clinical visits) and static information (e.g., patient demographics), which are difficult to model simultaneously. In this paper, we propose Temporal And Static TEnsor factorization (TASTE), which jointly models both static and temporal information to extract phenotypes. TASTE combines the PARAFAC2 model with non-negative matrix factorization to model a temporal and a static tensor. To fit the proposed model, we transform the original problem into simpler subproblems that are optimally solved in an alternating fashion. For each of the subproblems, our proposed mathematical reformulations lead to efficient solvers. Comprehensive experiments on large EHR data from a heart failure (HF) study show that TASTE is up to 14x faster than several baselines, and the resulting phenotypes were confirmed to be clinically meaningful by a cardiologist. Using 80 phenotypes extracted by TASTE, a simple logistic regression can achieve the same area under the curve (AUC) for HF prediction as a deep learning model using recurrent neural networks (RNNs) with 345 features.
Tasks
Published 2019-11-13
URL https://arxiv.org/abs/1911.05843v1
PDF https://arxiv.org/pdf/1911.05843v1.pdf
PWC https://paperswithcode.com/paper/taste-temporal-and-static-tensor
Repo
Framework
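The temporal half of TASTE is a PARAFAC2 model over per-patient visit-by-feature matrices. A minimal sketch using tensorly's stock PARAFAC2 solver, assuming its standard call signature; the static non-negative factorization and the coupling that define TASTE proper are omitted:

```python
import numpy as np
from tensorly.decomposition import parafac2

# One matrix per patient: (clinical visits over time) x (medical features).
# Visit counts differ per patient, which is exactly what PARAFAC2 allows.
rng = np.random.default_rng(0)
slices = [rng.random((int(rng.integers(5, 15)), 20)) for _ in range(30)]

# The fitted factors hold candidate phenotype definitions and per-patient
# temporal loadings; TASTE additionally couples them to a static
# patient-by-demographics matrix via non-negative matrix factorization.
decomposition = parafac2(slices, rank=5, n_iter_max=100, random_state=0)
```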

AutoML @ NeurIPS 2018 challenge: Design and Results

Title AutoML @ NeurIPS 2018 challenge: Design and Results
Authors Hugo Jair Escalante, Wei-Wei Tu, Isabelle Guyon, Daniel L. Silver, Evelyne Viegas, Yuqiang Chen, Wenyuan Dai, Qiang Yang
Abstract We organized a competition on Autonomous Lifelong Machine Learning with Drift that was part of the competition program of NeurIPS 2018. This data-driven competition asked participants to develop computer programs capable of solving supervised learning problems where the i.i.d. assumption did not hold. Large data sets were arranged in a lifelong learning and evaluation scenario, and CodaLab was used as the challenge platform. The challenge attracted more than 300 participants over its two-month duration. This chapter describes the design of the challenge and summarizes its main results.
Tasks AutoML
Published 2019-03-12
URL http://arxiv.org/abs/1903.05263v2
PDF http://arxiv.org/pdf/1903.05263v2.pdf
PWC https://paperswithcode.com/paper/automl-neurips-2018-challenge-design-and
Repo
Framework

Parameter Estimation with the Ordered $\ell_{2}$ Regularization via an Alternating Direction Method of Multipliers

Title Parameter Estimation with the Ordered $\ell_{2}$ Regularization via an Alternating Direction Method of Multipliers
Authors Mahammad Humayoo, Xueqi Cheng
Abstract Regularization is a popular technique in machine learning for model estimation and avoiding overfitting. Prior studies have found that modern ordered regularization can be more effective in handling highly correlated, high-dimensional data than traditional regularization. The reason stems from the fact that ordered regularization can reject irrelevant variables and yield an accurate estimation of the parameters. How to scale up ordered regularization problems when facing large-scale training data remains an open question. This paper explores the problem of parameter estimation with the ordered $\ell_{2}$-regularization via the Alternating Direction Method of Multipliers (ADMM), called ADMM-O$\ell_{2}$. The advantages of ADMM-O$\ell_{2}$ include (i) scaling up the ordered $\ell_{2}$ to large-scale datasets, (ii) predicting parameters correctly by excluding irrelevant variables automatically, and (iii) a fast convergence rate. Experimental results on both synthetic and real data indicate that ADMM-O$\ell_{2}$ performs better than, or comparably to, several state-of-the-art baselines.
Tasks
Published 2019-09-04
URL https://arxiv.org/abs/1909.01519v3
PDF https://arxiv.org/pdf/1909.01519v3.pdf
PWC https://paperswithcode.com/paper/parameter-estimation-with-the-ordered-ell_2
Repo
Framework
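To make the idea concrete, here is a hedged ADMM sketch for $\min_x \frac{1}{2}\|Ax-b\|^2 + \sum_i \lambda_i |x|_{(i)}^2$, where decreasing weights $\lambda_i$ apply to the sorted magnitudes $|x|_{(i)}$. The z-update uses a simplified proximal step that assumes the shrinkage preserves the magnitude ordering; the paper's full solver is more careful:

```python
import numpy as np

def prox_ordered_l2(v, lam, rho):
    """Approximate prox of g(x) = sum_i lam_i * |x|_(i)^2, lam sorted descending.

    Simplified sketch: the i-th largest |v| is shrunk by 1 / (1 + 2*lam_i/rho).
    A full solver must also restore the magnitude ordering when the unequal
    shrinkage breaks it (an isotonic-style correction, omitted here).
    """
    order = np.argsort(-np.abs(v))
    z = np.empty_like(v)
    z[order] = v[order] / (1.0 + 2.0 * lam / rho)
    return z

def admm_ordered_l2(A, b, lam, rho=1.0, n_iter=200):
    """ADMM for 0.5*||Ax - b||^2 + sum_i lam_i * |x|_(i)^2 (illustrative)."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))  # cached factorization
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = prox_ordered_l2(x + u, lam, rho)
        u = u + x - z
    return z

rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 10)), rng.normal(size=50)
lam = np.linspace(1.0, 0.1, 10)  # decreasing ordered weights
x_hat = admm_ordered_l2(A, b, lam)
```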

U-CAM: Visual Explanation using Uncertainty based Class Activation Maps

Title U-CAM: Visual Explanation using Uncertainty based Class Activation Maps
Authors Badri N. Patro, Mayank Lunayach, Shivansh Patel, Vinay P. Namboodiri
Abstract Understanding and explaining deep learning models is an imperative task. Toward this, we propose a method that obtains gradient-based certainty estimates that also provide visual attention maps. In particular, we address the visual question answering task. We incorporate modern probabilistic deep learning methods, which we further improve by using the gradients for these estimates. This has two-fold benefits: a) improved certainty estimates that correlate better with misclassified samples, and b) improved attention maps that provide state-of-the-art results in terms of correlation with human attention regions. The improved attention maps result in consistent improvements for various visual question answering methods. Therefore, the proposed technique can be thought of as a recipe for obtaining improved certainty estimates and explanations for deep learning models. We provide detailed empirical analysis for the visual question answering task on all standard benchmarks, along with comparisons to state-of-the-art methods.
Tasks Question Answering, Visual Question Answering
Published 2019-08-17
URL https://arxiv.org/abs/1908.06306v4
PDF https://arxiv.org/pdf/1908.06306v4.pdf
PWC https://paperswithcode.com/paper/u-cam-visual-explanation-using-uncertainty
Repo
Framework
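U-CAM builds on gradient-based class activation maps. The sketch below computes a plain Grad-CAM-style attention map for a ResNet-18; the uncertainty estimation that distinguishes U-CAM is not shown, and the layer choice is an assumption:

```python
import torch
import torchvision

# Hooks capture layer4 activations and their gradients; the attention map is
# a gradient-weighted sum of activation channels, rectified by ReLU.
model = torchvision.models.resnet18(weights=None).eval()
store = {}

def fwd_hook(module, inputs, output):
    store["act"] = output

def bwd_hook(module, grad_input, grad_output):
    store["grad"] = grad_output[0]

model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)  # stand-in image
logits = model(x)
logits[0, logits.argmax()].backward()

weights = store["grad"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
cam = torch.relu((weights * store["act"]).sum(dim=1))   # coarse attention map
```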

Sticking to the Facts: Confident Decoding for Faithful Data-to-Text Generation

Title Sticking to the Facts: Confident Decoding for Faithful Data-to-Text Generation
Authors Ran Tian, Shashi Narayan, Thibault Sellam, Ankur P. Parikh
Abstract Neural conditional text generation systems have achieved significant progress in recent years, showing the ability to produce highly fluent text. However, the inherent lack of controllability in these systems allows them to hallucinate factually incorrect phrases that are unfaithful to the source, making them often unsuitable for many real-world systems that require high degrees of precision. In this work, we propose a novel confidence-oriented decoder that assigns a confidence score to each target position. This score is learned in training using a variational Bayes objective and can be leveraged at inference time using a calibration technique to promote more faithful generation. Experiments on a structured data-to-text dataset (WikiBio) show that our approach is more faithful to the source than existing state-of-the-art approaches, according to both automatic metrics and human evaluation.
Tasks Calibration, Data-to-Text Generation, Text Generation
Published 2019-10-19
URL https://arxiv.org/abs/1910.08684v2
PDF https://arxiv.org/pdf/1910.08684v2.pdf
PWC https://paperswithcode.com/paper/sticking-to-the-facts-confident-decoding-for
Repo
Framework
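One way to picture inference-time use of a per-position confidence score: flatten the next-token distribution toward uniform when confidence is low, so that low-confidence positions contribute less differentiation between candidate outputs. This is an illustration of the general mechanism only, not the paper's calibration technique:

```python
import numpy as np

def calibrated_probs(probs, conf):
    """Blend a next-token distribution toward uniform when confidence is low.

    probs : (V,) decoder distribution at one target position
    conf  : learned confidence score in [0, 1] for this position
    """
    V = probs.shape[0]
    return conf * probs + (1 - conf) * np.full(V, 1.0 / V)

def hypothesis_score(step_probs, confidences, token_ids):
    # Score a candidate output under the calibrated distributions. In beam
    # search, low-confidence positions then matter less when ranking
    # hypotheses, discouraging hallucinated content words.
    return sum(np.log(calibrated_probs(p, c)[t])
               for p, c, t in zip(step_probs, confidences, token_ids))
```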

Multimodal Unified Attention Networks for Vision-and-Language Interactions

Title Multimodal Unified Attention Networks for Vision-and-Language Interactions
Authors Zhou Yu, Yuhao Cui, Jun Yu, Dacheng Tao, Qi Tian
Abstract Learning an effective attention mechanism for multimodal data is important in many vision-and-language tasks that require a synergic understanding of both the visual and textual contents. Existing state-of-the-art approaches use co-attention models to associate each visual object (e.g., image region) with each textual object (e.g., query word). Despite the success of these co-attention models, they only model inter-modal interactions while neglecting intra-modal interactions. Here we propose a general ‘unified attention’ model that simultaneously captures the intra- and inter-modal interactions of multimodal features and outputs their corresponding attended representations. By stacking such unified attention blocks in depth, we obtain the deep Multimodal Unified Attention Network (MUAN), which can seamlessly be applied to the visual question answering (VQA) and visual grounding tasks. We evaluate our MUAN models on two VQA datasets and three visual grounding datasets, and the results show that MUAN achieves top-level performance on both tasks without bells and whistles.
Tasks Question Answering, Visual Question Answering
Published 2019-08-12
URL https://arxiv.org/abs/1908.04107v2
PDF https://arxiv.org/pdf/1908.04107v2.pdf
PWC https://paperswithcode.com/paper/multimodal-unified-attention-networks-for
Repo
Framework
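The unified-attention idea can be read as self-attention over the concatenation of the two modalities, so a single block models intra- and inter-modal interactions at once. A minimal PyTorch sketch; the dimensions and the residual/norm wiring are assumptions, not the paper's exact block:

```python
import torch
import torch.nn as nn

class UnifiedAttentionBlock(nn.Module):
    """Self-attention over concatenated textual and visual features, so one
    block captures intra- and inter-modal interactions simultaneously."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, image_feats):
        x = torch.cat([text_feats, image_feats], dim=1)  # (B, T + R, dim)
        out, _ = self.attn(x, x, x)
        x = self.norm(x + out)
        n_text = text_feats.size(1)
        return x[:, :n_text], x[:, n_text:]  # split back into modalities

# 14 word tokens and 36 image regions, both projected to a shared 512-d space.
block = UnifiedAttentionBlock(dim=512)
t, v = block(torch.randn(2, 14, 512), torch.randn(2, 36, 512))
```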

Robust and Communication-Efficient Collaborative Learning

Title Robust and Communication-Efficient Collaborative Learning
Authors Amirhossein Reisizadeh, Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani
Abstract We consider a decentralized learning problem, where a set of computing nodes aim at solving a non-convex optimization problem collaboratively. It is well-known that decentralized optimization schemes face two major system bottlenecks: stragglers’ delay and communication overhead. In this paper, we tackle these bottlenecks by proposing a novel decentralized and gradient-based optimization algorithm named QuanTimed-DSGD. Our algorithm rests on two main ideas: (i) we impose a deadline on the local gradient computations of each node at each iteration of the algorithm, and (ii) the nodes exchange quantized versions of their local models. The first idea makes the method robust to straggling nodes, and the second reduces the communication overhead. The key technical contribution of our work is to prove that, with non-vanishing quantization and stochastic-gradient noise, the proposed method converges exactly to the global optimum for convex loss functions and finds a first-order stationary point in non-convex scenarios. Our numerical evaluations of QuanTimed-DSGD on benchmark training datasets, MNIST and CIFAR-10, demonstrate speedups of up to 3x in run-time compared to state-of-the-art decentralized optimization methods.
Tasks Quantization
Published 2019-07-24
URL https://arxiv.org/abs/1907.10595v2
PDF https://arxiv.org/pdf/1907.10595v2.pdf
PWC https://paperswithcode.com/paper/robust-and-communication-efficient
Repo
Framework
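The two ingredients, quantized model exchange and deadline-bounded gradient computation, can be sketched as follows. The quantizer and mixing weights are illustrative choices, not the paper's exact scheme:

```python
import numpy as np

def stochastic_quantize(x, levels=16, rng=None):
    """Unbiased stochastic quantizer: snap each coordinate to a grid point,
    rounding up or down with probability proportional to the distance."""
    rng = rng or np.random.default_rng()
    scale = np.max(np.abs(x)) + 1e-12
    y = np.abs(x) / scale * levels
    low = np.floor(y)
    q = low + (rng.random(x.shape) < (y - low))
    return np.sign(x) * q * scale / levels

def quantimed_step(theta, neighbor_thetas, grad, lr=0.1, mix=0.5):
    """One node update: consensus with quantized neighbor models, then a
    local gradient step. The deadline is abstracted away here: `grad` is
    whatever mini-batch gradient finished before the cutoff."""
    avg = np.mean([stochastic_quantize(t) for t in neighbor_thetas], axis=0)
    return (1 - mix) * theta + mix * avg - lr * grad
```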

A Machine Learning Approach for Smartphone-based Sensing of Roads and Driving Style

Title A Machine Learning Approach for Smartphone-based Sensing of Roads and Driving Style
Authors M. Ricardo Carlos
Abstract Road transportation is of critical importance for a nation, having profound effects on the economy and on the health and lifestyle of its people. With the growth of cities and populations come bigger demands for mobility and safety, creating new problems and magnifying those of the past. New tools are needed to face the challenge, to keep roads in good condition, keep their users safe, and minimize the impact on the environment. This dissertation is concerned with road quality assessment and aggressive driving, two important problems in road transportation, approached in the context of Intelligent Transportation Systems by using Machine Learning techniques to analyze acceleration time series acquired with smartphone-based opportunistic sensing to automatically detect, classify, and characterize events of interest. Two aspects of road quality assessment are addressed: the detection and the characterization of road anomalies. For the first, the most widely cited works in the literature are compared and proposals capable of equal or better performance are presented, removing the reliance on threshold values and reducing the computational cost and dimensionality of previous proposals. For the second, new approaches for estimating pothole depth and the functional condition of speed reducers are shown. The new problem of pothole depth ranking is introduced, using a learning-to-rank approach to sort acceleration signals by the depth of the potholes that they reflect. The classification of aggressive driving maneuvers is done with automatic feature extraction, finding characteristically shaped subsequences in the signals to be more effective discriminants than conventional descriptors calculated over time windows. Finally, all the previously mentioned tasks are combined to produce a robust road transport evaluation platform.
Tasks Learning-To-Rank, Time Series
Published 2019-08-15
URL https://arxiv.org/abs/1908.10187v1
PDF https://arxiv.org/pdf/1908.10187v1.pdf
PWC https://paperswithcode.com/paper/a-machine-learning-approach-for-smartphone
Repo
Framework
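A compact example of the threshold-free flavor of detection described above: summarize each window of vertical acceleration with a few statistics and let a learned classifier, rather than hand-set thresholds, separate event types. The features, labels, and classifier here are placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def accel_window_features(z_accel):
    """Threshold-free descriptors for one window of vertical acceleration."""
    return np.array([z_accel.std(), np.ptp(z_accel),
                     np.mean(np.abs(np.diff(z_accel))),
                     np.percentile(np.abs(z_accel), 95)])

# Toy data: 300 windows of 128 samples, three event classes
# (e.g., smooth road, pothole, speed reducer).
rng = np.random.default_rng(1)
X = np.stack([accel_window_features(rng.normal(size=128)) for _ in range(300)])
y = rng.integers(0, 3, size=300)
clf = RandomForestClassifier(n_estimators=50).fit(X, y)
```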

Automated speech-based screening of depression using deep convolutional neural networks

Title Automated speech-based screening of depression using deep convolutional neural networks
Authors Karol Chlasta, Krzysztof Wołk, Izabela Krejtz
Abstract Early detection and treatment of depression is essential in promoting remission, preventing relapse, and reducing the emotional burden of the disease. Current diagnoses are primarily subjective, inconsistent across professionals, and expensive for individuals who may be in urgent need of help. This paper proposes a novel approach to automated depression detection in speech using convolutional neural networks (CNNs) and multipart interactive training. The model was tested using 2568 voice samples obtained from 77 non-depressed and 30 depressed individuals. In the experiments conducted, the data were fed to residual CNNs in the form of spectrograms, images auto-generated from the audio samples. The experimental results obtained using different ResNet architectures gave a promising baseline accuracy reaching 77%.
Tasks
Published 2019-12-02
URL https://arxiv.org/abs/1912.01115v1
PDF https://arxiv.org/pdf/1912.01115v1.pdf
PWC https://paperswithcode.com/paper/automated-speech-based-screening-of
Repo
Framework
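The described pipeline, voice sample to spectrogram image to residual CNN, might look like the following sketch. Random noise stands in for a voice recording, and the single-channel ResNet adaptation is an assumption on our part:

```python
import librosa
import numpy as np
import torch
import torchvision

# Stand-in "voice sample": 3 seconds of noise at 22.05 kHz.
sr = 22050
waveform = np.random.default_rng(0).normal(size=3 * sr).astype(np.float32)

# Voice sample -> mel spectrogram image.
mel = librosa.feature.melspectrogram(y=waveform, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Residual CNN over the single-channel spectrogram (depressed vs. not).
model = torchvision.models.resnet18(weights=None, num_classes=2)
model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3,
                              bias=False)

x = torch.tensor(mel_db, dtype=torch.float32)[None, None]  # (1, 1, mels, T)
logits = model(x)
```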

Feature-level and Model-level Audiovisual Fusion for Emotion Recognition in the Wild

Title Feature-level and Model-level Audiovisual Fusion for Emotion Recognition in the Wild
Authors Jie Cai, Zibo Meng, Ahmed Shehab Khan, Zhiyuan Li, James O’Reilly, Shizhong Han, Ping Liu, Min Chen, Yan Tong
Abstract Emotion recognition plays an important role in human-computer interaction (HCI) and has been extensively studied for decades. Although tremendous improvements have been achieved for posed expressions, recognizing human emotions in “close-to-real-world” environments remains a challenge. In this paper, we propose two strategies to fuse information extracted from different modalities, i.e., audio and visual. Specifically, we utilize LBP-TOP, an ensemble of CNNs, and a bi-directional LSTM (BLSTM) to extract features from the visual channel, and the OpenSmile toolkit to extract features from the audio channel. Two kinds of fusion methods, i.e., feature-level fusion and model-level fusion, were developed to utilize the information extracted from the two channels. Experimental results on the EmotiW2018 AFEW dataset show that the proposed fusion methods outperform the baseline methods significantly and achieve better or at least comparable performance compared with state-of-the-art methods, with model-level fusion performing better when one of the channels totally fails.
Tasks Emotion Recognition
Published 2019-06-06
URL https://arxiv.org/abs/1906.02728v1
PDF https://arxiv.org/pdf/1906.02728v1.pdf
PWC https://paperswithcode.com/paper/feature-level-and-model-level-audiovisual
Repo
Framework
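The two fusion strategies compare naturally in a few lines: feature-level fusion trains one classifier on concatenated modality features, while model-level fusion averages per-modality posteriors, which degrades gracefully when one channel fails. The features and classifier below are stand-ins for the paper's LBP-TOP/CNN/BLSTM and OpenSmile descriptors:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
audio_feats = rng.normal(size=(500, 40))
visual_feats = rng.normal(size=(500, 128))
y = rng.integers(0, 7, size=500)  # seven emotion categories, as in AFEW

# Feature-level fusion: concatenate modality features, train one classifier.
early = LogisticRegression(max_iter=1000).fit(
    np.hstack([audio_feats, visual_feats]), y)

# Model-level fusion: per-modality classifiers whose posteriors are averaged;
# if one channel fails, simply drop its term from the mean.
clf_a = LogisticRegression(max_iter=1000).fit(audio_feats, y)
clf_v = LogisticRegression(max_iter=1000).fit(visual_feats, y)
posterior = (clf_a.predict_proba(audio_feats)
             + clf_v.predict_proba(visual_feats)) / 2
```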

Spatially Aggregated Gaussian Processes with Multivariate Areal Outputs

Title Spatially Aggregated Gaussian Processes with Multivariate Areal Outputs
Authors Yusuke Tanaka, Toshiyuki Tanaka, Tomoharu Iwata, Takeshi Kurashima, Maya Okawa, Yasunori Akagi, Hiroyuki Toda
Abstract We propose a probabilistic model for inferring a multivariate function from multiple areal data sets with various granularities. Here, the areal data are observed not at location points but over regions. Existing regression-based models can only utilize sufficiently fine-grained auxiliary data sets on the same domain (e.g., a city). In the proposed model, the functions for the respective areal data sets are assumed to be a multivariate dependent Gaussian process (GP) that is modeled as a linear mixing of independent latent GPs. Sharing latent GPs across multiple areal data sets allows us to effectively estimate the spatial correlation for each areal data set; moreover, the model can easily be extended to transfer learning across multiple domains. To handle the multivariate areal data, we design an observation model with a spatial aggregation process for each areal data set, which is an integral of the mixed GP over the corresponding region. By deriving the posterior GP, we can predict the data value at any location point by considering the spatial correlations and the dependencies between areal data sets simultaneously. Our experiments on real-world data sets demonstrate that our model can 1) accurately refine coarse-grained areal data, and 2) offer performance improvements by using areal data sets from multiple domains.
Tasks Gaussian Processes, Transfer Learning
Published 2019-07-19
URL https://arxiv.org/abs/1907.08350v2
PDF https://arxiv.org/pdf/1907.08350v2.pdf
PWC https://paperswithcode.com/paper/spatially-aggregated-gaussian-processes-with
Repo
Framework
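The observation model, an integral of the mixed GP over each region, can be approximated by averaging the pointwise kernel over sample points inside the regions. A single-output sketch under that discretization (the paper's model is multivariate, with shared latent GPs):

```python
import numpy as np

def rbf(a, b, ls=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def areal_kernel(regions_a, regions_b):
    """Covariance between region-averaged GP observations: the pointwise
    kernel averaged over all point pairs (a discretized spatial integral)."""
    K = np.zeros((len(regions_a), len(regions_b)))
    for i, Ra in enumerate(regions_a):
        for j, Rb in enumerate(regions_b):
            K[i, j] = rbf(Ra, Rb).mean()
    return K

# Each region is represented by a cloud of sample points inside it.
rng = np.random.default_rng(0)
regions = [rng.uniform(i, i + 1, size=(25, 2)) for i in range(4)]
y = rng.normal(size=4)                    # one aggregated value per region
points = rng.uniform(0, 4, size=(10, 2))  # locations to refine/predict at

K = areal_kernel(regions, regions) + 1e-4 * np.eye(4)
k_star = areal_kernel([p[None, :] for p in points], regions)
f_pred = k_star @ np.linalg.solve(K, y)   # posterior mean at point level
```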

Spectral Reconstruction with Deep Neural Networks

Title Spectral Reconstruction with Deep Neural Networks
Authors Lukas Kades, Jan M. Pawlowski, Alexander Rothkopf, Manuel Scherzer, Julian M. Urban, Sebastian J. Wetzel, Nicolas Wink, Felix Ziegler
Abstract We explore artificial neural networks as a tool for the reconstruction of spectral functions from imaginary time Green’s functions, a classic ill-conditioned inverse problem. Our ansatz is based on a supervised learning framework in which prior knowledge is encoded in the training data and the inverse transformation manifold is explicitly parametrised through a neural network. We systematically investigate this novel reconstruction approach, providing a detailed analysis of its performance on physically motivated mock data, and compare it to established methods of Bayesian inference. The reconstruction accuracy is found to be at least comparable, and potentially superior in particular at larger noise levels. We argue that the use of labelled training data in a supervised setting and the freedom in defining an optimisation objective are inherent advantages of the present approach and may lead to significant improvements over state-of-the-art methods in the future. Potential directions for further research are discussed in detail.
Tasks Bayesian Inference
Published 2019-05-10
URL https://arxiv.org/abs/1905.04305v1
PDF https://arxiv.org/pdf/1905.04305v1.pdf
PWC https://paperswithcode.com/paper/spectral-reconstruction-with-deep-neural
Repo
Framework
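The supervised setup is easy to mock up: draw random spectral functions, push them through an assumed forward kernel to get noisy propagator data, and train a network to invert the map. The Laplace-type kernel and Gaussian peaks below are illustrative assumptions, not the paper's mock-data setup:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
omega = np.linspace(0, 10, 64)  # frequency grid
tau = np.linspace(0, 2, 32)     # imaginary-time grid
K = np.exp(-np.outer(tau, omega)) * (omega[1] - omega[0])  # forward kernel

def random_spectrum():
    # A single Gaussian peak with random position and width.
    mu, width = rng.uniform(1, 8), rng.uniform(0.2, 1.0)
    return np.exp(-0.5 * ((omega - mu) / width) ** 2)

rho = np.stack([random_spectrum() for _ in range(2000)])
G = rho @ K.T + 1e-4 * rng.normal(size=(2000, len(tau)))  # noisy mock data

# The network parametrises the inverse map G(tau) -> rho(omega).
net = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=200).fit(G, rho)
rho_hat = net.predict(G[:1])
```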

Reconstruction of Gene Regulatory Networks using Multiple Datasets

Title Reconstruction of Gene Regulatory Networks using Multiple Datasets
Authors Mehrzad Saremi, Maryam Amirmazlaghani
Abstract Motivation: Laboratory gene regulatory data for a species are sporadic. Despite the abundance of gene regulatory network algorithms that employ single data sets, few algorithms can combine the vast but disperse sources of data and extract the potential information. With a motivation to compensate for this shortage, we developed an algorithm called GENEREF that can accumulate information from multiple types of data sets in an iterative manner, with each iteration boosting the performance of the prediction results. Results: The algorithm is examined extensively on data extracted from the five DREAM4 networks. Many single-dataset algorithms and one multi-dataset algorithm were compared to test the performance of the algorithm. Results show that GENEREF surpasses non-ensemble state-of-the-art multi-perturbation algorithms on the selected networks and is competitive with existing multiple-dataset algorithms. Specifically, it outperforms dynGENIE3 and is on par with iRafNet. Also, we argue that a scoring method based solely on the AUPR criterion would be more trustworthy than the traditional score. Availability: The Python implementation along with the data sets and results can be downloaded from \url{github.com/msaremi/GENEREF}
Tasks
Published 2019-12-19
URL https://arxiv.org/abs/1912.10810v1
PDF https://arxiv.org/pdf/1912.10810v1.pdf
PWC https://paperswithcode.com/paper/reconstruction-of-gene-regulatory-networks
Repo
Framework
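GENEREF builds on random-forest regulatory scoring in the GENIE3 family. The sketch below shows that single-dataset ingredient: regress each gene on all others and read candidate regulators off the feature importances. GENEREF's iterative re-weighting across multiple data sets is not shown:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def genie3_scores(expr):
    """GENIE3-style regulatory scores from one expression matrix.

    expr : (samples, genes) array. Returns scores[i, j], the importance of
    gene i as a predictor (candidate regulator) of gene j.
    """
    n_genes = expr.shape[1]
    scores = np.zeros((n_genes, n_genes))
    for j in range(n_genes):
        X = np.delete(expr, j, axis=1)  # all genes except the target
        rf = RandomForestRegressor(n_estimators=100, random_state=0)
        rf.fit(X, expr[:, j])
        scores[np.delete(np.arange(n_genes), j), j] = rf.feature_importances_
    return scores

# Toy run on random expression data for 10 genes.
scores = genie3_scores(np.random.default_rng(0).random((50, 10)))
```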