Paper Group ANR 1720
Democratisation of Usable Machine Learning in Computer Vision. A Sensitivity Analysis of Attention-Gated Convolutional Neural Networks for Sentence Classification. Semantic Segmentation from Remote Sensor Data and the Exploitation of Latent Learning for Classification of Auxiliary Tasks. Leveraging Machine Learning and Big Data for Smart Buildings: …
Democratisation of Usable Machine Learning in Computer Vision
Title | Democratisation of Usable Machine Learning in Computer Vision |
Authors | Raymond Bond, Ansgar Koene, Alan Dix, Jennifer Boger, Maurice D. Mulvenna, Mykola Galushka, Bethany Waterhouse Bradley, Fiona Browne, Hui Wang, Alexander Wong |
Abstract | Many industries are now investing heavily in data science and automation to replace manual tasks and/or to help with decision making, especially in the realm of leveraging computer vision to automate many monitoring, inspection, and surveillance tasks. This has resulted in the emergence of the ‘data scientist’ who is conversant in statistical thinking, machine learning (ML), computer vision, and computer programming. However, as ML becomes more accessible to the general public and more aspects of ML become automated, applications leveraging computer vision are increasingly being created by non-experts with less opportunity for regulatory oversight. This points to an overall need to educate these lay users of usable ML tools about their responsibilities, in order to mitigate potentially unethical ramifications. In this paper, we undertake a SWOT analysis to study the strengths, weaknesses, opportunities, and threats of building usable ML tools for mass adoption in important areas leveraging ML, such as computer vision. The paper proposes a set of data science literacy criteria for educating and supporting lay users in the responsible development and deployment of ML applications. |
Tasks | Decision Making |
Published | 2019-02-18 |
URL | http://arxiv.org/abs/1902.06804v1 |
PDF | http://arxiv.org/pdf/1902.06804v1.pdf |
PWC | https://paperswithcode.com/paper/democratisation-of-usable-machine-learning-in |
Repo | |
Framework | |
A Sensitivity Analysis of Attention-Gated Convolutional Neural Networks for Sentence Classification
Title | A Sensitivity Analysis of Attention-Gated Convolutional Neural Networks for Sentence Classification |
Authors | Yang Liu, Jianpeng Zhang, Chao Gao, Jinghua Qu, Lixin Ji |
Abstract | In this paper, we investigate how different hyperparameters, such as the kernel window size, the number of feature maps, the keep rate of the dropout layer, and the activation function, as well as different combinations of hyperparameter settings, affect the performance of Attention-Gated Convolutional Neural Networks (AGCNNs). We draw practical advice from a wide range of empirical results. Through the sensitivity analysis, we further improve the hyperparameter settings of AGCNNs. Experiments show that our proposals can achieve average improvements of 0.81% and 0.67% on AGCNN-NLReLU-rand and AGCNN-SELU-rand, respectively, and of 0.47% and 0.45% on AGCNN-NLReLU-static and AGCNN-SELU-static, respectively. |
Tasks | Sentence Classification |
Published | 2019-08-17 |
URL | https://arxiv.org/abs/1908.06263v3 |
PDF | https://arxiv.org/pdf/1908.06263v3.pdf |
PWC | https://paperswithcode.com/paper/a-sensitivity-analysis-of-attention-gated |
Repo | |
Framework | |
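The abstract distinguishes varying individual hyperparameters from varying combinations of them. The sketch below contrasts the two experimental designs on an illustrative search space; the grid values, the `BASELINE` configuration, and `score_fn` (a stand-in for training an AGCNN and reading off validation accuracy) are assumptions, not the paper's setup.

```python
from itertools import product

# Illustrative search space (an assumption, not the grid used in the paper).
SEARCH_SPACE = {
    "kernel_size": [3, 4, 5],
    "num_feature_maps": [100, 200],
    "dropout_keep_rate": [0.5, 0.7, 0.9],
    "activation": ["nlrelu", "selu"],
}

# Hypothetical baseline configuration around which single factors are varied.
BASELINE = {
    "kernel_size": 3,
    "num_feature_maps": 100,
    "dropout_keep_rate": 0.5,
    "activation": "nlrelu",
}


def one_at_a_time(baseline, space):
    """Configs that differ from the baseline in exactly one hyperparameter."""
    configs = []
    for name, values in space.items():
        for value in values:
            if value != baseline[name]:
                cfg = dict(baseline)
                cfg[name] = value
                configs.append(cfg)
    return configs


def full_grid(space):
    """Every combination of hyperparameter values (Cartesian product)."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*space.values())]


def best_config(configs, score_fn):
    """Config with the highest score; score_fn is a hypothetical stand-in
    for 'train the model and report validation accuracy'."""
    return max(configs, key=score_fn)
```

Here the one-at-a-time design needs 6 runs against 36 for the full grid; only the full grid can reveal interactions between, say, kernel size and dropout keep rate.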
Semantic Segmentation from Remote Sensor Data and the Exploitation of Latent Learning for Classification of Auxiliary Tasks
Title | Semantic Segmentation from Remote Sensor Data and the Exploitation of Latent Learning for Classification of Auxiliary Tasks |
Authors | Bodhiswatta Chatterjee, Charalambos Poullis |
Abstract | In this paper we address three different aspects of semantic segmentation from remote sensor data using deep neural networks. Firstly, we focus on the semantic segmentation of buildings from remote sensor data and propose ICT-Net. The proposed network has been tested on the INRIA and AIRS benchmark datasets and is shown to outperform all other state-of-the-art methods by more than 1.5% and 1.8% on the Jaccard index, respectively. Secondly, as building classification is typically the first step of the reconstruction process, we investigate the relationship between classification accuracy and reconstruction accuracy. Finally, we present the simple yet compelling concept of latent learning and the implications it carries within the context of deep learning. We posit that a network trained on a primary task (i.e. building classification) is unintentionally learning about auxiliary tasks (e.g. the classification of road, tree, etc.) which are complementary to the primary task. We extensively tested the proposed technique on the ISPRS benchmark dataset, which contains multi-label ground truth, and report an average classification accuracy (F1 score) of 54.29% (SD=17.03) for roads, 10.15% (SD=2.54) for cars, 24.11% (SD=5.25) for trees, 42.74% (SD=6.62) for low vegetation, and 18.30% (SD=16.08) for clutter. The source code and supplemental material are publicly available at http://www.theICTlab.org/lp/2019ICT-Net/. |
Tasks | Semantic Segmentation |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09216v1 |
PDF | https://arxiv.org/pdf/1912.09216v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-segmentation-from-remote-sensor-data |
Repo | |
Framework | |
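The ICT-Net results are reported with the Jaccard index and the latent-learning results with F1 scores. A minimal sketch of both metrics on binary masks follows; the function names and the empty-mask convention are choices made here, not taken from the paper's code.

```python
import numpy as np


def jaccard_index(pred, target):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(np.logical_and(pred, target).sum() / union)


def f1_score(pred, target):
    """F1 (Dice) score of two binary masks: 2TP / (2TP + FP + FN)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 1.0
    return float(2 * tp / denom)


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(jaccard_index(pred, target))  # 2 / 4 = 0.5
print(f1_score(pred, target))       # 4 / 6, about 0.667
```

For binary masks the two metrics are monotonically related, F1 = 2J / (1 + J), so they rank predictions identically on a single mask.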
Leveraging Machine Learning and Big Data for Smart Buildings: A Comprehensive Survey
Title | Leveraging Machine Learning and Big Data for Smart Buildings: A Comprehensive Survey |
Authors | Basheer Qolomany, Ala Al-Fuqaha, Ajay Gupta, Driss Benhaddou, Safaa Alwajidi, Junaid Qadir, Alvis C. Fong |
Abstract | Future buildings will offer new possibilities for convenience, comfort, and efficiency to their residents. The way people live will change as technology becomes more involved in people’s lives and information processing is fully integrated into their daily activities and objects. The future expectation of smart buildings includes making the residents’ experience as easy and comfortable as possible. The massive streaming data generated and captured by smart building appliances and devices contains valuable information that needs to be mined to facilitate timely actions and better decision making. Machine learning and big data analytics will undoubtedly play a critical role in enabling the delivery of such smart services. In this paper, we survey the area of smart buildings with a special focus on the role of techniques from machine learning and big data analytics. This survey also reviews the current trends and challenges faced in the development of smart building services. |
Tasks | Decision Making |
Published | 2019-04-01 |
URL | https://arxiv.org/abs/1904.01460v2 |
PDF | https://arxiv.org/pdf/1904.01460v2.pdf |
PWC | https://paperswithcode.com/paper/machine-learning-big-data-and-smart-buildings |
Repo | |
Framework | |
Off-policy Multi-step Q-learning
Title | Off-policy Multi-step Q-learning |
Authors | Gabriel Kalweit, Maria Huegle, Joschka Boedecker |
Abstract | In the past few years, off-policy reinforcement learning methods have shown promising results in robot control. Deep Q-learning, however, still suffers from poor data efficiency, which limits its use in real-world applications. We follow the idea of multi-step TD-learning to enhance data efficiency while remaining off-policy by proposing two novel Temporal-Difference formulations: (1) Truncated Q-functions, which represent the return for the first n steps of a policy rollout, and (2) Shifted Q-functions, acting as the farsighted return after this truncated rollout. We prove that the combination of these short- and long-term predictions is a representation of the full return, leading to the Composite Q-learning algorithm. We show the efficacy of Composite Q-learning in the tabular case and compare our approach in the function-approximation setting with TD3, Model-based Value Expansion and TD3(Delta), which we introduce as an off-policy variant of TD(Delta). We show on three simulated robot tasks that Composite TD3 outperforms TD3 as well as state-of-the-art off-policy multi-step approaches in terms of data efficiency. |
Tasks | Q-Learning |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1909.13518v1 |
PDF | https://arxiv.org/pdf/1909.13518v1.pdf |
PWC | https://paperswithcode.com/paper/off-policy-multi-step-q-learning |
Repo | |
Framework | |
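The claim that truncated and shifted predictions together represent the full return rests on splitting a discounted sum at step n. The sketch below checks that identity on Monte Carlo returns of a single reward sequence; it folds the γ^n discount into the shifted term, which is an assumption about parameterization rather than the paper's exact definition, and it does not implement the learned Q-functions themselves.

```python
def discounted_return(rewards, gamma):
    """sum_t gamma^t * r_t, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def truncated_return(rewards, gamma, n):
    """Return of the first n steps only (the target of a 'truncated' value)."""
    return discounted_return(rewards[:n], gamma)


def shifted_return(rewards, gamma, n):
    """Return from step n onward, discounted back to step 0 by gamma**n."""
    return gamma ** n * discounted_return(rewards[n:], gamma)


rewards = [1.0, 0.0, 2.0, 3.0, -1.0]
gamma = 0.9
for n in range(len(rewards) + 1):
    total = truncated_return(rewards, gamma, n) + shifted_return(rewards, gamma, n)
    # Short-term plus long-term part recovers the full return for every split.
    assert abs(total - discounted_return(rewards, gamma)) < 1e-12
```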
Cyclone intensity estimate with context-aware cyclegan
Title | Cyclone intensity estimate with context-aware cyclegan |
Authors | Yajing Xu, Haitao Yang, Mingfei Cheng, Si Li |
Abstract | Deep learning approaches to cyclone intensity estimation have recently shown promising results. However, because cyclone data for specific intensities are extremely scarce, most existing deep learning methods fail to achieve satisfactory performance on cyclone intensity estimation, especially on classes with few instances. To avoid the degradation of recognition performance caused by scarce samples, we propose a context-aware CycleGAN which learns the latent evolution features from adjacent cyclone intensities and synthesizes CNN features of classes lacking samples from unpaired source classes. Specifically, our approach synthesizes features conditioned on the learned evolution features, without requiring extra information. Experimental results from several evaluation methods show the effectiveness of our approach, which can even predict unseen classes. |
Tasks | |
Published | 2019-05-11 |
URL | https://arxiv.org/abs/1905.04425v1 |
PDF | https://arxiv.org/pdf/1905.04425v1.pdf |
PWC | https://paperswithcode.com/paper/cyclone-intensity-estimate-with-context-aware |
Repo | |
Framework | |
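CycleGAN-style models learn from unpaired classes by penalizing the round trip A → B → A (and B → A → B). A minimal sketch of that L1 cycle-consistency loss follows, assuming linear maps as hypothetical stand-ins for the two generators and random vectors as stand-ins for CNN features; the context-aware conditioning on evolution features described above is not modeled.

```python
import numpy as np


def cycle_consistency_loss(feats_a, feats_b, g, f):
    """Mean L1 cycle loss: |f(g(a)) - a| + |g(f(b)) - b|."""
    forward_cycle = np.mean(np.abs(f(g(feats_a)) - feats_a))
    backward_cycle = np.mean(np.abs(g(f(feats_b)) - feats_b))
    return forward_cycle + backward_cycle


rng = np.random.default_rng(0)
# Hypothetical CNN features from two unpaired intensity classes.
feats_a = rng.normal(size=(8, 4))
feats_b = rng.normal(size=(8, 4))

# Linear maps stand in for the two generators; W is kept well conditioned.
W = rng.normal(size=(4, 4)) + 4.0 * np.eye(4)
W_inv = np.linalg.inv(W)


def g_ab(x):
    return x @ W


def f_ba(y):
    return y @ W_inv


def f_identity(y):
    return y


print(cycle_consistency_loss(feats_a, feats_b, g_ab, f_ba))       # ~0: f inverts g
print(cycle_consistency_loss(feats_a, feats_b, g_ab, f_identity))  # clearly > 0
```

The loss is zero only when the two mappings undo each other on the data, which is what lets training proceed without paired samples.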
A Joint Deep Learning Approach for Automated Liver and Tumor Segmentation
Title | A Joint Deep Learning Approach for Automated Liver and Tumor Segmentation |
Authors | Nadja Gruber, Stephan Antholzer, Werner Jaschke, Christian Kremser, Markus Haltmeier |
Abstract | Hepatocellular carcinoma (HCC) is the most common type of primary liver cancer in adults and the most common cause of death in people with cirrhosis. The segmentation of liver lesions in CT images allows assessment of tumor load, treatment planning, prognosis and monitoring of treatment response. Manual segmentation is a very time-consuming task and is, in many cases, prone to inaccuracies; automatic tools for tumor detection and segmentation are therefore desirable. In this paper, we compare two network architectures: one that consists of a single neural network and performs the segmentation task in one step, and one that consists of two consecutive fully convolutional neural networks. The first network segments the liver, whereas the second network segments the actual tumor inside the liver. Our networks are trained on a subset of the LiTS (Liver Tumor Segmentation) Challenge and evaluated on data. |
Tasks | |
Published | 2019-02-21 |
URL | https://arxiv.org/abs/1902.07971v2 |
PDF | https://arxiv.org/pdf/1902.07971v2.pdf |
PWC | https://paperswithcode.com/paper/a-joint-deep-learning-approach-for-automated |
Repo | |
Framework | |
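In the two-network variant, the second network only segments tumor inside the liver found by the first. The sketch below shows that cascade logic on probability maps, assuming both networks' outputs are already given as arrays and a 0.5 threshold (both hypothetical); the bounding-box helper illustrates one common way to crop the second network's input and is not necessarily what the authors do.

```python
import numpy as np


def cascade_segmentation(liver_prob, tumor_prob, threshold=0.5):
    """Two-stage masks: stage 1 segments the liver, stage 2 keeps tumor
    pixels only where stage 1 found liver."""
    liver_mask = np.asarray(liver_prob) >= threshold
    tumor_mask = (np.asarray(tumor_prob) >= threshold) & liver_mask
    return liver_mask, tumor_mask


def liver_bbox(liver_mask):
    """Tight (row_start, row_stop, col_start, col_stop) box around the liver,
    e.g. for cropping the second network's input; None if no liver found."""
    rows = np.flatnonzero(liver_mask.any(axis=1))
    cols = np.flatnonzero(liver_mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


# Toy 5x5 "slice": the liver occupies the center 3x3 block.
liver_prob = np.zeros((5, 5))
liver_prob[1:4, 1:4] = 0.9
tumor_prob = np.zeros((5, 5))
tumor_prob[2, 2] = 0.8   # inside the liver: kept
tumor_prob[0, 0] = 0.95  # outside the liver: suppressed by the cascade

liver_mask, tumor_mask = cascade_segmentation(liver_prob, tumor_prob)
print(liver_bbox(liver_mask))  # (1, 4, 1, 4)
print(int(tumor_mask.sum()))   # 1
```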
Single View Distortion Correction using Semantic Guidance
Title | Single View Distortion Correction using Semantic Guidance |
Authors | Szabolcs-Botond Lőrincz, Szabolcs Pável, Lehel Csató |
Abstract | Most distortion correction methods focus on simple forms of distortion, such as radial or linear distortions. These works undistort images either based on measurements taken in the presence of a calibration grid, or by using multiple views to find point correspondences and predict distortion parameters. When possible distortions are more complex, e.g. when a camera is placed behind a refractive surface such as glass, the standard method is to use a calibration grid. Given the wide variety of possible distortions, however, conducting these measurements is not viable. In this work, we present a single view distortion correction method which is capable of undistorting images containing arbitrarily complex distortions by exploiting recent advancements in differentiable image sampling and in the use of semantic information to augment various tasks. The results of this work show that our model is able to estimate and correct highly complex distortions, and that incorporating semantic information facilitates the process of image undistortion. |
Tasks | Calibration |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06505v1 |
PDF | https://arxiv.org/pdf/1911.06505v1.pdf |
PWC | https://paperswithcode.com/paper/single-view-distortion-correction-using |
Repo | |
Framework | |
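Differentiable image sampling usually means bilinear interpolation at real-valued coordinates, as popularized by spatial transformer networks: the output is piecewise linear in the sampling coordinates, so gradients can reach a predicted displacement field. The NumPy sketch below shows only the forward warp, assuming a per-pixel flow is already given (in the paper's setting a network would estimate it) and border clamping as a design choice.

```python
import numpy as np


def bilinear_sample(image, xs, ys):
    """Sample a 2-D image at real-valued coordinates with bilinear
    interpolation; out-of-range coordinates are clamped to the border."""
    h, w = image.shape
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = xs - x0  # fractional offsets act as interpolation weights
    wy = ys - y0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


def warp(image, flow_x, flow_y):
    """Backward warp: output[y, x] = image[y + flow_y[y, x], x + flow_x[y, x]]."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    return bilinear_sample(image, xs + flow_x, ys + flow_y)


image = np.arange(12.0).reshape(3, 4)
zero = np.zeros_like(image)
half = np.full_like(image, 0.5)
print(np.allclose(warp(image, zero, zero), image))  # True: zero flow is the identity
print(warp(image, half, zero)[0])                   # [0.5 1.5 2.5 3. ]
```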
Continual and Multi-task Reinforcement Learning With Shared Episodic Memory
Title | Continual and Multi-task Reinforcement Learning With Shared Episodic Memory |
Authors | Artyom Y. Sorokin, Mikhail S. Burtsev |
Abstract | Episodic memory plays an important role in the behavior of animals and humans. It allows the accumulation of information about the current state of the environment in a task-agnostic way. This episodic representation can later be accessed by downstream tasks to make their execution more efficient. In this work, we introduce a neural architecture with shared episodic memory (SEM) for learning and sequentially executing multiple tasks. We explicitly split the encoding of episodic memory and task-specific memory into separate recurrent sub-networks. An agent augmented with SEM was able to effectively reuse episodic knowledge collected during other tasks to improve its policy on a current task in the Taxi problem. Repeated use of the episodic representation in continual learning experiments facilitated the acquisition of novel skills in the same environment. |
Tasks | Continual Learning |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02662v1 |
PDF | https://arxiv.org/pdf/1905.02662v1.pdf |
PWC | https://paperswithcode.com/paper/continual-and-multi-task-reinforcement |
Repo | |
Framework | |
Continuous Learning for Large-scale Personalized Domain Classification
Title | Continuous Learning for Large-scale Personalized Domain Classification |
Authors | Han Li, Jihwan Lee, Sidharth Mudgal, Ruhi Sarikaya, Young-Bum Kim |
Abstract | Domain classification is the task of mapping spoken language utterances to one of the natural language understanding domains in intelligent personal digital assistants (IPDAs). This is a major component of mainstream IPDAs in industry. Apart from official domains, thousands of third-party domains are also created by external developers to extend the capabilities of IPDAs. As new domains are developed rapidly, continuously accommodating them remains challenging. Moreover, existing continual learning approaches do not address the problem of incorporating personalized information dynamically for better domain classification. In this paper, we propose CoNDA, a neural network based approach for domain classification that supports incremental learning of new classes. Empirical evaluation shows that CoNDA achieves high accuracy and outperforms baselines by a large margin on both incrementally added new domains and existing domains. |
Tasks | Continual Learning |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.00921v1 |
PDF | https://arxiv.org/pdf/1905.00921v1.pdf |
PWC | https://paperswithcode.com/paper/continuous-learning-for-large-scale |
Repo | |
Framework | |
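The abstract does not describe CoNDA's internals, so the following is only a generic illustration of what "incremental learning of new classes" involves at the output layer: appending freshly initialized rows for new domains while leaving the rows for existing domains untouched. The class name and all sizes are hypothetical; training, safeguards against forgetting, and personalization are not shown.

```python
import numpy as np


class ExpandableLinearClassifier:
    """Linear (softmax-style) classifier whose output layer can grow."""

    def __init__(self, dim, num_classes, rng):
        self.rng = rng
        self.W = rng.normal(scale=0.01, size=(num_classes, dim))
        self.b = np.zeros(num_classes)

    @property
    def num_classes(self):
        return self.W.shape[0]

    def add_classes(self, k):
        """Append k freshly initialized rows; existing rows are untouched,
        so scores for previously learned domains are unchanged."""
        new_rows = self.rng.normal(scale=0.01, size=(k, self.W.shape[1]))
        self.W = np.vstack([self.W, new_rows])
        self.b = np.concatenate([self.b, np.zeros(k)])

    def logits(self, x):
        return x @ self.W.T + self.b

    def predict(self, x):
        return np.argmax(self.logits(x), axis=-1)


rng = np.random.default_rng(0)
clf = ExpandableLinearClassifier(dim=5, num_classes=3, rng=rng)
x = rng.normal(size=(4, 5))  # hypothetical utterance embeddings
old_logits = clf.logits(x)
clf.add_classes(2)           # e.g. two new third-party domains
print(clf.num_classes)                                # 5
print(np.allclose(clf.logits(x)[:, :3], old_logits))  # True
```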
Variational Bayes: A report on approaches and applications
Title | Variational Bayes: A report on approaches and applications |
Authors | Manikanta Srikar Yellapragada, Chandra Prakash Konkimalla |
Abstract | Deep neural networks have achieved impressive results on a wide variety of tasks. However, quantifying uncertainty in the network’s output is a challenging task. Bayesian models offer a mathematical framework to reason about model uncertainty. Variational methods have been used to approximate the intractable integrals that arise in Bayesian inference for neural networks. In this report, we review the major variational inference concepts pertinent to Bayesian neural networks and compare various approximation methods used in the literature. We also discuss applications of variational Bayes in reinforcement learning and continual learning. |
Tasks | Bayesian Inference, Continual Learning |
Published | 2019-05-26 |
URL | https://arxiv.org/abs/1905.10744v1 |
PDF | https://arxiv.org/pdf/1905.10744v1.pdf |
PWC | https://paperswithcode.com/paper/variational-bayes-a-report-on-approaches-and |
Repo | |
Framework | |
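Two standard ingredients of variational inference for Bayesian neural networks are the KL term between a Gaussian approximate posterior and the prior, and the reparameterization trick for sampling. The sketch below assumes a diagonal Gaussian posterior and a standard normal prior (common choices, not necessarily those of every method in the report), and checks the closed-form KL against a Monte Carlo estimate built from reparameterized samples. The helper names are hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)


def reparameterize(mu, log_var, rng, n_samples):
    """z = mu + sigma * eps with eps ~ N(0, I): the sample is a differentiable
    function of (mu, log_var), which is what the reparameterization trick needs."""
    eps = rng.standard_normal((n_samples,) + np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps


def log_normal(x, mean, var):
    """Elementwise log density of N(mean, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


mu = np.array([0.5])
log_var = np.log(np.array([0.64]))
rng = np.random.default_rng(0)
z = reparameterize(mu, log_var, rng, n_samples=200_000)

closed_form = kl_to_standard_normal(mu, log_var)
# Monte Carlo estimate of E_q[log q(z) - log p(z)] from the same samples.
monte_carlo = np.mean(np.sum(log_normal(z, mu, np.exp(log_var)) - log_normal(z, 0.0, 1.0), axis=-1))
print(closed_form)  # about 0.1681
print(monte_carlo)  # close to the closed form, up to sampling noise
```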
PCMC-Net: Feature-based Pairwise Choice Markov Chains
Title | PCMC-Net: Feature-based Pairwise Choice Markov Chains |
Authors | Alix Lhéritier |
Abstract | Pairwise Choice Markov Chains (PCMC) have recently been introduced to overcome limitations of choice models based on traditional axioms, which are unable to express empirical observations from modern behavioral economics, such as context effects that occur when a choice between two options is altered by adding a third alternative. The inference approach that estimates the transition rates between each possible pair of alternatives via maximum likelihood suffers when examples of each alternative are scarce, and is inappropriate when new alternatives can be observed at test time. In this work, we propose an amortized inference approach for PCMC by embedding its definition into a neural network that represents transition rates as a function of the alternatives’ and individual’s features. We apply our construction to the complex case of airline itinerary booking, where singletons are common (due to varying prices and individual-specific itineraries), and context effects and behaviors strongly dependent on market segments are observed. Experiments show that our network significantly outperforms, in terms of prediction accuracy and logarithmic loss, feature-engineered standard and latent class Multinomial Logit models as well as recent machine learning approaches. |
Tasks | |
Published | 2019-09-25 |
URL | https://arxiv.org/abs/1909.11553v2 |
PDF | https://arxiv.org/pdf/1909.11553v2.pdf |
PWC | https://paperswithcode.com/paper/pcmc-net-feature-based-pairwise-choice-markov-1 |
Repo | |
Framework | |
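In a PCMC model, as we understand the standard definition, the choice probabilities for a choice set are the stationary distribution of a continuous-time Markov chain whose transition rates between pairs of alternatives in the set are the model parameters. The sketch below computes that distribution for a given rate matrix; the rates are made-up numbers, whereas in PCMC-Net a neural network would produce them from the alternatives' and individual's features.

```python
import numpy as np


def pcmc_choice_probs(rates, choice_set):
    """Choice probabilities over `choice_set` as the stationary distribution
    of the continuous-time Markov chain with rates[i][j] = rate of moving
    from alternative i to alternative j, restricted to the choice set."""
    idx = list(choice_set)
    n = len(idx)
    Q = np.array([[rates[i][j] if i != j else 0.0 for j in idx] for i in idx])
    np.fill_diagonal(Q, -Q.sum(axis=1))  # generator rows sum to zero
    # Solve pi @ Q = 0 together with sum(pi) = 1.
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return dict(zip(idx, pi.tolist()))


# Made-up pairwise rates; in PCMC-Net they would come from a network over features.
rates = [[0.0, 0.3, 0.6],
         [0.7, 0.0, 0.45],
         [0.4, 0.55, 0.0]]

print(pcmc_choice_probs(rates, [0, 1]))     # {0: ~0.7, 1: ~0.3}
print(pcmc_choice_probs(rates, [0, 1, 2]))  # probabilities over three options
```

Because the chain is rebuilt for each choice set, adding a third alternative can shift the relative probabilities of the first two, which is how PCMC can express the context effects mentioned above.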
Synthesising 3D Facial Motion from “In-the-Wild” Speech
Title | Synthesising 3D Facial Motion from “In-the-Wild” Speech |
Authors | Panagiotis Tzirakis, Athanasios Papaioannou, Alexander Lattas, Michail Tarasiou, Björn Schuller, Stefanos Zafeiriou |
Abstract | Synthesising 3D facial motion from speech is a crucial problem with a multitude of applications, such as computer games and movies. Recently proposed methods tackle this problem under controlled speech conditions. In this paper, we introduce the first methodology for 3D facial motion synthesis from speech captured in arbitrary recording conditions (“in-the-wild”) and independent of the speaker. For our purposes, we captured 4D sequences of people uttering the 500 words contained in Lip Reading Words (LRW), a publicly available large-scale in-the-wild dataset, and built a set of 3D blendshapes appropriate for speech. We correlate the 3D shape parameters of the speech blendshapes to the LRW audio samples by means of a novel time-warping technique, named Deep Canonical Attentional Warping (DCAW), that can simultaneously learn hierarchical non-linear representations and a warping path in an end-to-end manner. We thoroughly evaluate our proposed methods and show that a deep learning model can synthesise 3D facial motion while handling different speakers and continuous speech signals in uncontrolled conditions. |
Tasks | |
Published | 2019-04-15 |
URL | http://arxiv.org/abs/1904.07002v1 |
PDF | http://arxiv.org/pdf/1904.07002v1.pdf |
PWC | https://paperswithcode.com/paper/synthesising-3d-facial-motion-from-in-the |
Repo | |
Framework | |
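DCAW learns representations and a warping path jointly; the non-learned idea it builds on is time warping, i.e. finding a monotone alignment between two sequences of different lengths. The sketch below is classic dynamic time warping, explicitly not DCAW, with Euclidean frame distances as an assumed local cost; the sequences are toy stand-ins for audio features and blendshape parameters.

```python
import numpy as np


def dtw(x, y):
    """Classic dynamic time warping between sequences x (n, d) and y (m, d).
    Returns the minimal cumulative cost and the optimal alignment path."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, m = len(x), len(y)
    # Pairwise Euclidean distances between frames.
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from (n, m), preferring the diagonal move on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1, j - 1], i - 1, j - 1),
                      (D[i - 1, j], i - 1, j),
                      (D[i, j - 1], i, j - 1),
                      key=lambda step: step[0])
    return float(D[n, m]), path[::-1]


total, path = dtw([0, 1, 2], [0, 0, 1, 2])
print(total)  # 0.0
print(path)   # [(0, 0), (0, 1), (1, 2), (2, 3)]
```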
DUAL-GLOW: Conditional Flow-Based Generative Model for Modality Transfer
Title | DUAL-GLOW: Conditional Flow-Based Generative Model for Modality Transfer |
Authors | Haoliang Sun, Ronak Mehta, Hao H. Zhou, Zhichun Huang, Sterling C. Johnson, Vivek Prabhakaran, Vikas Singh |
Abstract | Positron emission tomography (PET) imaging is an imaging modality for diagnosing a number of neurological diseases. In contrast to Magnetic Resonance Imaging (MRI), PET is costly and involves injecting a radioactive substance into the patient. Motivated by developments in modality transfer in vision, we study the generation of certain types of PET images from MRI data. We derive new flow-based generative models which we show perform well in this small sample size regime (much smaller than dataset sizes available in standard vision tasks). Our formulation, DUAL-GLOW, is based on two invertible networks and a relation network that maps the latent spaces to each other. We discuss how, given the prior distribution, learning the conditional distribution of PET given the MRI image reduces to obtaining the conditional distribution between the two latent codes w.r.t. the two image types. We also extend our framework to leverage ‘side’ information (or attributes) when available. By controlling the PET generation through ‘conditioning’ on age, our model is also able to capture brain FDG-PET (hypometabolism) changes as a function of age. We present experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset with 826 subjects and obtain good performance in PET image synthesis, qualitatively and quantitatively better than recent work. |
Tasks | Image Generation |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.08074v1 |
PDF | https://arxiv.org/pdf/1908.08074v1.pdf |
PWC | https://paperswithcode.com/paper/dual-glow-conditional-flow-based-generative |
Repo | |
Framework | |
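DUAL-GLOW is built from invertible (Glow-style) networks. The sketch below shows only the generic building block such flows rely on, an affine coupling layer with an exact inverse and a cheap log-determinant, plus the change-of-variables log-likelihood under a standard normal prior. The tiny linear "network" inside the layer is a hypothetical stand-in; the two flows, the relation network between latent spaces, and the conditioning on MRI or age are not modeled.

```python
import numpy as np


class AffineCoupling:
    """RealNVP/Glow-style affine coupling for even-sized vectors. The first
    half passes through unchanged and parameterizes an elementwise scale and
    shift of the second half, so the map is invertible by construction."""

    def __init__(self, dim, rng):
        half = dim // 2
        # Hypothetical tiny "network": one linear map each for log-scale and shift.
        self.W_s = rng.normal(scale=0.1, size=(half, half))
        self.W_t = rng.normal(scale=0.1, size=(half, half))

    def _scale_shift(self, xa):
        log_s = np.tanh(xa @ self.W_s)  # tanh keeps the scales bounded
        t = xa @ self.W_t
        return log_s, t

    def forward(self, x):
        xa, xb = np.split(x, 2, axis=-1)
        log_s, t = self._scale_shift(xa)
        yb = xb * np.exp(log_s) + t
        # Triangular Jacobian: log|det| is the sum of the log-scales.
        log_det = np.sum(log_s, axis=-1)
        return np.concatenate([xa, yb], axis=-1), log_det

    def inverse(self, y):
        ya, yb = np.split(y, 2, axis=-1)
        log_s, t = self._scale_shift(ya)
        xb = (yb - t) * np.exp(-log_s)
        return np.concatenate([ya, xb], axis=-1)


def log_likelihood(layer, x):
    """Change of variables with a standard normal prior on z = layer(x)."""
    z, log_det = layer.forward(x)
    log_pz = -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi), axis=-1)
    return log_pz + log_det


rng = np.random.default_rng(0)
layer = AffineCoupling(dim=4, rng=rng)
x = rng.normal(size=(3, 4))
y, log_det = layer.forward(x)
print(np.allclose(layer.inverse(y), x))  # True: exact inverse
print(log_likelihood(layer, x).shape)    # (3,)
```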
PedHunter: Occlusion Robust Pedestrian Detector in Crowded Scenes
Title | PedHunter: Occlusion Robust Pedestrian Detector in Crowded Scenes |
Authors | Cheng Chi, Shifeng Zhang, Junliang Xing, Zhen Lei, Stan Z. Li, Xudong Zou |
Abstract | Pedestrian detection in crowded scenes is a challenging problem, because occlusion happens frequently among different pedestrians. In this paper, we propose an effective and efficient detection network to hunt pedestrians in crowded scenes. The proposed method, namely PedHunter, introduces strong occlusion handling ability to existing region-based detection networks without bringing extra computations in the inference stage. Specifically, we design a mask-guided module to leverage the head information to enhance the feature representation learning of the backbone network. Moreover, we develop a strict classification criterion by improving the quality of positive samples during training to eliminate common false positives of pedestrian detection in crowded scenes. Besides, we present an occlusion-simulated data augmentation to enrich the pattern and quantity of occlusion samples to improve the occlusion robustness. As a result, we achieve state-of-the-art results on three pedestrian detection datasets including CityPersons, Caltech-USA and CrowdHuman. To facilitate further studies on occluded pedestrian detection in surveillance scenes, we release a new pedestrian dataset, called SUR-PED, with a total of over 162k high-quality manually labeled instances in 10k images. The proposed dataset, source code and trained models will be released. |
Tasks | Data Augmentation, Pedestrian Detection, Representation Learning |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06826v1 |
PDF | https://arxiv.org/pdf/1909.06826v1.pdf |
PWC | https://paperswithcode.com/paper/pedhunter-occlusion-robust-pedestrian |
Repo | |
Framework | |
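The abstract does not say how PedHunter's stricter positive criterion is implemented. As a generic illustration only, region-based detectors commonly label proposals by their best IoU with a ground-truth box, and raising the positive threshold is one simple way to admit fewer, higher-quality positives. The thresholds below (0.5/0.3 default, 0.7 strict) are hypothetical values chosen for the example.

```python
import numpy as np


def box_iou(boxes_a, boxes_b):
    """Pairwise IoU between boxes in (x1, y1, x2, y2) format."""
    a = np.asarray(boxes_a, dtype=float)[:, None, :]
    b = np.asarray(boxes_b, dtype=float)[None, :, :]
    ix1 = np.maximum(a[..., 0], b[..., 0])
    iy1 = np.maximum(a[..., 1], b[..., 1])
    ix2 = np.minimum(a[..., 2], b[..., 2])
    iy2 = np.minimum(a[..., 3], b[..., 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)


def assign_labels(proposals, gt_boxes, pos_thresh=0.5, neg_thresh=0.3):
    """1 = positive, 0 = negative, -1 = ignored during training."""
    best_iou = box_iou(proposals, gt_boxes).max(axis=1)
    labels = np.full(len(proposals), -1)
    labels[best_iou >= pos_thresh] = 1
    labels[best_iou < neg_thresh] = 0
    return labels


gt = [[0, 0, 10, 10]]
proposals = [[0, 0, 10, 10],    # IoU 1.0
             [0, 0, 10, 6],     # IoU 0.6
             [0, 0, 10, 4],     # IoU 0.4
             [20, 20, 30, 30]]  # IoU 0.0
print(assign_labels(proposals, gt))                  # [ 1  1 -1  0]
print(assign_labels(proposals, gt, pos_thresh=0.7))  # [ 1 -1 -1  0]
```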