January 29, 2020


Paper Group ANR 557



General non-linear Bellman equations

Title General non-linear Bellman equations
Authors Hado van Hasselt, John Quan, Matteo Hessel, Zhongwen Xu, Diana Borsa, Andre Barreto
Abstract We consider a general class of non-linear Bellman equations. These open up a design space of algorithms that have interesting properties, which has two potential advantages. First, we can perhaps better model natural phenomena. For instance, hyperbolic discounting has been proposed as a mathematical model that matches human and animal data well, and can therefore be used to explain preference orderings. We present a different mathematical model that matches the same data, but that makes very different predictions under other circumstances. Second, the larger design space can perhaps lead to algorithms that perform better, similar to how discount factors are often used in practice even when the true objective is undiscounted. We show that many of the resulting Bellman operators still converge to a fixed point, and therefore that the resulting algorithms are reasonable and inherit many beneficial properties of their linear counterparts.
Tasks
Published 2019-07-08
URL https://arxiv.org/abs/1907.03687v1
PDF https://arxiv.org/pdf/1907.03687v1.pdf
PWC https://paperswithcode.com/paper/general-non-linear-bellman-equations
Repo
Framework
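
To give a concrete sense of the idea, here is a minimal, self-contained sketch of value iteration under a non-linear Bellman operator on a toy two-state MDP. The MDP, the discount factor, and the choice h(x) = log(1+x) are illustrative assumptions, not taken from the paper; h(x) = x recovers the standard linear case.

```python
# Sketch: value iteration with a general non-linear Bellman operator,
# (T v)(s) = max_a [ r(s,a) + gamma * h(v(s')) ], where h is a
# (possibly non-linear) mapping; h(x) = x is the usual Bellman operator.
# The two-state MDP below is invented for illustration.

import math

GAMMA = 0.9

# transitions[s][a] = (reward, next_state)
transitions = {
    0: {"stay": (1.0, 0), "move": (0.0, 1)},
    1: {"stay": (2.0, 1), "move": (0.0, 0)},
}

def bellman_update(v, h):
    """One sweep of the (non-linear) Bellman operator over all states."""
    return [
        max(r + GAMMA * h(v[s2]) for r, s2 in transitions[s].values())
        for s in transitions
    ]

def fixed_point(h, iters=500):
    v = [0.0, 0.0]
    for _ in range(iters):
        v = bellman_update(v, h)
    return v

v_linear = fixed_point(lambda x: x)                 # standard Bellman
v_nonlin = fixed_point(lambda x: math.log(1 + x))   # a non-linear variant

print(v_linear)
print(v_nonlin)
```

Both variants converge to a fixed point under repeated application, which is the kind of property the paper establishes for a broad class of such operators.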

Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense

Title Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense
Authors Yixin Chen, Siyuan Huang, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu
Abstract We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic scene parsing and reconstruction—3D estimations of object bounding boxes, camera pose, and room layout, and (ii) 3D human pose estimation. The intuition behind this is to leverage the coupled nature of these two tasks to improve the granularity and performance of scene understanding. We propose to exploit two critical and essential connections between these two tasks: (i) human-object interaction (HOI) to model the fine-grained relations between agents and objects in the scene, and (ii) physical commonsense to model the physical plausibility of the reconstructed scene. The optimal configuration of the 3D scene, represented by a parse graph, is inferred using Markov chain Monte Carlo (MCMC), which efficiently traverses the non-differentiable joint solution space. Experimental results demonstrate that the proposed algorithm significantly improves the performance of the two tasks on three datasets, showing an improved generalization ability.
Tasks 3D Human Pose Estimation, Human-Object Interaction Detection, Pose Estimation, Scene Parsing, Scene Understanding
Published 2019-09-04
URL https://arxiv.org/abs/1909.01507v1
PDF https://arxiv.org/pdf/1909.01507v1.pdf
PWC https://paperswithcode.com/paper/holistic-scene-understanding-single-view-3d
Repo
Framework

ParNet: Position-aware Aggregated Relation Network for Image-Text matching

Title ParNet: Position-aware Aggregated Relation Network for Image-Text matching
Authors Yaxian Xia, Lun Huang, Wenmin Wang, Xiaoyong Wei, Wenmin Wang
Abstract Exploring the fine-grained relationships between entities (e.g., objects in an image or words in a sentence) contributes greatly to understanding multimedia content precisely. Previous attention mechanisms employed in image-text matching either take multiple self-attention steps to gather correspondences or use image objects (or words) as context to infer image-text similarity. However, they only take advantage of semantic information without considering that objects’ relative positions also contribute to image understanding. To this end, we introduce a novel position-aware relation module to model the semantic and spatial relationships simultaneously for image-text matching. Given an image, our method utilizes the locations of different objects to capture spatial relationships. By combining semantic and spatial relationships, it is easier to understand the content of the different modalities (images and sentences) and to capture fine-grained latent correspondences of image-text pairs. In addition, we employ a two-step aggregated relation module to capture interpretable alignments of image-text pairs. In the first step, an intra-modal relation mechanism computes responses between different objects in an image or different words in a sentence separately; in the second step, an inter-modal relation mechanism uses the query as textual context to refine the relationships among object proposals in an image. In this way, our position-aware aggregated relation network (ParNet) not only identifies which entities are relevant by attending to different objects (words) adaptively, but also adjusts the inter-modal correspondence according to the query’s content. Our approach achieves state-of-the-art results on the MS-COCO dataset.
Tasks Text Matching
Published 2019-06-17
URL https://arxiv.org/abs/1906.06892v1
PDF https://arxiv.org/pdf/1906.06892v1.pdf
PWC https://paperswithcode.com/paper/parnet-position-aware-aggregated-relation
Repo
Framework
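
As a rough illustration of the position-aware idea (the scoring function below is a simplification invented for this sketch, not the paper’s module), a relation score can combine semantic affinity between object features with a spatial term derived from the objects’ relative positions:

```python
# Toy position-aware relation score: semantic affinity (dot product of
# feature vectors) plus a spatial term that decays with the distance
# between box centers. Feature vectors and boxes are made-up numbers.

import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(box):                      # box = (x1, y1, x2, y2)
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def relation(feat_i, feat_j, box_i, box_j, alpha=0.5):
    """Semantic affinity plus a spatial term from relative positions."""
    (xi, yi), (xj, yj) = center(box_i), center(box_j)
    spatial = math.exp(-alpha * math.hypot(xi - xj, yi - yj))
    return dot(feat_i, feat_j) + spatial

fa = [1.0, 0.0]                       # toy object features
near, far = (0, 0, 2, 2), (10, 10, 12, 12)
score_same = relation(fa, fa, near, near)
score_far = relation(fa, fa, near, far)
print(score_same, score_far)          # the nearby pair scores higher
```

With identical features, the spatial term is what separates the two pairs, which is the extra signal a purely semantic attention mechanism would miss.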

Variations of Genetic Algorithms

Title Variations of Genetic Algorithms
Authors Alison Jenkins, Vinika Gupta, Alexis Myrick, Mary Lenoir
Abstract The goal of this project is to develop Genetic Algorithms (GAs) for solving the Schaffer F6 function in fewer than 4000 function evaluations over a total of 30 runs. Four types of GA are presented: the Generational GA (GGA), the Steady-State (mu+1)-GA (SSGA), the Steady-Generational (mu,mu)-GA (SGGA), and the (mu+mu)-GA.
Tasks
Published 2019-11-01
URL https://arxiv.org/abs/1911.00490v1
PDF https://arxiv.org/pdf/1911.00490v1.pdf
PWC https://paperswithcode.com/paper/variations-of-genetic-algorithms
Repo
Framework
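
A minimal generational GA (GGA) on the Schaffer F6 function might look as follows; the population size, operators, and rates are illustrative choices rather than the paper’s exact settings, arranged so the fitness-evaluation budget is 50 × 80 = 4000:

```python
# Minimal generational GA sketch for minimizing the Schaffer F6 function.
# Operators and parameters are illustrative assumptions.

import math
import random

random.seed(0)

def schaffer_f6(x, y):
    s = x * x + y * y
    return 0.5 + (math.sin(math.sqrt(s)) ** 2 - 0.5) / (1 + 0.001 * s) ** 2

POP, GENS, BOUND = 50, 80, 100.0      # 50 * 80 = 4000 fitness evaluations

pop = [(random.uniform(-BOUND, BOUND), random.uniform(-BOUND, BOUND))
       for _ in range(POP)]

best = float("inf")
for _ in range(GENS):
    scored = sorted((schaffer_f6(x, y), (x, y)) for x, y in pop)
    best = min(best, scored[0][0])
    parents = [ind for _, ind in scored[:POP // 2]]   # truncation selection
    children = []
    while len(children) < POP:
        (x1, y1), (x2, y2) = random.sample(parents, 2)
        a = random.random()                            # blend crossover
        cx = a * x1 + (1 - a) * x2 + random.gauss(0, 1.0)  # Gaussian mutation
        cy = a * y1 + (1 - a) * y2 + random.gauss(0, 1.0)
        children.append((cx, cy))
    pop = children

print(f"best fitness found: {best:.4f}")   # global optimum is 0 at the origin
```

The steady-state and steady-generational variants named in the abstract differ mainly in how many individuals are replaced per step (one, or a partial batch) rather than the whole population.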

Conscientious Classification: A Data Scientist’s Guide to Discrimination-Aware Classification

Title Conscientious Classification: A Data Scientist’s Guide to Discrimination-Aware Classification
Authors Brian d’Alessandro, Cathy O’Neil, Tom LaGatta
Abstract Recent research has helped to cultivate growing awareness that machine learning systems fueled by big data can create or exacerbate troubling disparities in society. Much of this research comes from outside of the practicing data science community, leaving its members with little concrete guidance to proactively address these concerns. This article introduces issues of discrimination to the data science community on its own terms. In it, we tour the familiar data mining process while providing a taxonomy of common practices that have the potential to produce unintended discrimination. We also survey how discrimination is commonly measured, and suggest how familiar development processes can be augmented to mitigate systems’ discriminatory potential. We advocate that data scientists should be intentional about modeling and reducing discriminatory outcomes. Without doing so, their efforts will result in perpetuating any systemic discrimination that may exist, but under a misleading veil of data-driven objectivity.
Tasks
Published 2019-07-21
URL https://arxiv.org/abs/1907.09013v1
PDF https://arxiv.org/pdf/1907.09013v1.pdf
PWC https://paperswithcode.com/paper/conscientious-classification-a-data
Repo
Framework
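
As one concrete example of the measurement side, the sketch below computes two widely used discrimination measures, the demographic parity difference and the disparate impact ratio, over made-up classifier decisions for a binary protected attribute (the data and the 0.8 “four-fifths rule” threshold are illustrative, not from the article):

```python
# Illustrative discrimination measures over a classifier's decisions.
# decisions: (protected_group, predicted_positive) pairs, invented data.

decisions = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1), ("A", 0),
    ("B", 1), ("B", 0), ("B", 0), ("B", 0), ("B", 0),
]

def positive_rate(group):
    outcomes = [y for g, y in decisions if g == group]
    return sum(outcomes) / len(outcomes)

p_a, p_b = positive_rate("A"), positive_rate("B")
parity_diff = p_a - p_b          # 0.0 would mean demographic parity
impact_ratio = p_b / p_a         # the "80% rule" flags ratios below 0.8

print(f"P(pos|A)={p_a:.2f}  P(pos|B)={p_b:.2f}")
print(f"demographic parity difference: {parity_diff:.2f}")
print(f"disparate impact ratio: {impact_ratio:.2f}")
```

Instrumenting a pipeline with checks like this is one way a data scientist can be intentional about discriminatory outcomes rather than relying on the veil of data-driven objectivity.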

ImgSensingNet: UAV Vision Guided Aerial-Ground Air Quality Sensing System

Title ImgSensingNet: UAV Vision Guided Aerial-Ground Air Quality Sensing System
Authors Yuzhe Yang, Zhiwen Hu, Kaigui Bian, Lingyang Song
Abstract Given the increasingly serious air pollution problem, the monitoring of the air quality index (AQI) in urban areas has drawn considerable attention. This paper presents ImgSensingNet, a vision guided aerial-ground sensing system, for fine-grained air quality monitoring and forecasting using the fusion of haze images taken by an unmanned aerial vehicle (UAV) and the AQI data collected by an on-ground three-dimensional (3D) wireless sensor network (WSN). Specifically, ImgSensingNet first leverages computer vision techniques to infer the AQI scale in different regions from the taken haze images, where haze-relevant features and a deep convolutional neural network (CNN) are designed for direct learning between haze images and the corresponding AQI scale. Based on the learnt AQI scale, ImgSensingNet determines whether to wake up on-ground wireless sensors for small-scale AQI monitoring and inference, which can greatly reduce the energy consumption of the system. An entropy-based model is employed for accurate real-time AQI inference at unmeasured locations and for forecasting the future air quality distribution. We have implemented and evaluated ImgSensingNet on two university campuses since Feb. 2018, collecting 17,630 photos and 2.6 million AQI data samples. Experimental results confirm that ImgSensingNet achieves higher inference accuracy while greatly reducing energy consumption, compared to state-of-the-art AQI monitoring approaches.
Tasks
Published 2019-05-27
URL https://arxiv.org/abs/1905.11299v1
PDF https://arxiv.org/pdf/1905.11299v1.pdf
PWC https://paperswithcode.com/paper/imgsensingnet-uav-vision-guided-aerial-ground
Repo
Framework

TagSLAM: Robust SLAM with Fiducial Markers

Title TagSLAM: Robust SLAM with Fiducial Markers
Authors Bernd Pfrommer, Kostas Daniilidis
Abstract TagSLAM provides a convenient, flexible, and robust way of performing Simultaneous Localization and Mapping (SLAM) with AprilTag fiducial markers. By leveraging a few simple abstractions (bodies, tags, cameras), TagSLAM provides a front end to the GTSAM factor graph optimizer that makes it possible to rapidly design a range of experiments that are based on tags: full SLAM, extrinsic camera calibration with non-overlapping views, visual localization for ground truth, loop closure for odometry, pose estimation etc. We discuss in detail how TagSLAM initializes the factor graph in a robust way, and present loop closure as an application example. TagSLAM is a ROS based open source package and can be found at https://berndpfrommer.github.io/tagslam_web.
Tasks Calibration, Pose Estimation, Simultaneous Localization and Mapping, Visual Localization
Published 2019-10-01
URL https://arxiv.org/abs/1910.00679v1
PDF https://arxiv.org/pdf/1910.00679v1.pdf
PWC https://paperswithcode.com/paper/tagslam-robust-slam-with-fiducial-markers
Repo
Framework
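
The core geometric step behind tag-based localization can be illustrated in 2D: if a tag’s world pose is known and the camera observes the tag’s pose relative to itself, the camera’s world pose follows by pose composition. The poses below are made-up numbers, and TagSLAM itself solves this jointly over many tags with the GTSAM factor graph optimizer rather than in closed form:

```python
# Toy 2D pose-from-tag: T_world_cam = T_world_tag * inverse(T_cam_tag).
# Poses are (x, y, theta); all numbers are invented for the example.

import math

def compose(a, b):
    """Compose 2D rigid poses: returns a * b."""
    ax, ay, at = a
    bx, by, bt = b
    return (ax + bx * math.cos(at) - by * math.sin(at),
            ay + bx * math.sin(at) + by * math.cos(at),
            at + bt)

def invert(p):
    """Inverse of a 2D rigid pose: (R, t) -> (R^T, -R^T t)."""
    x, y, t = p
    return (-x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) - y * math.cos(t),
            -t)

tag_in_world = (5.0, 2.0, math.pi / 2)   # known tag pose (from the map)
tag_in_cam = (1.0, 0.0, 0.0)             # tag seen 1 m straight ahead
cam_in_world = compose(tag_in_world, invert(tag_in_cam))
print(cam_in_world)                      # camera sits 1 m "behind" the tag
```

A single tag already pins down the full camera pose, which is why fiducials make such a robust front end for SLAM and for ground-truth visual localization.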

Filterbank design for end-to-end speech separation

Title Filterbank design for end-to-end speech separation
Authors Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent
Abstract Single-channel speech separation has recently made great progress thanks to learned filterbanks as used in ConvTasNet. In parallel, parameterized filterbanks have been proposed for speaker recognition where only center frequencies and bandwidths are learned. In this work, we extend real-valued learned and parameterized filterbanks into complex-valued analytic filterbanks and define a set of corresponding representations and masking strategies. We evaluate these filterbanks on a newly released noisy speech separation dataset (WHAM). The results show that the proposed analytic learned filterbank consistently outperforms the real-valued filterbank of ConvTasNet. Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions. Finally, we show that the STFT achieves its best performance for 2ms windows.
Tasks Speaker Recognition, Speech Separation
Published 2019-10-23
URL https://arxiv.org/abs/1910.10400v2
PDF https://arxiv.org/pdf/1910.10400v2.pdf
PWC https://paperswithcode.com/paper/filterbank-design-for-end-to-end-speech
Repo
Framework
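
The analytic-filterbank idea rests on the analytic signal: zeroing the negative frequencies of a real signal (or filter) yields a complex one whose magnitude is a smooth envelope. The naive-DFT sketch below shows that construction; it is illustrative only and is not the paper’s learned filterbank:

```python
# Analytic-signal construction via the DFT: double positive frequencies,
# zero negative frequencies. Uses a naive O(n^2) DFT for self-containment.

import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def analytic(x):
    """Analytic signal of a real sequence of even length n."""
    n = len(x)
    X = dft(x)
    # DC and Nyquist kept as-is, positive freqs doubled, negative zeroed.
    H = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    return idft([Xk * h for Xk, h in zip(X, H)])

# A pure cosine: its analytic version is a complex exponential, so the
# envelope |analytic(x)| is flat at 1.
n = 64
x = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
env = [abs(v) for v in analytic(x)]
print(min(env), max(env))
```

Applying such complex-valued filters and masking their magnitudes, rather than real-valued ones, is what the paper’s analytic filterbanks exploit.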

Extracting human emotions at different places based on facial expressions and spatial clustering analysis

Title Extracting human emotions at different places based on facial expressions and spatial clustering analysis
Authors Yuhao Kang, Qingyuan Jia, Song Gao, Xiaohuan Zeng, Yueyao Wang, Stephan Angsuesser, Yu Liu, Xinyue Ye, Teng Fei
Abstract The emergence of big data enables us to evaluate the various human emotions at places from a statistical perspective by applying affective computing. In this study, a novel framework for extracting human emotions from large-scale georeferenced photos at different places is proposed. After the construction of places based on spatial clustering of user-generated footprints collected from social media websites, online cognitive services are utilized to extract human emotions from facial expressions using state-of-the-art computer vision techniques. Two happiness metrics are then defined for measuring the human emotions at different places. To validate the feasibility of the framework, we take 80 tourist attractions around the world as an example, and a happiness ranking list of places is generated based on human emotions calculated from over 2 million faces detected in over 6 million photos. Different kinds of geographical contexts are taken into consideration to uncover the relationship between human emotions and environmental factors. Results show that much of the emotional variation at different places can be explained by a few factors such as openness. The research may offer insights on integrating human emotions to enrich the understanding of sense of place in geography and in place-based GIS.
Tasks
Published 2019-05-06
URL https://arxiv.org/abs/1905.01817v1
PDF https://arxiv.org/pdf/1905.01817v1.pdf
PWC https://paperswithcode.com/paper/extracting-human-emotions-at-different-places
Repo
Framework
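
As a hypothetical sketch of what place-level happiness metrics could look like (the exact metric definitions here are assumptions, not the paper’s), given per-face happiness scores detected at one place:

```python
# Two simple place-level happiness metrics over per-face scores in [0, 1]:
# (i) the share of faces classified as happy, (ii) the mean happiness score.
# The scores below are invented example data.

faces = [0.9, 0.2, 0.7, 0.95, 0.1, 0.6]   # one happiness score per face

share_happy = sum(s > 0.5 for s in faces) / len(faces)
avg_happiness = sum(faces) / len(faces)

print(f"share of happy faces: {share_happy:.3f}")
print(f"average happiness:    {avg_happiness:.3f}")
```

Computed per spatial cluster, metrics like these are what make a cross-place happiness ranking possible.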

Worst-Case Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies

Title Worst-Case Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies
Authors Alexander Bauer, Shinichi Nakajima
Abstract Considering the worst-case scenario, the junction tree algorithm remains the most efficient and general solution for exact MAP inference on discrete graphical models. Unfortunately, its main tractability assumption requires the treewidth of the corresponding MRF to be bounded, strongly limiting the range of admissible applications. In fact, many practical problems in the area of structured prediction require modelling global dependencies by either directly introducing global factors or enforcing global constraints on the prediction variables. This, however, always results in a fully connected graph, making exact inference by means of this algorithm intractable. Nevertheless, depending on the structure of the global factors, we can further relax the conditions for efficient inference. In this paper we reformulate the work in [1] and present a better way to establish the theory, also extending the set of handleable problem instances for free, since it requires only a simple modification of the originally presented algorithm. To demonstrate that this extension is not of purely theoretical interest, we identify a further use case in the context of generalisation bounds for structured learning which cannot be handled by the previous formulation. Finally, we adjust the theoretical guarantees accordingly, showing that the modified algorithm always finds an optimal solution in polynomial time.
Tasks Structured Prediction
Published 2019-12-27
URL https://arxiv.org/abs/1912.12090v1
PDF https://arxiv.org/pdf/1912.12090v1.pdf
PWC https://paperswithcode.com/paper/worst-case-polynomial-time-exact-map
Repo
Framework

Adversarial Fault Tolerant Training for Deep Neural Networks

Title Adversarial Fault Tolerant Training for Deep Neural Networks
Authors Vasisht Duddu, D. Vijay Rao, Valentina E. Balas
Abstract Deep learning accelerators are prone to faults which manifest in the form of errors in neural networks. Fault tolerance in neural networks is crucial in real-time safety-critical applications requiring computation for long durations. Neural networks with high regularisation exhibit superior fault tolerance, however, at the cost of classification accuracy. In view of the difference in functionality, a neural network is modelled as two separate networks, i.e., a Feature Extractor with an unsupervised learning objective and a Classifier with a supervised learning objective. The traditional approach of training the entire network with a single supervised learning objective is insufficient to achieve the objectives of the individual components optimally. In this work, a novel multi-criteria objective function, combining unsupervised training of the Feature Extractor followed by supervised tuning with the Classifier network, is proposed. The unsupervised training solves two games simultaneously in the presence of adversary neural networks whose objectives conflict with those of the Feature Extractor. The first game minimises the loss in reconstructing the input image from the Extractor’s features, in the presence of a generative decoder. The second game solves a minimax constraint optimisation for distributional smoothening of the feature space to match a prior distribution, in the presence of a Discriminator network. The resultant strongly regularised Feature Extractor is combined with the Classifier network for supervised fine-tuning. The proposed adversarial fault tolerant neural network training is scalable to large networks and is independent of the architecture. The evaluation on the benchmark datasets FashionMNIST and CIFAR10 indicates that the resultant networks have high accuracy with superior tolerance to stuck-at-“0” faults compared to widely used regularisers.
Tasks
Published 2019-07-06
URL https://arxiv.org/abs/1907.03103v2
PDF https://arxiv.org/pdf/1907.03103v2.pdf
PWC https://paperswithcode.com/paper/adversarial-fault-tolerant-training-for-deep
Repo
Framework

Comparison of Machine Learning Models in Food Authentication Studies

Title Comparison of Machine Learning Models in Food Authentication Studies
Authors Manokamna Singh, Katarina Domijan
Abstract The underlying objective of food authentication studies is to determine whether unknown food samples have been correctly labelled. In this paper we study three near-infrared (NIR) spectroscopic datasets from food samples of different types: meat samples (labelled by species), olive oil samples (labelled by their geographic origin) and honey samples (labelled as pure or adulterated by different adulterants). We apply and compare a large number of classification, dimension reduction and variable selection approaches to these datasets. NIR data pose specific challenges to classification and variable selection: the datasets are high-dimensional, with the number of cases ($n$) much smaller than the number of features ($p$), i.e., $n \ll p$, and the recorded features are highly serially correlated. In this paper we carry out a comparative analysis of the different approaches and find that partial least squares, a classic tool employed for these types of data, outperforms all the other approaches considered.
Tasks Dimensionality Reduction
Published 2019-05-17
URL https://arxiv.org/abs/1905.07302v1
PDF https://arxiv.org/pdf/1905.07302v1.pdf
PWC https://paperswithcode.com/paper/comparison-of-machine-learning-models-in-food
Repo
Framework

On the bias, risk and consistency of sample means in multi-armed bandits

Title On the bias, risk and consistency of sample means in multi-armed bandits
Authors Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo
Abstract The sample mean is among the most well studied estimators in statistics, having many desirable properties such as unbiasedness and consistency. However, when analyzing data collected using a multi-armed bandit (MAB) experiment, the sample mean is biased and much remains to be understood about its properties. For example, when is it consistent, how large is its bias, and can we bound its mean squared error? This paper delivers a thorough and systematic treatment of the bias, risk and consistency of MAB sample means. Specifically, we identify four distinct sources of selection bias (sampling, stopping, choosing and rewinding) and analyze them both separately and together. We further demonstrate that a new notion of \emph{effective sample size} can be used to bound the risk of the sample mean under suitable loss functions. We present several carefully designed examples to provide intuition on the different sources of selection bias we study. Our treatment is nonparametric and algorithm-agnostic, meaning that it is not tied to a specific algorithm or goal. In a nutshell, our proofs combine variational representations of information theoretic divergences with new martingale concentration inequalities.
Tasks Multi-Armed Bandits
Published 2019-02-02
URL https://arxiv.org/abs/1902.00746v2
PDF https://arxiv.org/pdf/1902.00746v2.pdf
PWC https://paperswithcode.com/paper/on-the-bias-risk-and-consistency-of-sample
Repo
Framework
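
The “choosing” source of bias is easy to demonstrate by simulation: if we report the sample mean of whichever arm currently looks best, the reported mean is biased upward even when the arms are identical. The setup below is an illustration, not an experiment from the paper:

```python
# Simulating "choosing" bias: two identical arms (standard normal rewards,
# true mean 0); reporting the sample mean of the better-looking arm is
# biased upward. All parameters are illustrative.

import random
import statistics

random.seed(1)

def one_experiment(pulls=5):
    a = [random.gauss(0, 1) for _ in range(pulls)]
    b = [random.gauss(0, 1) for _ in range(pulls)]
    return max(statistics.mean(a), statistics.mean(b))  # mean of chosen arm

reported = [one_experiment() for _ in range(20000)]
bias = statistics.mean(reported)   # true mean is 0, so this IS the bias
print(f"average reported mean: {bias:.3f} (true mean is 0)")
```

The paper analyzes this effect, together with the sampling, stopping, and rewinding sources of bias, nonparametrically and for arbitrary algorithms.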

Image Segmentation using Multi-Threshold technique by Histogram Sampling

Title Image Segmentation using Multi-Threshold technique by Histogram Sampling
Authors Amit Gurung, Sangyal Lama Tamang
Abstract The segmentation of digital images is one of the essential steps in image processing or a computer vision system. It helps in separating the pixels into different regions according to their intensity level. A large number of segmentation techniques have been proposed, and a few of them use complex computational operations. Among all, the most straightforward procedure that can be easily implemented is thresholding. In this paper, we present a unique heuristic approach for image segmentation that automatically determines multilevel thresholds by sampling the histogram of a digital image. Our approach emphasizes selecting valleys as the optimal threshold values. We demonstrated that our approach outperforms the popular Otsu’s method in terms of CPU computational time. We observed a maximum speed-up of 35.58x and a minimum speed-up of 10.21x on popular image processing benchmarks. To demonstrate the correctness of our approach in determining threshold values, we compute PSNR, SSIM, and FSIM values for comparison with the values obtained by Otsu’s method. This evaluation shows that our approach is comparable to, and in many cases better than, the well-known Otsu’s method.
Tasks Semantic Segmentation
Published 2019-09-11
URL https://arxiv.org/abs/1909.05084v1
PDF https://arxiv.org/pdf/1909.05084v1.pdf
PWC https://paperswithcode.com/paper/image-segmentation-using-multi-threshold
Repo
Framework
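
A valley-based multilevel thresholding scheme in the spirit of the paper can be sketched as follows; the histogram is synthetic and the peak/valley heuristics are assumptions made for this example, not the authors’ exact procedure:

```python
# Valley-based multilevel thresholding sketch: smooth the grayscale
# histogram, find its peaks, and take the lowest bin between adjacent
# peaks as a threshold. Synthetic three-cluster data for illustration.

import random

random.seed(2)

# Synthetic grayscale image: three intensity clusters around 40, 120, 200.
pixels = [int(random.gauss(m, 10)) for m in (40, 120, 200) for _ in range(3000)]
hist = [0] * 256
for p in pixels:
    hist[min(max(p, 0), 255)] += 1

def smooth(h, w=7):
    """Moving-average smoothing of the histogram."""
    return [sum(h[max(0, i - w):i + w + 1]) / len(h[max(0, i - w):i + w + 1])
            for i in range(len(h))]

def peaks(s, w=20, frac=0.3):
    """Local maxima that reach at least frac of the global maximum."""
    m, out = max(s), []
    for i in range(len(s)):
        if s[i] == max(s[max(0, i - w):i + w + 1]) and s[i] >= frac * m:
            if not out or i - out[-1] > w:      # one index per plateau
                out.append(i)
    return out

s = smooth(hist)
pk = peaks(s)
# Thresholds: the lowest bin (the valley) between each pair of adjacent peaks.
thresholds = [min(range(a, b), key=lambda i: s[i]) for a, b in zip(pk, pk[1:])]
print(pk, thresholds)
```

For this three-cluster histogram the two valleys land between the clusters, splitting the intensity range into three segments without any exhaustive threshold search.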

PowerNet: Efficient Representations of Polynomials and Smooth Functions by Deep Neural Networks with Rectified Power Units

Title PowerNet: Efficient Representations of Polynomials and Smooth Functions by Deep Neural Networks with Rectified Power Units
Authors Bo Li, Shanshan Tang, Haijun Yu
Abstract Deep neural networks with rectified linear units (ReLU) have recently become more and more popular. However, the derivatives of the function represented by a ReLU network are not continuous, which limits the use of ReLU networks to situations where smoothness is not required. In this paper, we construct deep neural networks with rectified power units (RePU), which can give better approximations for smooth functions. Optimal algorithms are proposed to explicitly build neural networks with sparsely connected RePUs, which we call PowerNets, to represent polynomials with no approximation error. For general smooth functions, we first project the function onto its polynomial approximation, then use the proposed algorithms to construct the corresponding PowerNet. Thus, the error of the best polynomial approximation provides an upper bound on the best RePU network approximation error. For smooth functions in higher-dimensional Sobolev spaces, we use fast spectral transforms for tensor-product grid and sparse grid discretizations to obtain polynomial approximations. Our constructive algorithms clearly show a close connection between spectral methods and deep neural networks: a PowerNet with $n$ layers can exactly represent polynomials up to degree $s^n$, where $s$ is the power of the RePUs. The proposed PowerNets have potential applications in situations where high accuracy is desired or smoothness is required.
Tasks
Published 2019-09-09
URL https://arxiv.org/abs/1909.05136v1
PDF https://arxiv.org/pdf/1909.05136v1.pdf
PWC https://paperswithcode.com/paper/powernet-efficient-representations-of
Repo
Framework
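
The exactness claim is easy to see in the simplest case: with RePUs of power s = 2, two hidden units represent x² with no approximation error, since σ₂(x) + σ₂(−x) = x²:

```python
# RePU identity: with sigma_s(x) = max(0, x)^s and s = 2, a one-hidden-layer
# net with two units (input weights 1 and -1, output weights 1 and 1)
# represents x^2 exactly.

def repu(x, s=2):
    return max(0.0, x) ** s

def square_net(x):
    return repu(x) + repu(-x)   # exactly x^2 for every real x

for x in (-3.0, -0.5, 0.0, 1.25, 2.0):
    print(x, square_net(x), x * x)
```

Products then follow from xy = ((x+y)² − (x−y)²)/4, so stacking such layers composes squarings and reaches degree s^n with n layers, as stated in the abstract.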