January 29, 2020


Paper Group ANR 557



General non-linear Bellman equations

Title General non-linear Bellman equations
Authors Hado van Hasselt, John Quan, Matteo Hessel, Zhongwen Xu, Diana Borsa, Andre Barreto
Abstract We consider a general class of non-linear Bellman equations. These open up a design space of algorithms that have interesting properties, which has two potential advantages. First, we can perhaps better model natural phenomena. For instance, hyperbolic discounting has been proposed as a mathematical model that matches human and animal data well, and can therefore be used to explain preference orderings. We present a different mathematical model that matches the same data, but that makes very different predictions under other circumstances. Second, the larger design space can perhaps lead to algorithms that perform better, similar to how discount factors are often used in practice even when the true objective is undiscounted. We show that many of the resulting Bellman operators still converge to a fixed point, and therefore that the resulting algorithms are reasonable and inherit many beneficial properties of their linear counterparts.
Tasks
Published 2019-07-08
URL https://arxiv.org/abs/1907.03687v1
PDF https://arxiv.org/pdf/1907.03687v1.pdf
PWC https://paperswithcode.com/paper/general-non-linear-bellman-equations
Repo
Framework
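
To give a concrete sense of the idea, here is a minimal, self-contained sketch of value iteration under a non-linear Bellman operator on a toy two-state MDP. The MDP, the discount factor, and the choice h(x) = log(1+x) are illustrative assumptions, not taken from the paper; h(x) = x recovers the standard linear case.

```python
# Sketch: value iteration with a general non-linear Bellman operator,
# (T v)(s) = max_a [ r(s,a) + gamma * h(v(s')) ], where h is a
# (possibly non-linear) mapping; h(x) = x is the usual Bellman operator.
# The two-state MDP below is invented for illustration.

import math

GAMMA = 0.9

# transitions[s][a] = (reward, next_state)
transitions = {
    0: {"stay": (1.0, 0), "move": (0.0, 1)},
    1: {"stay": (2.0, 1), "move": (0.0, 0)},
}

def bellman_update(v, h):
    """One sweep of the (non-linear) Bellman operator over all states."""
    return [
        max(r + GAMMA * h(v[s2]) for r, s2 in transitions[s].values())
        for s in transitions
    ]

def fixed_point(h, iters=500):
    v = [0.0, 0.0]
    for _ in range(iters):
        v = bellman_update(v, h)
    return v

v_linear = fixed_point(lambda x: x)                 # standard Bellman
v_nonlin = fixed_point(lambda x: math.log(1 + x))   # a non-linear variant

print(v_linear)
print(v_nonlin)
```

Both variants converge to a fixed point under repeated application, which is the kind of property the paper establishes for a broad class of such operators.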

Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense

Title Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense
Authors Yixin Chen, Siyuan Huang, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu
Abstract We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic scene parsing and reconstruction—3D estimations of object bounding boxes, camera pose, and room layout, and (ii) 3D human pose estimation. The intuition behind this is to leverage the coupled nature of these two tasks to improve the granularity and performance of scene understanding. We propose to exploit two critical and essential connections between these two tasks: (i) human-object interaction (HOI) to model the fine-grained relations between agents and objects in the scene, and (ii) physical commonsense to model the physical plausibility of the reconstructed scene. The optimal configuration of the 3D scene, represented by a parse graph, is inferred using Markov chain Monte Carlo (MCMC), which efficiently traverses the non-differentiable joint solution space. Experimental results demonstrate that the proposed algorithm significantly improves the performance of the two tasks on three datasets, showing an improved generalization ability.
Tasks 3D Human Pose Estimation, Human-Object Interaction Detection, Pose Estimation, Scene Parsing, Scene Understanding
Published 2019-09-04
URL https://arxiv.org/abs/1909.01507v1
PDF https://arxiv.org/pdf/1909.01507v1.pdf
PWC https://paperswithcode.com/paper/holistic-scene-understanding-single-view-3d
Repo
Framework

ParNet: Position-aware Aggregated Relation Network for Image-Text matching

Title ParNet: Position-aware Aggregated Relation Network for Image-Text matching
Authors Yaxian Xia, Lun Huang, Wenmin Wang, Xiaoyong Wei, Wenmin Wang
Abstract Exploring the fine-grained relationships between entities (e.g., objects in an image or words in a sentence) contributes greatly to understanding multimedia content precisely. Previous attention mechanisms employed in image-text matching either take multiple self-attention steps to gather correspondences or use image objects (or words) as context to infer image-text similarity. However, they only take advantage of semantic information without considering that objects’ relative positions also contribute to image understanding. To this end, we introduce a novel position-aware relation module to model the semantic and spatial relationships simultaneously for image-text matching. Given an image, our method utilizes the locations of different objects to capture spatial relationships. By combining semantic and spatial relationships, it is easier to understand the content of the different modalities (images and sentences) and to capture fine-grained latent correspondences of image-text pairs. In addition, we employ a two-step aggregated relation module to capture interpretable alignments of image-text pairs. In the first step, an intra-modal relation mechanism computes responses between different objects in an image or different words in a sentence separately; in the second step, an inter-modal relation mechanism uses the query as textual context to refine the relationships among object proposals in an image. In this way, our position-aware aggregated relation network (ParNet) not only identifies which entities are relevant by attending to different objects (words) adaptively, but also adjusts the inter-modal correspondence according to the query’s content. Our approach achieves state-of-the-art results on the MS-COCO dataset.
Tasks Text Matching
Published 2019-06-17
URL https://arxiv.org/abs/1906.06892v1
PDF https://arxiv.org/pdf/1906.06892v1.pdf
PWC https://paperswithcode.com/paper/parnet-position-aware-aggregated-relation
Repo
Framework
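
As a rough illustration of the position-aware idea (the scoring function below is a simplification invented for this sketch, not the paper’s module), a relation score can combine semantic affinity between object features with a spatial term derived from the objects’ relative positions:

```python
# Toy position-aware relation score: semantic affinity (dot product of
# feature vectors) plus a spatial term that decays with the distance
# between box centers. Feature vectors and boxes are made-up numbers.

import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(box):                      # box = (x1, y1, x2, y2)
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def relation(feat_i, feat_j, box_i, box_j, alpha=0.5):
    """Semantic affinity plus a spatial term from relative positions."""
    (xi, yi), (xj, yj) = center(box_i), center(box_j)
    spatial = math.exp(-alpha * math.hypot(xi - xj, yi - yj))
    return dot(feat_i, feat_j) + spatial

fa = [1.0, 0.0]                       # toy object features
near, far = (0, 0, 2, 2), (10, 10, 12, 12)
score_same = relation(fa, fa, near, near)
score_far = relation(fa, fa, near, far)
print(score_same, score_far)          # the nearby pair scores higher
```

With identical features, the spatial term is what separates the two pairs, which is the extra signal a purely semantic attention mechanism would miss.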

Variations of Genetic Algorithms

Title Variations of Genetic Algorithms
Authors Alison Jenkins, Vinika Gupta, Alexis Myrick, Mary Lenoir
Abstract The goal of this project is to develop Genetic Algorithms (GAs) for solving the Schaffer F6 function in fewer than 4000 function evaluations over a total of 30 runs. Four types of GA are presented: the Generational GA (GGA), the Steady-State (mu+1)-GA (SSGA), the Steady-Generational (mu,mu)-GA (SGGA), and the (mu+mu)-GA.
Tasks
Published 2019-11-01
URL https://arxiv.org/abs/1911.00490v1
PDF https://arxiv.org/pdf/1911.00490v1.pdf
PWC https://paperswithcode.com/paper/variations-of-genetic-algorithms
Repo
Framework
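
A minimal generational GA (GGA) on the Schaffer F6 function might look as follows; the population size, operators, and rates are illustrative choices rather than the paper’s exact settings, arranged so the fitness-evaluation budget is 50 × 80 = 4000:

```python
# Minimal generational GA sketch for minimizing the Schaffer F6 function.
# Operators and parameters are illustrative assumptions.

import math
import random

random.seed(0)

def schaffer_f6(x, y):
    s = x * x + y * y
    return 0.5 + (math.sin(math.sqrt(s)) ** 2 - 0.5) / (1 + 0.001 * s) ** 2

POP, GENS, BOUND = 50, 80, 100.0      # 50 * 80 = 4000 fitness evaluations

pop = [(random.uniform(-BOUND, BOUND), random.uniform(-BOUND, BOUND))
       for _ in range(POP)]

best = float("inf")
for _ in range(GENS):
    scored = sorted((schaffer_f6(x, y), (x, y)) for x, y in pop)
    best = min(best, scored[0][0])
    parents = [ind for _, ind in scored[:POP // 2]]   # truncation selection
    children = []
    while len(children) < POP:
        (x1, y1), (x2, y2) = random.sample(parents, 2)
        a = random.random()                            # blend crossover
        cx = a * x1 + (1 - a) * x2 + random.gauss(0, 1.0)  # Gaussian mutation
        cy = a * y1 + (1 - a) * y2 + random.gauss(0, 1.0)
        children.append((cx, cy))
    pop = children

print(f"best fitness found: {best:.4f}")   # global optimum is 0 at the origin
```

The steady-state and steady-generational variants named in the abstract differ mainly in how many individuals are replaced per step (one, or a partial batch) rather than the whole population.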

Conscientious Classification: A Data Scientist’s Guide to Discrimination-Aware Classification

Title Conscientious Classification: A Data Scientist’s Guide to Discrimination-Aware Classification
Authors Brian d’Alessandro, Cathy O’Neil, Tom LaGatta
Abstract Recent research has helped to cultivate growing awareness that machine learning systems fueled by big data can create or exacerbate troubling disparities in society. Much of this research comes from outside of the practicing data science community, leaving its members with little concrete guidance to proactively address these concerns. This article introduces issues of discrimination to the data science community on its own terms. In it, we tour the familiar data mining process while providing a taxonomy of common practices that have the potential to produce unintended discrimination. We also survey how discrimination is commonly measured, and suggest how familiar development processes can be augmented to mitigate systems’ discriminatory potential. We advocate that data scientists should be intentional about modeling and reducing discriminatory outcomes. Without doing so, their efforts will result in perpetuating any systemic discrimination that may exist, but under a misleading veil of data-driven objectivity.
Tasks
Published 2019-07-21
URL https://arxiv.org/abs/1907.09013v1
PDF https://arxiv.org/pdf/1907.09013v1.pdf
PWC https://paperswithcode.com/paper/conscientious-classification-a-data
Repo
Framework
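
As one concrete example of the measurement side, the sketch below computes two widely used discrimination measures, the demographic parity difference and the disparate impact ratio, over made-up classifier decisions for a binary protected attribute (the data and the 0.8 “four-fifths rule” threshold are illustrative, not from the article):

```python
# Illustrative discrimination measures over a classifier's decisions.
# decisions: (protected_group, predicted_positive) pairs, invented data.

decisions = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1), ("A", 0),
    ("B", 1), ("B", 0), ("B", 0), ("B", 0), ("B", 0),
]

def positive_rate(group):
    outcomes = [y for g, y in decisions if g == group]
    return sum(outcomes) / len(outcomes)

p_a, p_b = positive_rate("A"), positive_rate("B")
parity_diff = p_a - p_b          # 0.0 would mean demographic parity
impact_ratio = p_b / p_a         # the "80% rule" flags ratios below 0.8

print(f"P(pos|A)={p_a:.2f}  P(pos|B)={p_b:.2f}")
print(f"demographic parity difference: {parity_diff:.2f}")
print(f"disparate impact ratio: {impact_ratio:.2f}")
```

Instrumenting a pipeline with checks like this is one way a data scientist can be intentional about discriminatory outcomes rather than relying on the veil of data-driven objectivity.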

ImgSensingNet: UAV Vision Guided Aerial-Ground Air Quality Sensing System

Title ImgSensingNet: UAV Vision Guided Aerial-Ground Air Quality Sensing System
Authors Yuzhe Yang, Zhiwen Hu, Kaigui Bian, Lingyang Song
Abstract Given the increasingly serious air pollution problem, the monitoring of the air quality index (AQI) in urban areas has drawn considerable attention. This paper presents ImgSensingNet, a vision guided aerial-ground sensing system, for fine-grained air quality monitoring and forecasting using the fusion of haze images taken by an unmanned aerial vehicle (UAV) and the AQI data collected by an on-ground three-dimensional (3D) wireless sensor network (WSN). Specifically, ImgSensingNet first leverages computer vision techniques to infer the AQI scale in different regions from the taken haze images, where haze-relevant features and a deep convolutional neural network (CNN) are designed for direct learning between haze images and the corresponding AQI scale. Based on the learnt AQI scale, ImgSensingNet determines whether to wake up on-ground wireless sensors for small-scale AQI monitoring and inference, which can greatly reduce the energy consumption of the system. An entropy-based model is employed for accurate real-time AQI inference at unmeasured locations and for forecasting the future air quality distribution. We have implemented and evaluated ImgSensingNet on two university campuses since Feb. 2018, collecting 17,630 photos and 2.6 million AQI data samples. Experimental results confirm that ImgSensingNet achieves higher inference accuracy while greatly reducing energy consumption, compared to state-of-the-art AQI monitoring approaches.
Tasks
Published 2019-05-27
URL https://arxiv.org/abs/1905.11299v1
PDF https://arxiv.org/pdf/1905.11299v1.pdf
PWC https://paperswithcode.com/paper/imgsensingnet-uav-vision-guided-aerial-ground
Repo
Framework

TagSLAM: Robust SLAM with Fiducial Markers

Title TagSLAM: Robust SLAM with Fiducial Markers
Authors Bernd Pfrommer, Kostas Daniilidis
Abstract TagSLAM provides a convenient, flexible, and robust way of performing Simultaneous Localization and Mapping (SLAM) with AprilTag fiducial markers. By leveraging a few simple abstractions (bodies, tags, cameras), TagSLAM provides a front end to the GTSAM factor graph optimizer that makes it possible to rapidly design a range of experiments that are based on tags: full SLAM, extrinsic camera calibration with non-overlapping views, visual localization for ground truth, loop closure for odometry, pose estimation etc. We discuss in detail how TagSLAM initializes the factor graph in a robust way, and present loop closure as an application example. TagSLAM is a ROS based open source package and can be found at https://berndpfrommer.github.io/tagslam_web.
Tasks Calibration, Pose Estimation, Simultaneous Localization and Mapping, Visual Localization
Published 2019-10-01
URL https://arxiv.org/abs/1910.00679v1
PDF https://arxiv.org/pdf/1910.00679v1.pdf
PWC https://paperswithcode.com/paper/tagslam-robust-slam-with-fiducial-markers
Repo
Framework
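
The core geometric step behind tag-based localization can be illustrated in 2D: if a tag’s world pose is known and the camera observes the tag’s pose relative to itself, the camera’s world pose follows by pose composition. The poses below are made-up numbers, and TagSLAM itself solves this jointly over many tags with the GTSAM factor graph optimizer rather than in closed form:

```python
# Toy 2D pose-from-tag: T_world_cam = T_world_tag * inverse(T_cam_tag).
# Poses are (x, y, theta); all numbers are invented for the example.

import math

def compose(a, b):
    """Compose 2D rigid poses: returns a * b."""
    ax, ay, at = a
    bx, by, bt = b
    return (ax + bx * math.cos(at) - by * math.sin(at),
            ay + bx * math.sin(at) + by * math.cos(at),
            at + bt)

def invert(p):
    """Inverse of a 2D rigid pose: (R, t) -> (R^T, -R^T t)."""
    x, y, t = p
    return (-x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) - y * math.cos(t),
            -t)

tag_in_world = (5.0, 2.0, math.pi / 2)   # known tag pose (from the map)
tag_in_cam = (1.0, 0.0, 0.0)             # tag seen 1 m straight ahead
cam_in_world = compose(tag_in_world, invert(tag_in_cam))
print(cam_in_world)                      # camera sits 1 m "behind" the tag
```

A single tag already pins down the full camera pose, which is why fiducials make such a robust front end for SLAM and for ground-truth visual localization.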

Filterbank design for end-to-end speech separation

Title Filterbank design for end-to-end speech separation
Authors Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent
Abstract Single-channel speech separation has recently made great progress thanks to learned filterbanks as used in ConvTasNet. In parallel, parameterized filterbanks have been proposed for speaker recognition where only center frequencies and bandwidths are learned. In this work, we extend real-valued learned and parameterized filterbanks into complex-valued analytic filterbanks and define a set of corresponding representations and masking strategies. We evaluate these filterbanks on a newly released noisy speech separation dataset (WHAM). The results show that the proposed analytic learned filterbank consistently outperforms the real-valued filterbank of ConvTasNet. Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions. Finally, we show that the STFT achieves its best performance for 2ms windows.
Tasks Speaker Recognition, Speech Separation
Published 2019-10-23
URL https://arxiv.org/abs/1910.10400v2
PDF https://arxiv.org/pdf/1910.10400v2.pdf
PWC https://paperswithcode.com/paper/filterbank-design-for-end-to-end-speech
Repo
Framework
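
The analytic-filterbank idea rests on the analytic signal: zeroing the negative frequencies of a real signal (or filter) yields a complex one whose magnitude is a smooth envelope. The naive-DFT sketch below shows that construction; it is illustrative only and is not the paper’s learned filterbank:

```python
# Analytic-signal construction via the DFT: double positive frequencies,
# zero negative frequencies. Uses a naive O(n^2) DFT for self-containment.

import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def analytic(x):
    """Analytic signal of a real sequence of even length n."""
    n = len(x)
    X = dft(x)
    # DC and Nyquist kept as-is, positive freqs doubled, negative zeroed.
    H = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    return idft([Xk * h for Xk, h in zip(X, H)])

# A pure cosine: its analytic version is a complex exponential, so the
# envelope |analytic(x)| is flat at 1.
n = 64
x = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
env = [abs(v) for v in analytic(x)]
print(min(env), max(env))
```

Applying such complex-valued filters and masking their magnitudes, rather than real-valued ones, is what the paper’s analytic filterbanks exploit.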

Extracting human emotions at different places based on facial expressions and spatial clustering analysis

Title Extracting human emotions at different places based on facial expressions and spatial clustering analysis
Authors Yuhao Kang, Qingyuan Jia, Song Gao, Xiaohuan Zeng, Yueyao Wang, Stephan Angsuesser, Yu Liu, Xinyue Ye, Teng Fei
Abstract The emergence of big data enables us to evaluate the various human emotions at places from a statistical perspective by applying affective computing. In this study, a novel framework for extracting human emotions from large-scale georeferenced photos at different places is proposed. After the construction of places based on spatial clustering of user-generated footprints collected from social media websites, online cognitive services are utilized to extract human emotions from facial expressions using state-of-the-art computer vision techniques. Two happiness metrics are then defined for measuring the human emotions at different places. To validate the feasibility of the framework, we take 80 tourist attractions around the world as an example, and a happiness ranking list of places is generated based on human emotions calculated from over 2 million faces detected in over 6 million photos. Different kinds of geographical contexts are taken into consideration to uncover the relationship between human emotions and environmental factors. Results show that much of the emotional variation at different places can be explained by a few factors such as openness. The research may offer insights on integrating human emotions to enrich the understanding of sense of place in geography and in place-based GIS.
Tasks
Published 2019-05-06
URL https://arxiv.org/abs/1905.01817v1
PDF https://arxiv.org/pdf/1905.01817v1.pdf
PWC https://paperswithcode.com/paper/extracting-human-emotions-at-different-places
Repo
Framework
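
As a hypothetical sketch of what place-level happiness metrics could look like (the exact metric definitions here are assumptions, not the paper’s), given per-face happiness scores detected at one place:

```python
# Two simple place-level happiness metrics over per-face scores in [0, 1]:
# (i) the share of faces classified as happy, (ii) the mean happiness score.
# The scores below are invented example data.

faces = [0.9, 0.2, 0.7, 0.95, 0.1, 0.6]   # one happiness score per face

share_happy = sum(s > 0.5 for s in faces) / len(faces)
avg_happiness = sum(faces) / len(faces)

print(f"share of happy faces: {share_happy:.3f}")
print(f"average happiness:    {avg_happiness:.3f}")
```

Computed per spatial cluster, metrics like these are what make a cross-place happiness ranking possible.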

Worst-Case Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies

Title Worst-Case Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies
Authors Alexander Bauer, Shinichi Nakajima
Abstract Considering the worst-case scenario, the junction tree algorithm remains the most efficient and general solution for exact MAP inference on discrete graphical models. Unfortunately, its main tractability assumption requires the treewidth of the corresponding MRF to be bounded, strongly limiting the range of admissible applications. In fact, many practical problems in the area of structured prediction require modelling global dependencies by either directly introducing global factors or enforcing global constraints on the prediction variables. This, however, always results in a fully connected graph, making exact inference by means of this algorithm intractable. Nevertheless, depending on the structure of the global factors, we can further relax the conditions for efficient inference. In this paper we reformulate the work in [1] and present a better way to establish the theory, also extending the set of handleable problem instances for free, since it requires only a simple modification of the originally presented algorithm. To demonstrate that this extension is not of purely theoretical interest, we identify a further use case in the context of generalisation bounds for structured learning which cannot be handled by the previous formulation. Finally, we adjust the theoretical guarantees accordingly, showing that the modified algorithm always finds an optimal solution in polynomial time.
Tasks Structured Prediction
Published 2019-12-27
URL https://arxiv.org/abs/1912.12090v1
PDF https://arxiv.org/pdf/1912.12090v1.pdf
PWC https://paperswithcode.com/paper/worst-case-polynomial-time-exact-map
Repo
Framework

Adversarial Fault Tolerant Training for Deep Neural Networks

Title Adversarial Fault Tolerant Training for Deep Neural Networks
Authors Vasisht Duddu, D. Vijay Rao, Valentina E. Balas
Abstract Deep learning accelerators are prone to faults which manifest in the form of errors in neural networks. Fault tolerance in neural networks is crucial in real-time safety-critical applications requiring computation for long durations. Neural networks with high regularisation exhibit superior fault tolerance, however, at the cost of classification accuracy. In view of the difference in functionality, a neural network is modelled as two separate networks, i.e., a Feature Extractor with an unsupervised learning objective and a Classifier with a supervised learning objective. The traditional approach of training the entire network with a single supervised learning objective is insufficient to achieve the objectives of the individual components optimally. In this work, a novel multi-criteria objective function, combining unsupervised training of the Feature Extractor followed by supervised tuning with the Classifier network, is proposed. The unsupervised training solves two games simultaneously in the presence of adversary neural networks whose objectives conflict with those of the Feature Extractor. The first game minimises the loss in reconstructing the input image from the Extractor’s features, in the presence of a generative decoder. The second game solves a minimax constraint optimisation for distributional smoothening of the feature space to match a prior distribution, in the presence of a Discriminator network. The resultant strongly regularised Feature Extractor is combined with the Classifier network for supervised fine-tuning. The proposed adversarial fault tolerant neural network training is scalable to large networks and is independent of the architecture. The evaluation on the benchmark datasets FashionMNIST and CIFAR10 indicates that the resultant networks have high accuracy with superior tolerance to stuck-at-“0” faults compared to widely used regularisers.
Tasks
Published 2019-07-06
URL https://arxiv.org/abs/1907.03103v2
PDF https://arxiv.org/pdf/1907.03103v2.pdf
PWC https://paperswithcode.com/paper/adversarial-fault-tolerant-training-for-deep
Repo
Framework

Comparison of Machine Learning Models in Food Authentication Studies

Title Comparison of Machine Learning Models in Food Authentication Studies
Authors Manokamna Singh, Katarina Domijan
Abstract The underlying objective of food authentication studies is to determine whether unknown food samples have been correctly labelled. In this paper we study three near-infrared (NIR) spectroscopic datasets from food samples of different types: meat samples (labelled by species), olive oil samples (labelled by their geographic origin) and honey samples (labelled as pure or adulterated by different adulterants). We apply and compare a large number of classification, dimension reduction and variable selection approaches to these datasets. NIR data pose specific challenges to classification and variable selection: the datasets are high-dimensional, with the number of cases ($n$) much smaller than the number of features ($p$), i.e., $n \ll p$, and the recorded features are highly serially correlated. In this paper we carry out a comparative analysis of the different approaches and find that partial least squares, a classic tool employed for these types of data, outperforms all the other approaches considered.
Tasks Dimensionality Reduction
Published 2019-05-17
URL https://arxiv.org/abs/1905.07302v1
PDF https://arxiv.org/pdf/1905.07302v1.pdf
PWC https://paperswithcode.com/paper/comparison-of-machine-learning-models-in-food
Repo
Framework

On the bias, risk and consistency of sample means in multi-armed bandits

Title On the bias, risk and consistency of sample means in multi-armed bandits
Authors Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo
Abstract The sample mean is among the most well studied estimators in statistics, having many desirable properties such as unbiasedness and consistency. However, when analyzing data collected using a multi-armed bandit (MAB) experiment, the sample mean is biased and much remains to be understood about its properties. For example, when is it consistent, how large is its bias, and can we bound its mean squared error? This paper delivers a thorough and systematic treatment of the bias, risk and consistency of MAB sample means. Specifically, we identify four distinct sources of selection bias (sampling, stopping, choosing and rewinding) and analyze them both separately and together. We further demonstrate that a new notion of \emph{effective sample size} can be used to bound the risk of the sample mean under suitable loss functions. We present several carefully designed examples to provide intuition on the different sources of selection bias we study. Our treatment is nonparametric and algorithm-agnostic, meaning that it is not tied to a specific algorithm or goal. In a nutshell, our proofs combine variational representations of information theoretic divergences with new martingale concentration inequalities.
Tasks Multi-Armed Bandits
Published 2019-02-02
URL https://arxiv.org/abs/1902.00746v2
PDF https://arxiv.org/pdf/1902.00746v2.pdf
PWC https://paperswithcode.com/paper/on-the-bias-risk-and-consistency-of-sample
Repo
Framework
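
The “choosing” source of bias is easy to demonstrate by simulation: if we report the sample mean of whichever arm currently looks best, the reported mean is biased upward even when the arms are identical. The setup below is an illustration, not an experiment from the paper:

```python
# Simulating "choosing" bias: two identical arms (standard normal rewards,
# true mean 0); reporting the sample mean of the better-looking arm is
# biased upward. All parameters are illustrative.

import random
import statistics

random.seed(1)

def one_experiment(pulls=5):
    a = [random.gauss(0, 1) for _ in range(pulls)]
    b = [random.gauss(0, 1) for _ in range(pulls)]
    return max(statistics.mean(a), statistics.mean(b))  # mean of chosen arm

reported = [one_experiment() for _ in range(20000)]
bias = statistics.mean(reported)   # true mean is 0, so this IS the bias
print(f"average reported mean: {bias:.3f} (true mean is 0)")
```

The paper analyzes this effect, together with the sampling, stopping, and rewinding sources of bias, nonparametrically and for arbitrary algorithms.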

Image Segmentation using Multi-Threshold technique by Histogram Sampling

Title Image Segmentation using Multi-Threshold technique by Histogram Sampling
Authors Amit Gurung, Sangyal Lama Tamang
Abstract The segmentation of digital images is one of the essential steps in image processing or a computer vision system. It helps in separating the pixels into different regions according to their intensity level. A large number of segmentation techniques have been proposed, and a few of them use complex computational operations. Among all, the most straightforward procedure that can be easily implemented is thresholding. In this paper, we present a unique heuristic approach for image segmentation that automatically determines multilevel thresholds by sampling the histogram of a digital image. Our approach emphasizes selecting valleys as the optimal threshold values. We demonstrated that our approach outperforms the popular Otsu’s method in terms of CPU computational time. We observed a maximum speed-up of 35.58x and a minimum speed-up of 10.21x on popular image processing benchmarks. To demonstrate the correctness of our approach in determining threshold values, we compute PSNR, SSIM, and FSIM values for comparison with the values obtained by Otsu’s method. This evaluation shows that our approach is comparable to, and in many cases better than, the well-known Otsu’s method.
Tasks Semantic Segmentation
Published 2019-09-11
URL https://arxiv.org/abs/1909.05084v1
PDF https://arxiv.org/pdf/1909.05084v1.pdf
PWC https://paperswithcode.com/paper/image-segmentation-using-multi-threshold
Repo
Framework
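
A valley-based multilevel thresholding scheme in the spirit of the paper can be sketched as follows; the histogram is synthetic and the peak/valley heuristics are assumptions made for this example, not the authors’ exact procedure:

```python
# Valley-based multilevel thresholding sketch: smooth the grayscale
# histogram, find its peaks, and take the lowest bin between adjacent
# peaks as a threshold. Synthetic three-cluster data for illustration.

import random

random.seed(2)

# Synthetic grayscale image: three intensity clusters around 40, 120, 200.
pixels = [int(random.gauss(m, 10)) for m in (40, 120, 200) for _ in range(3000)]
hist = [0] * 256
for p in pixels:
    hist[min(max(p, 0), 255)] += 1

def smooth(h, w=7):
    """Moving-average smoothing of the histogram."""
    return [sum(h[max(0, i - w):i + w + 1]) / len(h[max(0, i - w):i + w + 1])
            for i in range(len(h))]

def peaks(s, w=20, frac=0.3):
    """Local maxima that reach at least frac of the global maximum."""
    m, out = max(s), []
    for i in range(len(s)):
        if s[i] == max(s[max(0, i - w):i + w + 1]) and s[i] >= frac * m:
            if not out or i - out[-1] > w:      # one index per plateau
                out.append(i)
    return out

s = smooth(hist)
pk = peaks(s)
# Thresholds: the lowest bin (the valley) between each pair of adjacent peaks.
thresholds = [min(range(a, b), key=lambda i: s[i]) for a, b in zip(pk, pk[1:])]
print(pk, thresholds)
```

For this three-cluster histogram the two valleys land between the clusters, splitting the intensity range into three segments without any exhaustive threshold search.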

PowerNet: Efficient Representations of Polynomials and Smooth Functions by Deep Neural Networks with Rectified Power Units

Title PowerNet: Efficient Representations of Polynomials and Smooth Functions by Deep Neural Networks with Rectified Power Units
Authors Bo Li, Shanshan Tang, Haijun Yu
Abstract Deep neural networks with rectified linear units (ReLU) have recently become more and more popular. However, the derivatives of the function represented by a ReLU network are not continuous, which limits the use of ReLU networks to situations where smoothness is not required. In this paper, we construct deep neural networks with rectified power units (RePU), which can give better approximations for smooth functions. Optimal algorithms are proposed to explicitly build neural networks with sparsely connected RePUs, which we call PowerNets, to represent polynomials with no approximation error. For general smooth functions, we first project the function onto its polynomial approximation, then use the proposed algorithms to construct the corresponding PowerNet. Thus, the error of the best polynomial approximation provides an upper bound on the best RePU network approximation error. For smooth functions in higher-dimensional Sobolev spaces, we use fast spectral transforms for tensor-product grid and sparse grid discretizations to obtain polynomial approximations. Our constructive algorithms clearly show a close connection between spectral methods and deep neural networks: a PowerNet with $n$ layers can exactly represent polynomials up to degree $s^n$, where $s$ is the power of the RePUs. The proposed PowerNets have potential applications in situations where high accuracy is desired or smoothness is required.
Tasks
Published 2019-09-09
URL https://arxiv.org/abs/1909.05136v1
PDF https://arxiv.org/pdf/1909.05136v1.pdf
PWC https://paperswithcode.com/paper/powernet-efficient-representations-of
Repo
Framework
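
The exactness claim is easy to see in the simplest case: with RePUs of power s = 2, two hidden units represent x² with no approximation error, since σ₂(x) + σ₂(−x) = x²:

```python
# RePU identity: with sigma_s(x) = max(0, x)^s and s = 2, a one-hidden-layer
# net with two units (input weights 1 and -1, output weights 1 and 1)
# represents x^2 exactly.

def repu(x, s=2):
    return max(0.0, x) ** s

def square_net(x):
    return repu(x) + repu(-x)   # exactly x^2 for every real x

for x in (-3.0, -0.5, 0.0, 1.25, 2.0):
    print(x, square_net(x), x * x)
```

Products then follow from xy = ((x+y)² − (x−y)²)/4, so stacking such layers composes squarings and reaches degree s^n with n layers, as stated in the abstract.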