January 28, 2020

2970 words 14 mins read

Paper Group ANR 1010

Adapting Computer Vision Algorithms for Omnidirectional Video. Compositionality for Recursive Neural Networks. Accurate and Robust Eye Contact Detection During Everyday Mobile Device Interactions. Learning Spatial Awareness to Improve Crowd Counting. Boosting insights in insurance tariff plans with tree-based machine learning methods. Estimation an …

Adapting Computer Vision Algorithms for Omnidirectional Video

Title Adapting Computer Vision Algorithms for Omnidirectional Video
Authors Hannes Fassold
Abstract Omnidirectional (360°) video has become quite popular because it provides a highly immersive viewing experience. For computer vision algorithms, it poses several challenges, such as the special (equirectangular) projection commonly employed and the huge image size. In this work, we give a high-level overview of these challenges and outline strategies for adapting computer vision algorithms to the specifics of omnidirectional video.
Tasks
Published 2019-07-22
URL https://arxiv.org/abs/1907.09233v1
PDF https://arxiv.org/pdf/1907.09233v1.pdf
PWC https://paperswithcode.com/paper/adapting-computer-vision-algorithms-for
Repo
Framework
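
The equirectangular projection mentioned in the abstract is the main source of geometric distortion. As a rough sketch (not taken from the paper), the mapping from an equirectangular pixel to a direction on the viewing sphere can be written as follows; the image size, axis conventions, and the function name equirect_to_sphere are my own choices.

```python
import numpy as np

def equirect_to_sphere(u, v, width, height):
    """Map equirectangular pixel coordinates (u, v) to a 3D unit vector.

    Longitude spans [-pi, pi] across the image width and latitude spans
    [-pi/2, pi/2] across the height; axis conventions vary between
    libraries, so treat this as one plausible choice.
    """
    lon = (u / width - 0.5) * 2.0 * np.pi      # horizontal angle
    lat = (0.5 - v / height) * np.pi           # vertical angle
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.stack([x, y, z], axis=-1)

# The centre pixel of a 3840x2160 frame maps to the forward direction.
print(equirect_to_sphere(1920, 1080, 3840, 2160))  # ~[1, 0, 0]
```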

Compositionality for Recursive Neural Networks

Title Compositionality for Recursive Neural Networks
Authors Martha Lewis
Abstract Modelling compositionality has been a longstanding area of research in the field of vector space semantics. The categorical approach to compositionality maps grammar onto vector spaces in a principled way, but comes under fire for requiring the formation of very high-dimensional matrices and tensors, and therefore being computationally infeasible. In this paper I show how a linear simplification of recursive neural tensor network models can be mapped directly onto the categorical approach, giving a way of computing the required matrices and tensors. This mapping suggests a number of lines of research for both categorical compositional vector space models of meaning and for recursive neural network models of compositionality.
Tasks
Published 2019-01-30
URL http://arxiv.org/abs/1901.10723v1
PDF http://arxiv.org/pdf/1901.10723v1.pdf
PWC https://paperswithcode.com/paper/compositionality-for-recursive-neural
Repo
Framework
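
To make the categorical picture concrete, here is a tiny illustration (not the paper's construction): an adjective acts as a matrix on a noun vector, which is the kind of object the paper shows how to compute from a linearized recursive neural tensor network. The dimension and random parameters below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                # toy embedding dimension

# In the categorical approach an adjective is a matrix (a map on noun
# space) and a transitive verb a higher-order tensor; only the matrix
# case is illustrated here, with random placeholder parameters.
noun = rng.normal(size=d)            # vector for e.g. "dog"
adjective = rng.normal(size=(d, d))  # matrix for e.g. "red"

composed = adjective @ noun          # vector for "red dog"
print(composed.shape)                # (4,)
```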

Accurate and Robust Eye Contact Detection During Everyday Mobile Device Interactions

Title Accurate and Robust Eye Contact Detection During Everyday Mobile Device Interactions
Authors Mihai Bâce, Sander Staal, Andreas Bulling
Abstract Quantification of human attention is key to several tasks in mobile human-computer interaction (HCI), such as predicting user interruptibility, estimating the noticeability of user interface content, or measuring user engagement. Previous work studying mobile attentive behaviour required special-purpose eye tracking equipment or constrained users’ mobility. We propose a novel method to sense and analyse visual attention on mobile devices during everyday interactions. We demonstrate the capabilities of our method on the sample task of eye contact detection, which has recently attracted increasing research interest in mobile HCI. Our method builds on a state-of-the-art method for unsupervised eye contact detection and extends it to address challenges specific to mobile interactive scenarios. Through evaluation on two current datasets, we demonstrate significant performance improvements for eye contact detection across mobile devices, users, and environmental conditions. Moreover, we discuss how our method enables the calculation of additional attention metrics that, for the first time, enable researchers from different domains to study and quantify attention allocation during mobile interactions in the wild.
Tasks Eye Tracking
Published 2019-07-25
URL https://arxiv.org/abs/1907.11115v1
PDF https://arxiv.org/pdf/1907.11115v1.pdf
PWC https://paperswithcode.com/paper/accurate-and-robust-eye-contact-detection
Repo
Framework

Learning Spatial Awareness to Improve Crowd Counting

Title Learning Spatial Awareness to Improve Crowd Counting
Authors Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander Hauptmann
Abstract The aim of crowd counting is to estimate the number of people in images by leveraging the annotation of center positions for pedestrians’ heads. Promising progress has been made with the prevalence of deep Convolutional Neural Networks. Existing methods widely employ the Euclidean distance (i.e., $L_2$ loss) to optimize the model, which, however, has two main drawbacks: (1) the loss has difficulty in learning the spatial awareness (i.e., the positions of heads) since it struggles to retain the high-frequency variation in the density map, and (2) the loss is highly sensitive to various noises in crowd counting, such as zero-mean noise, head size changes, and occlusions. Although the Maximum Excess over SubArrays (MESA) loss has been previously proposed to address the above issues by finding the rectangular subregion whose predicted density map has the maximum difference from the ground truth, it cannot be solved by gradient descent and thus can hardly be integrated into the deep learning framework. In this paper, we present a novel architecture called SPatial Awareness Network (SPANet) to incorporate spatial context for crowd counting. The Maximum Excess over Pixels (MEP) loss is proposed to achieve this by finding the pixel-level subregion with high discrepancy to the ground truth. To this end, we devise a weakly supervised learning scheme to generate such regions with a multi-branch architecture. The proposed framework can be integrated into existing deep crowd counting methods and is end-to-end trainable. Extensive experiments on four challenging benchmarks show that our method can significantly improve the performance of baselines. More remarkably, our approach outperforms the state-of-the-art methods on all benchmark datasets.
Tasks Crowd Counting
Published 2019-09-16
URL https://arxiv.org/abs/1909.07057v1
PDF https://arxiv.org/pdf/1909.07057v1.pdf
PWC https://paperswithcode.com/paper/learning-spatial-awareness-to-improve-crowd
Repo
Framework
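
The abstract contrasts the pixel-averaged $L_2$ loss with the proposed Maximum Excess over Pixels (MEP) loss. Below is a very rough stand-in, assuming PyTorch, that merely concentrates the loss on the highest-discrepancy pixels; the paper instead learns the discrepant region with a weakly supervised multi-branch network, so treat this as an illustration of the idea rather than the method.

```python
import torch

def mep_like_loss(pred, target, top_frac=0.1):
    """Crude proxy for the MEP idea: focus the loss on the pixels where
    the predicted density deviates most from the ground truth, instead
    of averaging the error over all pixels. The fraction top_frac is an
    arbitrary choice for this sketch."""
    residual = (pred - target).abs().flatten(1)   # (batch, H*W)
    k = max(1, int(top_frac * residual.shape[1]))
    worst, _ = residual.topk(k, dim=1)            # hardest pixels
    return worst.mean()

pred = torch.rand(2, 1, 64, 64, requires_grad=True)
target = torch.rand(2, 1, 64, 64)
loss = mep_like_loss(pred, target)
loss.backward()
print(loss.item())
```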

Boosting insights in insurance tariff plans with tree-based machine learning methods

Title Boosting insights in insurance tariff plans with tree-based machine learning methods
Authors Roel Henckaerts, Marie-Pier Côté, Katrien Antonio, Roel Verbelen
Abstract Pricing actuaries typically operate within the framework of generalized linear models (GLMs). With the upswing of data analytics, our study puts focus on machine learning methods to develop full tariff plans built from both the frequency and severity of claims. We adapt the loss functions used in the algorithms such that the specific characteristics of insurance data are carefully incorporated: highly unbalanced count data with excess zeros and varying exposure on the frequency side combined with scarce, but potentially long-tailed data on the severity side. A key requirement is the need for transparent and interpretable pricing models which are easily explainable to all stakeholders. We therefore focus on machine learning with decision trees: starting from simple regression trees, we work towards more advanced ensembles such as random forests and boosted trees. We show how to choose the optimal tuning parameters for these models in an elaborate cross-validation scheme, we present visualization tools to obtain insights from the resulting models and the economic value of these new modeling approaches is evaluated. Boosted trees outperform the classical GLMs, allowing the insurer to form profitable portfolios and to guard against potential adverse risk selection.
Tasks
Published 2019-04-12
URL https://arxiv.org/abs/1904.10890v3
PDF https://arxiv.org/pdf/1904.10890v3.pdf
PWC https://paperswithcode.com/paper/190410890
Repo
Framework
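
One concrete instance of the frequency-side issue the abstract describes (unbalanced count data with varying exposure) is a Poisson-loss boosted tree in which each policy is weighted by its exposure. The sketch below uses scikit-learn's HistGradientBoostingRegressor (recent versions support loss="poisson") on synthetic data; it illustrates the general approach, not the authors' implementation or tuning scheme.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 3))                    # placeholder rating factors
exposure = rng.uniform(0.1, 1.0, size=n)       # policy years at risk
lam = np.exp(-2.0 + 0.5 * X[:, 0]) * exposure  # true expected claim counts
counts = rng.poisson(lam)

# Model claim frequency (counts / exposure) with a Poisson deviance loss,
# weighting each policy by its exposure -- one common way to handle the
# varying-exposure problem mentioned in the abstract.
freq_model = HistGradientBoostingRegressor(loss="poisson", max_depth=3)
freq_model.fit(X, counts / exposure, sample_weight=exposure)

print(freq_model.predict(X[:5]) * exposure[:5])  # predicted expected counts
```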

Estimation and HAC-based Inference for Machine Learning Time Series Regressions

Title Estimation and HAC-based Inference for Machine Learning Time Series Regressions
Authors Andrii Babii, Eric Ghysels, Jonas Striaukas
Abstract Time series regression analysis in econometrics typically involves a framework relying on a set of mixing conditions to establish consistency and asymptotic normality of parameter estimates and HAC-type estimators of the residual long-run variances to conduct proper inference. This article introduces structured machine learning regressions for high-dimensional time series data using the aforementioned commonly used setting. To recognize the time series data structures we rely on the sparse-group LASSO estimator. We derive a new Fuk-Nagaev inequality for a class of $\tau$-dependent processes with heavier than Gaussian tails, nesting $\alpha$-mixing processes as a special case, and establish estimation, prediction, and inferential properties, including convergence rates of the HAC estimator for the long-run variance based on LASSO residuals. An empirical application to nowcasting US GDP growth indicates that the estimator performs favorably compared to other alternatives and that the text data can be a useful addition to more traditional numerical data.
Tasks Time Series
Published 2019-12-13
URL https://arxiv.org/abs/1912.06307v1
PDF https://arxiv.org/pdf/1912.06307v1.pdf
PWC https://paperswithcode.com/paper/estimation-and-hac-based-inference-for
Repo
Framework
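
For readers unfamiliar with HAC estimation, here is a standard Bartlett-kernel (Newey-West) long-run variance estimator of the sort applied to regression residuals; it illustrates the quantity being estimated, not the paper's specific estimator or its theory for sparse-group LASSO residuals.

```python
import numpy as np

def newey_west_lrv(resid, lags=4):
    """Bartlett-kernel (Newey-West style) estimate of the long-run
    variance of a residual series."""
    u = resid - resid.mean()
    n = len(u)
    lrv = u @ u / n                        # lag-0 autocovariance
    for k in range(1, lags + 1):
        w = 1.0 - k / (lags + 1.0)         # Bartlett weights
        gamma_k = u[k:] @ u[:-k] / n       # lag-k autocovariance
        lrv += 2.0 * w * gamma_k
    return lrv

rng = np.random.default_rng(0)
e = rng.normal(size=500)
resid = e + 0.5 * np.concatenate([[0.0], e[:-1]])   # MA(1)-type residuals
print(newey_west_lrv(resid, lags=4))
```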

On Higher-order Moments in Adam

Title On Higher-order Moments in Adam
Authors Zhanhong Jiang, Aditya Balu, Sin Yong Tan, Young M Lee, Chinmay Hegde, Soumik Sarkar
Abstract In this paper, we investigate the popular deep learning optimization routine, Adam, from the perspective of statistical moments. While Adam is an adaptive method based on lower-order moments of the stochastic gradient, we propose an extension, named HAdam, which uses higher-order moments of the stochastic gradient. Our analysis and experiments reveal that certain higher-order moments of the stochastic gradient are able to achieve better performance compared to the vanilla Adam algorithm. We also provide some analysis of HAdam related to odd and even moments to explain some intriguing and seemingly non-intuitive empirical results.
Tasks
Published 2019-10-15
URL https://arxiv.org/abs/1910.06878v1
PDF https://arxiv.org/pdf/1910.06878v1.pdf
PWC https://paperswithcode.com/paper/on-higher-order-moments-in-adam
Repo
Framework
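
A sketch of the core idea, assuming the update keeps Adam's structure but tracks the p-th absolute moment of the gradient and normalizes by its p-th root; the exact HAdam update (bias corrections, moment definition) may differ from the paper.

```python
import numpy as np

def hadam_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
               p=4, eps=1e-8):
    """One update of an Adam-like rule using the p-th absolute moment of
    the gradient in place of the second (illustrative only)."""
    state["t"] += 1
    state["m"] = betas[0] * state["m"] + (1 - betas[0]) * grad
    state["v"] = betas[1] * state["v"] + (1 - betas[1]) * np.abs(grad) ** p
    m_hat = state["m"] / (1 - betas[0] ** state["t"])
    v_hat = state["v"] / (1 - betas[1] ** state["t"])
    return param - lr * m_hat / (v_hat ** (1.0 / p) + eps)

w = np.zeros(3)
st = {"t": 0, "m": np.zeros(3), "v": np.zeros(3)}
for _ in range(2000):
    g = 2 * (w - np.array([1.0, -2.0, 0.5]))   # gradient of a toy quadratic
    w = hadam_step(w, g, st)
print(w)   # slowly approaches the minimiser [1, -2, 0.5]
```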

Quantifying Classification Uncertainty using Regularized Evidential Neural Networks

Title Quantifying Classification Uncertainty using Regularized Evidential Neural Networks
Authors Xujiang Zhao, Yuzhe Ou, Lance Kaplan, Feng Chen, Jin-Hee Cho
Abstract Traditional deep neural nets (NNs) have shown state-of-the-art performance in the task of classification in various applications. However, NNs have not considered any types of uncertainty associated with the class probabilities to minimize risk due to misclassification under uncertainty in real life. Unlike Bayesian neural nets, which indirectly infer uncertainty through weight uncertainties, evidential neural networks (ENNs) have recently been proposed to support explicit modeling of the uncertainty of class probabilities. An ENN treats the predictions of an NN as subjective opinions and learns the function that collects the evidence leading to these opinions with a deterministic NN from data. However, an ENN is trained as a black box without explicitly considering different types of inherent data uncertainty, such as vacuity (uncertainty due to a lack of evidence) or dissonance (uncertainty due to conflicting evidence). This paper presents a new approach, called a regularized ENN, that learns an ENN based on regularizations related to different characteristics of inherent data uncertainty. Via experiments with both synthetic and real-world datasets, we demonstrate that the proposed regularized ENN can better learn an ENN that models different types of uncertainty in the class probabilities for classification tasks.
Tasks
Published 2019-10-15
URL https://arxiv.org/abs/1910.06864v1
PDF https://arxiv.org/pdf/1910.06864v1.pdf
PWC https://paperswithcode.com/paper/quantifying-classification-uncertainty-using
Repo
Framework
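
To illustrate the vacuity notion the abstract refers to, here is a small subjective-logic style computation from class evidence, assuming the usual ENN convention alpha = evidence + 1; the paper's dissonance measure and proposed regularizers are not reproduced here.

```python
import numpy as np

def vacuity_and_beliefs(evidence):
    """Subjective-logic quantities from non-negative class evidence:
    alpha = evidence + 1, beliefs b_k = e_k / S, vacuity u = K / S with
    S = sum(alpha). Low total evidence gives high vacuity."""
    evidence = np.asarray(evidence, dtype=float)
    K = evidence.shape[-1]
    alpha = evidence + 1.0
    S = alpha.sum(axis=-1, keepdims=True)
    beliefs = evidence / S
    vacuity = K / S.squeeze(-1)
    return beliefs, vacuity

# Little evidence -> high vacuity; lots of evidence -> low vacuity.
print(vacuity_and_beliefs([0.1, 0.2, 0.1]))
print(vacuity_and_beliefs([50.0, 48.0, 1.0]))
```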

Classification in asymmetric spaces via sample compression

Title Classification in asymmetric spaces via sample compression
Authors Lee-Ad Gottlieb, Shira Ozeri
Abstract We initiate the rigorous study of classification in quasi-metric spaces. These are point sets endowed with a distance function that is non-negative and also satisfies the triangle inequality, but is asymmetric. We develop and refine a learning algorithm for quasi-metrics based on sample compression and nearest neighbor, and prove that it has favorable statistical properties.
Tasks
Published 2019-09-22
URL https://arxiv.org/abs/1909.09969v1
PDF https://arxiv.org/pdf/1909.09969v1.pdf
PWC https://paperswithcode.com/paper/190909969
Repo
Framework
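
A minimal sketch of nearest-neighbour classification under an asymmetric distance: the quasi_distance function below is an invented toy quasi-metric (non-negative and satisfying the triangle inequality, but not symmetric), not one from the paper, and the learner omits the sample-compression step.

```python
import numpy as np

def quasi_distance(x, y):
    """Toy asymmetric distance: increasing a coordinate costs twice as
    much as decreasing it, so d(x, y) != d(y, x) in general."""
    diff = y - x
    return np.sum(np.where(diff > 0, 2.0 * diff, -diff))

def nn_classify(query, points, labels):
    """1-nearest-neighbour under the quasi-metric (query -> point)."""
    dists = [quasi_distance(query, p) for p in points]
    return labels[int(np.argmin(dists))]

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
y = np.array([0, 1, 0])
print(nn_classify(np.array([0.9, 0.8]), X, y))
```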

Neural source-filter waveform models for statistical parametric speech synthesis

Title Neural source-filter waveform models for statistical parametric speech synthesis
Authors Xin Wang, Shinji Takaki, Junichi Yamagishi
Abstract Neural waveform models such as WaveNet have demonstrated better performance than conventional vocoders for statistical parametric speech synthesis. As an autoregressive (AR) model, WaveNet is limited by a slow sequential waveform generation process. Some new models that use the inverse-autoregressive flow (IAF) can generate a whole waveform in a one-shot manner. However, these IAF-based models require sequential transformation during training, which severely slows down the training speed. Other models such as Parallel WaveNet and ClariNet bring together the benefits of AR and IAF-based models and train an IAF model by transferring the knowledge from a pre-trained AR teacher to an IAF student without any sequential transformation. However, both models require additional training criteria, and their implementation is prohibitively complicated. We propose a framework for neural source-filter (NSF) waveform modeling that requires neither AR nor IAF-based approaches. This framework requires only three components for waveform generation: a source module that generates a sine-based signal as excitation, a non-AR dilated-convolution-based filter module that transforms the excitation into a waveform, and a conditional module that pre-processes the acoustic features for the source and filter modules. This framework minimizes spectral-amplitude distances for model training, which can be efficiently implemented using short-time Fourier transform routines. Under this framework, we designed three NSF models and compared them with WaveNet. The NSF models generated waveforms at least 100 times faster than WaveNet, and the quality of the synthetic speech from the best NSF model was as good as or better than that from WaveNet.
Tasks Speech Synthesis
Published 2019-04-27
URL https://arxiv.org/abs/1904.12088v2
PDF https://arxiv.org/pdf/1904.12088v2.pdf
PWC https://paperswithcode.com/paper/neural-source-filter-waveform-models-for
Repo
Framework
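
The spectral-amplitude training criterion can be illustrated with a single-resolution log-STFT distance, assuming PyTorch; the actual NSF models combine several frame configurations and further components, so this only sketches the kind of loss the abstract describes.

```python
import math
import torch

def spectral_amplitude_distance(pred, ref, n_fft=512, hop=128):
    """Log spectral-amplitude distance between two waveforms for one
    STFT configuration (the NSF framework uses several resolutions)."""
    window = torch.hann_window(n_fft)
    def log_mag(x):
        spec = torch.stft(x, n_fft, hop_length=hop, window=window,
                          return_complex=True)
        return torch.log(spec.abs() ** 2 + 1e-7)
    return torch.mean((log_mag(pred) - log_mag(ref)) ** 2)

t = torch.arange(16000) / 16000.0
ref = torch.sin(2 * math.pi * 220.0 * t)    # a 220 Hz sine "reference"
pred = torch.sin(2 * math.pi * 225.0 * t)   # slightly detuned estimate
print(spectral_amplitude_distance(pred, ref))
```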

A Simple yet Effective Baseline for Robust Deep Learning with Noisy Labels

Title A Simple yet Effective Baseline for Robust Deep Learning with Noisy Labels
Authors Yucen Luo, Jun Zhu, Tomas Pfister
Abstract Recently deep neural networks have shown their capacity to memorize training data, even with noisy labels, which hurts generalization performance. To mitigate this issue, we provide a simple but effective baseline method that is robust to noisy labels, even with severe noise. Our objective involves a variance regularization term that implicitly penalizes the Jacobian norm of the neural network on the whole training set (including the noisy-labeled data), which encourages generalization and prevents overfitting to the corrupted labels. Experiments on both synthetically generated incorrect labels and realistic large-scale noisy datasets demonstrate that our approach achieves state-of-the-art performance with a high tolerance to severe noise.
Tasks
Published 2019-09-20
URL https://arxiv.org/abs/1909.09338v2
PDF https://arxiv.org/pdf/1909.09338v2.pdf
PWC https://paperswithcode.com/paper/a-simple-yet-effective-baseline-for-robust
Repo
Framework
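
A sketch of the kind of smoothness penalty the abstract describes: a finite-difference proxy that compares model outputs at an input and at a slightly perturbed copy, which implicitly penalizes the Jacobian norm. The constants and the perturbation scheme are my own choices, not the paper's exact variance regularizer.

```python
import torch
import torch.nn.functional as F

def perturbation_jacobian_penalty(model, x, sigma=0.01):
    """Stochastic finite-difference proxy for a Jacobian-norm penalty:
    penalize how much the outputs change under a small input perturbation."""
    noise = sigma * torch.randn_like(x)
    return F.mse_loss(model(x + noise), model(x)) / (sigma ** 2)

model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 3))
x = torch.randn(8, 10)
y = torch.randint(0, 3, (8,))
loss = F.cross_entropy(model(x), y) + 0.1 * perturbation_jacobian_penalty(model, x)
loss.backward()
print(loss.item())
```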

Scale MLPerf-0.6 models on Google TPU-v3 Pods

Title Scale MLPerf-0.6 models on Google TPU-v3 Pods
Authors Sameer Kumar, Victor Bitorff, Dehao Chen, Chiachen Chou, Blake Hechtman, HyoukJoong Lee, Naveen Kumar, Peter Mattson, Shibo Wang, Tao Wang, Yuanzhong Xu, Zongwei Zhou
Abstract The recent submission of Google TPU-v3 Pods to the industry wide MLPerf v0.6 training benchmark demonstrates the scalability of a suite of industry relevant ML models. MLPerf defines a suite of models, datasets and rules to follow when benchmarking to ensure results are comparable across hardware, frameworks and companies. Using this suite of models, we discuss the optimizations and techniques including choice of optimizer, spatial partitioning and weight update sharding necessary to scale to 1024 TPU chips. Furthermore, we identify properties of models that make scaling them challenging, such as limited data parallelism and unscaled weights. These optimizations contribute to record performance in transformer, Resnet-50 and SSD in the Google MLPerf-0.6 submission.
Tasks
Published 2019-09-21
URL https://arxiv.org/abs/1909.09756v3
PDF https://arxiv.org/pdf/1909.09756v3.pdf
PWC https://paperswithcode.com/paper/190909756
Repo
Framework

An image-driven machine learning approach to kinetic modeling of a discontinuous precipitation reaction

Title An image-driven machine learning approach to kinetic modeling of a discontinuous precipitation reaction
Authors Elizabeth Kautz, Wufei Ma, Saumyadeep Jana, Arun Devaraj, Vineet Joshi, Bülent Yener, Daniel Lewis
Abstract Micrograph quantification is an essential component of several materials science studies. Machine learning methods, in particular convolutional neural networks, have previously demonstrated performance in image recognition tasks across several disciplines (e.g. materials science, medical imaging, facial recognition). Here, we apply these well-established methods to develop an approach to microstructure quantification for kinetic modeling of a discontinuous precipitation reaction in a case study on the uranium-molybdenum system. Prediction of material processing history based on image data (classification), calculation of area fraction of phases present in the micrographs (segmentation), and kinetic modeling from segmentation results were performed. Results indicate that convolutional neural networks represent microstructure image data well, and segmentation using the k-means clustering algorithm yields results that agree well with manually annotated images. Classification accuracies of original and segmented images are both 94% for a 5-class classification problem. Kinetic modeling results agree well with previously reported data using manual thresholding. The image quantification and kinetic modeling approach developed and presented here aims to reduce researcher bias introduced into the characterization process, and allows for leveraging information in limited image data sets.
Tasks
Published 2019-06-13
URL https://arxiv.org/abs/1906.05496v1
PDF https://arxiv.org/pdf/1906.05496v1.pdf
PWC https://paperswithcode.com/paper/an-image-driven-machine-learning-approach-to
Repo
Framework
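
The segmentation step described in the abstract (k-means clustering of pixel intensities followed by an area-fraction computation) can be sketched in a few lines; the synthetic two-phase image below stands in for a real micrograph.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for a grayscale micrograph: two phases plus noise.
micrograph = np.where(rng.random((128, 128)) > 0.3, 0.8, 0.2)
micrograph += 0.05 * rng.normal(size=micrograph.shape)

# Cluster pixel intensities into k=2 phases and report the area fraction
# of one cluster, mirroring the segmentation step in the abstract.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    micrograph.reshape(-1, 1))
area_fraction = labels.mean()      # fraction of pixels assigned to cluster 1
print(round(float(area_fraction), 3))
```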

Tracing the Propagation Path: A Flow Perspective of Representation Learning on Graphs

Title Tracing the Propagation Path: A Flow Perspective of Representation Learning on Graphs
Authors Menghan Wang, Kun Zhang, Gulin Li, Keping Yang, Luo Si
Abstract Graph Convolutional Networks (GCNs) have achieved significant progress in representation learning on graphs. However, current GCNs suffer from two common challenges: 1) GCNs are only effective with shallow structures; stacking multiple GCN layers leads to over-smoothing. 2) GCNs do not scale well to large, dense graphs due to the recursive neighborhood expansion. We generalize the propagation strategies of current GCNs as a “Sink→Source” mode, which seems to be an underlying cause of the two challenges. To address these issues intrinsically, in this paper, we study the information propagation mechanism in a “Source→Sink” mode. We introduce a new concept, the “information flow path”, that explicitly defines where information originates and how it diffuses. Then a novel framework, namely Flow Graph Network (FlowGN), is proposed to learn node representations. FlowGN is computationally efficient and flexible in propagation strategies. Moreover, FlowGN decouples the layer structure from the information propagation process, removing the interior constraint of applying deep structures in traditional GCNs. Further experiments on public datasets demonstrate the superiority of FlowGN against state-of-the-art GCNs.
Tasks Representation Learning
Published 2019-12-12
URL https://arxiv.org/abs/1912.05977v1
PDF https://arxiv.org/pdf/1912.05977v1.pdf
PWC https://paperswithcode.com/paper/tracing-the-propagation-path-a-flow
Repo
Framework

LapTool-Net: A Contextual Detector of Surgical Tools in Laparoscopic Videos Based on Recurrent Convolutional Neural Networks

Title LapTool-Net: A Contextual Detector of Surgical Tools in Laparoscopic Videos Based on Recurrent Convolutional Neural Networks
Authors Babak Namazi, Ganesh Sankaranarayanan, Venkat Devarajan
Abstract We propose a new multilabel classifier, called LapTool-Net, to detect the presence of surgical tools in each frame of a laparoscopic video. The novelty of LapTool-Net is the exploitation of the correlation among the usage of different tools and between tools and tasks - namely, the context of the tools’ usage. Towards this goal, the pattern in the co-occurrence of the tools is utilized for designing a decision policy for a multilabel classifier based on a Recurrent Convolutional Neural Network (RCNN) architecture that simultaneously extracts spatio-temporal features. In contrast to previous multilabel classification methods, the RCNN and the decision model are trained in an end-to-end manner using a multitask learning scheme. To overcome the high imbalance and avoid overfitting caused by the lack of variety in the training data, a high down-sampling rate is chosen based on the more frequent combinations. Furthermore, at the post-processing step, the predictions for all the frames of a video are corrected by designing a bi-directional RNN to model the long-term order of tasks. LapTool-Net was trained using a publicly available dataset of laparoscopic cholecystectomy. The results show that LapTool-Net outperforms existing methods significantly, even while using fewer training samples and a shallower architecture.
Tasks
Published 2019-05-22
URL https://arxiv.org/abs/1905.08983v1
PDF https://arxiv.org/pdf/1905.08983v1.pdf
PWC https://paperswithcode.com/paper/laptool-net-a-contextual-detector-of-surgical
Repo
Framework
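
A minimal recurrent-convolutional multilabel classifier in the spirit described by the abstract, assuming PyTorch: a small CNN encodes each frame, a GRU adds temporal context, and a sigmoid head outputs per-tool presence. The backbone, the number of tools, and the training scheme here are placeholders, not the paper's.

```python
import torch
import torch.nn as nn

class ToyToolRCNN(nn.Module):
    """Toy CNN + GRU multilabel classifier for per-frame tool presence."""
    def __init__(self, num_tools=7):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.gru = nn.GRU(32, 64, batch_first=True)
        self.head = nn.Linear(64, num_tools)

    def forward(self, frames):                    # (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.gru(feats)
        return torch.sigmoid(self.head(out))      # per-frame tool probabilities

clip = torch.randn(2, 8, 3, 64, 64)               # two clips of 8 frames
print(ToyToolRCNN()(clip).shape)                  # torch.Size([2, 8, 7])
```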