April 2, 2020

3219 words 16 mins read

Paper Group ANR 338

A Functional EM Algorithm for Panel Count Data with Missing Counts. The Spectral Underpinning of word2vec. Human Action Recognition using Local Two-Stream Convolution Neural Network Features and Support Vector Machines. Why Do Line Drawings Work? A Realism Hypothesis. DymSLAM: 4D Dynamic Scene Reconstruction Based on Geometrical Motion Segmentation. …

A Functional EM Algorithm for Panel Count Data with Missing Counts

Title A Functional EM Algorithm for Panel Count Data with Missing Counts
Authors Alexander Moreno, Zhenke Wu, Jamie Yap, David Wetter, Cho Lam, Inbal Nahum-Shani, Walter Dempsey, James M. Rehg
Abstract Panel count data is recurrent events data where counts of events are observed at discrete time points. Panel counts naturally describe self-reported behavioral data, and the occurrence of missing or unreliable reports is common. Unfortunately, no prior work has tackled the problem of missingness in this setting. We address this gap in the literature by developing a novel functional EM algorithm that can be used as a wrapper around several popular panel count mean function inference methods when some counts are missing. We provide a novel theoretical analysis of our method showing strong consistency. Extending the methods in (Balakrishnan et al., 2017; Wu et al., 2016), we show that the functional EM algorithm recovers the true mean function of the counting process. We accomplish this by developing alternative regularity conditions for our objective function in order to show convergence of the population EM algorithm. We prove strong consistency of the M-step, thus giving strong consistency guarantees for the finite sample EM algorithm. We present experimental results for synthetic data, synthetic missingness on real data, and a smoking cessation study, where we find that participants may underestimate cigarettes smoked by approximately 18.6% over a 12-day period.
Tasks
Published 2020-03-02
URL https://arxiv.org/abs/2003.01169v2
PDF https://arxiv.org/pdf/2003.01169v2.pdf
PWC https://paperswithcode.com/paper/a-functional-em-algorithm-for-panel-count
Repo
Framework
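
A rough picture of the wrapper idea, under a toy nonhomogeneous Poisson model: the E-step imputes each missing count with its expected value under the current mean-function estimate, and the M-step re-fits the mean function. This is a hypothetical minimal sketch, not the authors' method, which wraps existing mean-function estimators and carries consistency guarantees.

```python
# Toy EM for panel counts with missing entries (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
T, n = 10, 200                               # panel times, subjects
true_inc = np.linspace(0.5, 2.0, T)          # true mean-function increments
counts = rng.poisson(true_inc, size=(n, T)).astype(float)
counts[rng.random((n, T)) < 0.3] = np.nan    # 30% of reports missing

inc = np.nanmean(counts, axis=0)             # initialize from observed data
for _ in range(50):
    # E-step: the expected value of a missing Poisson count is its rate.
    filled = np.where(np.isnan(counts), inc, counts)
    # M-step: re-fit the increments (a simple MLE here; the paper
    # instead plugs in existing mean-function inference methods).
    inc = filled.mean(axis=0)

print("estimated cumulative mean function:", np.cumsum(inc).round(2))
```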

The Spectral Underpinning of word2vec

Title The Spectral Underpinning of word2vec
Authors Ariel Jaffe, Yuval Kluger, Ofir Lindenbaum, Jonathan Patsenker, Erez Peterfreund, Stefan Steinerberger
Abstract word2vec, introduced by Mikolov et al. (2013), is a word embedding method that is widely used in natural language processing. Despite its great success and frequent use, a theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings by numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.
Tasks
Published 2020-02-27
URL https://arxiv.org/abs/2002.12317v1
PDF https://arxiv.org/pdf/2002.12317v1.pdf
PWC https://paperswithcode.com/paper/the-spectral-underpinning-of-word2vec
Repo
Framework
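
One way to make the spectral connection concrete is the classical observation (Levy and Goldberg, 2014) that skip-gram with negative sampling implicitly factorizes a PMI co-occurrence matrix. The toy corpus and rank-2 decomposition below are a hypothetical illustration of that spectral baseline, not the paper's analysis.

```python
# Spectral word embeddings from a positive-PMI matrix (toy example).
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

C = np.zeros((V, V))                          # symmetric co-occurrence counts
window = 2
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            C[idx[w], idx[corpus[j]]] += 1

P = C / C.sum()
pw = P.sum(axis=1, keepdims=True)
pmi = np.log(np.maximum(P, 1e-12) / (pw * pw.T))
pmi = np.maximum(pmi, 0)                      # positive PMI

U, S, _ = np.linalg.svd(pmi)
embeddings = U[:, :2] * np.sqrt(S[:2])        # rank-2 spectral embedding
for w in vocab:
    print(w, embeddings[idx[w]].round(3))
```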

Human Action Recognition using Local Two-Stream Convolution Neural Network Features and Support Vector Machines

Title Human Action Recognition using Local Two-Stream Convolution Neural Network Features and Support Vector Machines
Authors David Torpey, Turgay Celik
Abstract This paper proposes a simple yet effective method for human action recognition in video. The proposed method separately extracts local appearance and motion features from sampled snippets of a video using state-of-the-art three-dimensional convolutional neural networks. These local features are then concatenated to form global representations, which are used to train a linear SVM to perform action classification using the full context of the video, as opposed to the partial context used in previous works. The videos undergo two simple proposed preprocessing techniques: optical flow scaling and crop filling. We perform an extensive evaluation on three common benchmark datasets to empirically show the benefit of the SVM and the two preprocessing steps.
Tasks Action Classification, Optical Flow Estimation, Temporal Action Localization
Published 2020-02-19
URL https://arxiv.org/abs/2002.09423v1
PDF https://arxiv.org/pdf/2002.09423v1.pdf
PWC https://paperswithcode.com/paper/human-action-recognition-using-local-two
Repo
Framework
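
The classification stage the abstract describes can be sketched as follows: concatenate appearance and motion snippet features into one global video representation and train a linear SVM on it. Random arrays stand in for the 3D-CNN features here, and the feature dimensions and class count are made up.

```python
# Concatenated two-stream features + linear SVM (feature extractors stubbed).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_videos, d_app, d_mot = 300, 512, 512
app = rng.normal(size=(n_videos, d_app))      # stand-in appearance features
mot = rng.normal(size=(n_videos, d_mot))      # stand-in motion features
y = rng.integers(0, 5, size=n_videos)         # 5 action classes

X = np.concatenate([app, mot], axis=1)        # global video representation
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = LinearSVC(C=1.0).fit(Xtr, ytr)
print("held-out accuracy:", clf.score(Xte, yte))
```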

Why Do Line Drawings Work? A Realism Hypothesis

Title Why Do Line Drawings Work? A Realism Hypothesis
Authors Aaron Hertzmann
Abstract Why is it that we can recognize object identity and 3D shape from line drawings, even though they do not exist in the natural world? This paper hypothesizes that the human visual system perceives line drawings as if they were approximately realistic images. Moreover, the techniques of line drawing are chosen to accurately convey shape to a human observer. Several implications and variants of this hypothesis are explored.
Tasks
Published 2020-02-14
URL https://arxiv.org/abs/2002.06260v1
PDF https://arxiv.org/pdf/2002.06260v1.pdf
PWC https://paperswithcode.com/paper/why-do-line-drawings-work-a-realism
Repo
Framework

DymSLAM: 4D Dynamic Scene Reconstruction Based on Geometrical Motion Segmentation

Title DymSLAM: 4D Dynamic Scene Reconstruction Based on Geometrical Motion Segmentation
Authors Chenjie Wang, Bin Luo, Yun Zhang, Qing Zhao, Lu Yin, Wei Wang, Xin Su, Yajun Wang, Chengyuan Li
Abstract Most SLAM algorithms are based on the assumption that the scene is static. In practice, however, most scenes are dynamic and usually contain moving objects, so these methods are not suitable. In this paper, we introduce DymSLAM, a dynamic stereo visual SLAM system capable of reconstructing a 4D (3D + time) dynamic scene with rigid moving objects. The only input of DymSLAM is stereo video, and its output includes a dense map of the static environment, 3D models of the moving objects, and the trajectories of the camera and the moving objects. We first detect and match interest points between successive frames using traditional SLAM methods. The interest points belonging to different motion models (including ego-motion and the motion models of rigid moving objects) are then segmented by a multi-model fitting approach. Based on the interest points belonging to the ego-motion, we estimate the trajectory of the camera and reconstruct the static background. The interest points belonging to the motion models of rigid moving objects are then used to estimate their motion relative to the camera and to reconstruct 3D models of the objects. We transform the relative motion into trajectories of the moving objects in the global reference frame. Finally, we fuse the 3D models of the moving objects into the 3D map of the environment, taking their motion trajectories into account, to obtain a 4D (3D + time) sequence. DymSLAM obtains information about the dynamic objects instead of ignoring them and is suitable for unknown rigid objects. Hence, the proposed system allows the robot to be employed for high-level tasks, such as obstacle avoidance for dynamic objects. We conducted experiments in a real-world environment where both the camera and the objects moved over a wide range.
Tasks Motion Segmentation
Published 2020-03-10
URL https://arxiv.org/abs/2003.04569v1
PDF https://arxiv.org/pdf/2003.04569v1.pdf
PWC https://paperswithcode.com/paper/dymslam4d-dynamic-scene-reconstruction-based
Repo
Framework
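
The multi-model fitting step can be illustrated with sequential RANSAC that peels off one motion model at a time from point correspondences. Plain 2D translations stand in for the rigid-body motions DymSLAM estimates from stereo geometry; this is a toy sketch, not the paper's pipeline.

```python
# Sequential RANSAC for multi-model motion segmentation (toy version).
import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(0, 100, size=(200, 2))
# Two rigid "motions": background translation vs. one moving object.
motion = np.where(np.arange(200)[:, None] < 120, [5.0, 0.0], [0.0, -8.0])
matches = pts + motion + rng.normal(0, 0.2, size=pts.shape)

remaining = np.arange(len(pts))
models = []
while len(remaining) > 20:
    best_inliers = np.array([], dtype=int)
    for _ in range(100):                       # RANSAC trials
        s = rng.choice(remaining)
        t = matches[s] - pts[s]                # hypothesize a translation
        err = np.linalg.norm(matches[remaining] - (pts[remaining] + t), axis=1)
        inl = remaining[err < 1.0]
        if len(inl) > len(best_inliers):
            best_inliers, best_t = inl, t
    models.append(best_t)                      # keep model, peel off inliers
    remaining = np.setdiff1d(remaining, best_inliers)

print("recovered motion models:", [m.round(2) for m in models])
```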

Do CNNs Encode Data Augmentations?

Title Do CNNs Encode Data Augmentations?
Authors Eddie Yan, Yanping Huang
Abstract Data augmentations are an important ingredient in the recipe for training robust neural networks, especially in computer vision. A fundamental question is whether neural network features explicitly encode data augmentation transformations. To answer this question, we introduce a systematic approach to investigate which layers of neural networks are the most predictive of augmentation transformations. Our approach uses layer features in pre-trained vision models with minimal additional processing to predict common properties transformed by augmentation (scale, aspect ratio, hue, saturation, contrast, brightness). Surprisingly, neural network features not only predict data augmentation transformations, but they predict many transformations with high accuracy. After validating that neural networks encode features corresponding to augmentation transformations, we show that these features are primarily encoded in the early layers of modern CNNs.
Tasks Data Augmentation
Published 2020-02-29
URL https://arxiv.org/abs/2003.08773v1
PDF https://arxiv.org/pdf/2003.08773v1.pdf
PWC https://paperswithcode.com/paper/do-cnns-encode-data-augmentations
Repo
Framework
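
The probing setup is straightforward to sketch: transform inputs with a known augmentation parameter (brightness below), read out features from an intermediate layer, and fit a linear probe to predict the parameter. A tiny random-weight CNN stands in for the pretrained vision models used in the paper.

```python
# Linear probe predicting an augmentation parameter from layer features.
import torch
import torch.nn as nn
from sklearn.linear_model import Ridge

torch.manual_seed(0)
layer = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten())

imgs = torch.rand(500, 3, 32, 32)
brightness = torch.rand(500) * 0.8 + 0.6       # factors in [0.6, 1.4]
augmented = (imgs * brightness[:, None, None, None]).clamp(0, 1)

with torch.no_grad():
    feats = layer(augmented).numpy()           # intermediate-layer features

probe = Ridge().fit(feats[:400], brightness[:400].numpy())
print("probe R^2 on held-out images:",
      probe.score(feats[400:], brightness[400:].numpy()))
```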

Harnessing Explanations to Bridge AI and Humans

Title Harnessing Explanations to Bridge AI and Humans
Authors Vivian Lai, Samuel Carton, Chenhao Tan
Abstract Machine learning models are increasingly integrated into societally critical applications such as recidivism prediction and medical diagnosis, thanks to their superior predictive power. In these applications, however, full automation is often not desired due to ethical and legal concerns. The research community has thus ventured into developing interpretable methods that explain machine predictions. While these explanations are meant to assist humans in understanding machine predictions and thereby allowing humans to make better decisions, this hypothesis is not supported in many recent studies. To improve human decision-making with AI assistance, we propose future directions for closing the gap between the efficacy of explanations and improvement in human performance.
Tasks Decision Making, Medical Diagnosis
Published 2020-03-16
URL https://arxiv.org/abs/2003.07370v1
PDF https://arxiv.org/pdf/2003.07370v1.pdf
PWC https://paperswithcode.com/paper/harnessing-explanations-to-bridge-ai-and
Repo
Framework

Application of Deep Neural Networks to assess corporate Credit Rating

Title Application of Deep Neural Networks to assess corporate Credit Rating
Authors Parisa Golbayani, Dan Wang, Ionut Florescu
Abstract Recent literature implements machine learning techniques to assess corporate credit rating based on financial statement reports. In this work, we analyze the performance of four neural network architectures (MLP, CNN, CNN2D, LSTM) in predicting corporate credit rating as issued by Standard and Poor’s. We analyze companies from the energy, financial, and healthcare sectors in the US. The goal of the analysis is to improve the application of machine learning algorithms to credit assessment. To this end, we focus on three questions. First, we investigate whether the algorithms perform better when using a selected subset of features, or whether it is better to allow the algorithms to select features themselves. Second, is the temporal aspect inherent in financial data important for the results obtained by a machine learning algorithm? Third, is there a particular neural network architecture that consistently outperforms others with respect to input features, sectors, and holdout set? We create several case studies to answer these questions and analyze the results using ANOVA and a multiple-comparison testing procedure.
Tasks
Published 2020-03-04
URL https://arxiv.org/abs/2003.02334v1
PDF https://arxiv.org/pdf/2003.02334v1.pdf
PWC https://paperswithcode.com/paper/application-of-deep-neural-networks-to-assess
Repo
Framework
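
The prediction task itself can be sketched with a small MLP mapping statement-level features to a rating class. The synthetic features, bucket boundaries, and layer sizes below are placeholders; the paper compares MLP, CNN, CNN2D, and LSTM architectures on real financial reports.

```python
# MLP rating classifier on synthetic stand-in financial features.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 1000, 20                                # firms x statement features
X = rng.normal(size=(n, d))
# Five synthetic rating buckets driven by a subset of the features.
y = np.digitize(X[:, :5].sum(axis=1), [-3, -1, 1, 3])

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                    random_state=0).fit(Xtr, ytr)
print("held-out accuracy:", clf.score(Xte, yte))
```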

Scientific Image Tampering Detection Based On Noise Inconsistencies: A Method And Datasets

Title Scientific Image Tampering Detection Based On Noise Inconsistencies: A Method And Datasets
Authors Ziyue Xiang, Daniel E. Acuna
Abstract Scientific image tampering is a problem that affects not only authors but also the general perception of the research community. Although previous researchers have developed methods to identify tampering in natural images, these methods may not thrive in the scientific setting, as scientific images differ in statistics, format, quality, and intent. Therefore, we propose a scientific-image-specific tampering detection method based on noise inconsistencies, which is capable of learning and generalizing to different fields of science. We train and test our method on a new dataset of manipulated western blot and microscopy imagery, which aims at emulating problematic images in science. The test results show that our method can robustly detect various types of image manipulation in different scenarios, and that it outperforms existing general-purpose image tampering detection schemes. We discuss applications beyond these two types of images and suggest next steps for making detection of problematic images a systematic step in peer review and science in general.
Tasks
Published 2020-01-21
URL https://arxiv.org/abs/2001.07799v2
PDF https://arxiv.org/pdf/2001.07799v2.pdf
PWC https://paperswithcode.com/paper/scientific-image-tampering-detection-based-on
Repo
Framework
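
The noise-inconsistency intuition can be sketched without any learning: estimate a noise residual with a denoising filter, compute block-wise residual statistics, and flag blocks whose noise level deviates from the rest. The simple variance threshold below is illustrative; the paper instead learns the detector and generalizes across scientific imaging types.

```python
# Flag image blocks with inconsistent noise statistics (illustrative).
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
img = rng.normal(0.5, 0.02, size=(128, 128))   # uniform background noise
img[32:64, 32:64] += rng.normal(0, 0.08, size=(32, 32))  # noisier splice

residual = img - median_filter(img, size=3)    # high-frequency noise residual
B = 16
h, w = img.shape
var_map = residual.reshape(h // B, B, w // B, B).var(axis=(1, 3))

flags = var_map > var_map.mean() + 2 * var_map.std()
print("flagged blocks (row, col):", list(zip(*np.nonzero(flags))))
```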

RODNet: Object Detection under Severe Conditions Using Vision-Radio Cross-Modal Supervision

Title RODNet: Object Detection under Severe Conditions Using Vision-Radio Cross-Modal Supervision
Authors Yizhou Wang, Zhongyu Jiang, Xiangyu Gao, Jenq-Neng Hwang, Guanbin Xing, Hui Liu
Abstract Radar is usually more robust than the camera in severe autonomous driving scenarios, e.g., weak/strong lighting and bad weather. However, semantic information is difficult to extract from radio signals. In this paper, we propose a radio object detection network (RODNet) to detect objects purely from processed radar data in the format of range-azimuth frequency heatmaps (RAMaps). To train the RODNet, we introduce a cross-modal supervision framework, which utilizes the rich information extracted by a vision-based 3D object localization technique to teach object detection for the radar. In order to train and evaluate our method, we build a new dataset, CRUW, containing synchronized video sequences and RAMaps in various scenarios. In extensive experiments, our RODNet shows favorable object detection performance without the presence of the camera. To the best of our knowledge, this is the first work to achieve accurate multi-class object detection purely using radar data as input.
Tasks Autonomous Driving, Object Detection
Published 2020-03-03
URL https://arxiv.org/abs/2003.01816v1
PDF https://arxiv.org/pdf/2003.01816v1.pdf
PWC https://paperswithcode.com/paper/rodnet-object-detection-under-severe
Repo
Framework
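
The cross-modal supervision loop can be sketched as follows: camera-derived object locations are rendered as Gaussian targets on the range-azimuth grid, and a network is trained to predict those targets from RAMaps. All tensors below are synthetic stand-ins, and the tiny CNN is not the RODNet architecture.

```python
# Training a radar heatmap predictor against vision-derived targets.
import torch
import torch.nn as nn

torch.manual_seed(0)

def gaussian_map(cx, cy, size=32, sigma=2.0):
    """Render a Gaussian confidence blob at (cx, cy) on a size x size grid."""
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size),
                            indexing="ij")
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

ramaps = torch.rand(64, 1, 32, 32)             # synthetic radar heatmaps
# Pretend these centers came from a camera-based 3D localization step.
targets = torch.stack([gaussian_map(int(x), int(y)).unsqueeze(0)
                       for x, y in torch.randint(4, 28, (64, 2))])

net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for epoch in range(5):
    loss = nn.functional.mse_loss(net(ramaps), targets)
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```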

Self-Guided Adaptation: Progressive Representation Alignment for Domain Adaptive Object Detection

Title Self-Guided Adaptation: Progressive Representation Alignment for Domain Adaptive Object Detection
Authors Zongxian Li, Qixiang Ye, Chong Zhang, Jingjing Liu, Shijian Lu, Yonghong Tian
Abstract Unsupervised domain adaptation (UDA) has achieved unprecedented success in improving the cross-domain robustness of object detection models. However, existing UDA methods largely ignore the instantaneous data distribution during model learning, which could deteriorate the feature representation given large domain shift. In this work, we propose a Self-Guided Adaptation (SGA) model, targeted at aligning feature representations and transferring object detection models across domains while considering the instantaneous alignment difficulty. The core of SGA is to calculate “hardness” factors for sample pairs, indicating domain distance in a kernel space. With the hardness factor, the proposed SGA adaptively indicates the importance of samples and assigns them different constraints. Guided by the hardness factors, Self-Guided Progressive Sampling (SPS) is implemented in an “easy-to-hard” way during model adaptation. Using multi-stage convolutional features, SGA is further aggregated to fully align hierarchical representations of detection models. Extensive experiments on commonly used benchmarks show that SGA improves on state-of-the-art methods by significant margins while demonstrating its effectiveness under large domain shift.
Tasks Domain Adaptation, Object Detection, Unsupervised Domain Adaptation
Published 2020-03-19
URL https://arxiv.org/abs/2003.08777v2
PDF https://arxiv.org/pdf/2003.08777v2.pdf
PWC https://paperswithcode.com/paper/self-guided-adaptation-progressive
Repo
Framework
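
The hardness factor can be illustrated with an RBF kernel: the squared kernel-space distance between a source/target feature pair is 2 − 2k(s, t), which grows with domain distance. The features, bandwidth, and weighting scheme below are hypothetical; SGA embeds this idea inside a detection model.

```python
# Kernel-space "hardness" scores and an easy-to-hard curriculum order.
import numpy as np

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(100, 16))     # source-domain features
tgt = rng.normal(0.5, 1.0, size=(100, 16))     # shifted target features

gamma = 0.05
# Squared kernel-space distance: k(s,s) + k(t,t) - 2 k(s,t) = 2 - 2 k(s,t).
k_st = np.exp(-gamma * ((src - tgt) ** 2).sum(axis=1))
hardness = 2.0 - 2.0 * k_st

order = np.argsort(hardness)                   # easy-to-hard sampling order
weights = 1.0 / (1.0 + hardness)               # down-weight hard pairs
print("easiest pair:", order[0], "hardest pair:", order[-1])
print("mean weight:", weights.mean().round(3))
```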

3D medical image segmentation with labeled and unlabeled data using autoencoders at the example of liver segmentation in CT images

Title 3D medical image segmentation with labeled and unlabeled data using autoencoders at the example of liver segmentation in CT images
Authors Cheryl Sital, Tom Brosch, Dominique Tio, Alexander Raaijmakers, Jürgen Weese
Abstract Automatic segmentation of anatomical structures with convolutional neural networks (CNNs) constitutes a large portion of research in medical image analysis. The majority of CNN-based methods rely on an abundance of labeled data for proper training. Labeled medical data is often scarce, but unlabeled data is more widely available. This necessitates approaches that go beyond traditional supervised learning and leverage unlabeled data for segmentation tasks. This work investigates the potential of autoencoder-extracted features to improve segmentation with a CNN. Two strategies were considered. First, transfer learning, where pretrained autoencoder features were used as initialization for the convolutional layers in the segmentation network. Second, multi-task learning, where the tasks of segmentation and feature extraction, by means of input reconstruction, were learned and optimized simultaneously. A convolutional autoencoder was used to extract features from unlabeled data, and a multi-scale, fully convolutional CNN was used to perform the target task of 3D liver segmentation in CT images. For both strategies, experiments were conducted with varying amounts of labeled and unlabeled training data. The proposed learning strategies improved results in $75\%$ of the experiments compared to training from scratch and increased the Dice score by up to $0.040$ and $0.024$ for ratios of unlabeled to labeled training data of about $32:1$ and $12.5:1$, respectively. The results indicate that both training strategies are more effective with a large ratio of unlabeled to labeled training data.
Tasks Liver Segmentation, Medical Image Segmentation, Multi-Task Learning, Semantic Segmentation, Transfer Learning
Published 2020-03-17
URL https://arxiv.org/abs/2003.07923v1
PDF https://arxiv.org/pdf/2003.07923v1.pdf
PWC https://paperswithcode.com/paper/3d-medical-image-segmentation-with-labeled
Repo
Framework
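
The transfer-learning strategy can be sketched in a few lines: pretrain a convolutional autoencoder on unlabeled images, then reuse its encoder to initialize the segmentation network. Toy 2D tensors stand in for 3D CT volumes, and the architectures are placeholders.

```python
# Autoencoder pretraining, then encoder transfer to a segmentation net.
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())
decoder = nn.Conv2d(16, 1, 3, padding=1)

unlabeled = torch.rand(32, 1, 64, 64)          # stand-in unlabeled scans
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
for _ in range(20):                            # reconstruction pretraining
    loss = nn.functional.mse_loss(decoder(encoder(unlabeled)), unlabeled)
    opt.zero_grad(); loss.backward(); opt.step()

seg_head = nn.Conv2d(16, 2, 1)                 # background / liver logits
seg_net = nn.Sequential(encoder, seg_head)     # encoder weights carried over
print("pretrained reconstruction loss:", round(loss.item(), 4))
print("segmentation logits shape:", tuple(seg_net(unlabeled).shape))
```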

Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling

Title Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling
Authors Mu Yuan, Lan Zhang, Xiang-Yang Li, Hui Xiong
Abstract Labeling data (e.g., labeling the people, objects, actions, and scene in images) comprehensively and efficiently is a widely needed but challenging task. Numerous models have been proposed to label various data, and many approaches have been designed to enhance the ability of deep learning models or to accelerate them. Unfortunately, a single machine-learning model is not powerful enough to extract all the varied semantic information from data. For certain applications, such as image retrieval platforms and photo album management apps, it is often required to execute a collection of models to obtain sufficient labels. With limited computing resources and stringent delay, given a data stream and a collection of applicable resource-hungry deep-learning models, we design a novel approach to adaptively schedule a subset of these models to execute on each data item, aiming to maximize the value of the model output (e.g., the number of high-confidence labels). Achieving this lofty goal is nontrivial since a model’s output on any data item is content-dependent and unknown until we execute it. To tackle this, we propose an Adaptive Model Scheduling framework, consisting of 1) a deep reinforcement learning-based approach to predict the value of unexecuted models by mining semantic relationships among diverse models, and 2) two heuristic algorithms to adaptively schedule the model execution order under a deadline or deadline-memory constraints, respectively. The proposed framework doesn’t require any prior knowledge of the data, and it works as a powerful complement to existing model optimization technologies. We conduct extensive evaluations on five diverse image datasets and 30 popular image labeling models to demonstrate the effectiveness of our design: it saves around 53% of execution time without losing any valuable labels.
Tasks Image Retrieval
Published 2020-02-08
URL https://arxiv.org/abs/2002.05520v1
PDF https://arxiv.org/pdf/2002.05520v1.pdf
PWC https://paperswithcode.com/paper/comprehensive-and-efficient-data-labeling-via
Repo
Framework
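
The deadline-constrained heuristic can be sketched as a greedy pass over models ranked by predicted value per unit cost. The model names, value estimates, and costs below are fixed placeholders; in the paper the values come from a deep reinforcement learning predictor that mines semantic relationships among models.

```python
# Greedy value-per-cost scheduling under a per-item deadline (sketch).
models = [                                     # (name, est. value, cost in s)
    ("object_detector", 0.9, 0.30),
    ("scene_classifier", 0.6, 0.10),
    ("face_recognizer", 0.4, 0.25),
    ("action_recognizer", 0.7, 0.40),
]

def schedule(models, deadline):
    """Pick models in value-per-cost order until the budget is spent."""
    chosen, spent = [], 0.0
    for name, value, cost in sorted(models, key=lambda m: m[1] / m[2],
                                    reverse=True):
        if spent + cost <= deadline:           # fits in remaining budget
            chosen.append(name)
            spent += cost
    return chosen, spent

plan, used = schedule(models, deadline=0.6)
print(f"run {plan} using {used:.2f}s of 0.60s budget")
```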

Liver Segmentation in Abdominal CT Images via Auto-Context Neural Network and Self-Supervised Contour Attention

Title Liver Segmentation in Abdominal CT Images via Auto-Context Neural Network and Self-Supervised Contour Attention
Authors Minyoung Chung, Jingyu Lee, Jeongjin Lee, Yeong-Gil Shin
Abstract Accurate image segmentation of the liver is a challenging problem owing to its large shape variability and unclear boundaries. Although applications of fully convolutional neural networks (CNNs) have shown groundbreaking results, limited studies have focused on generalization performance. In this study, we introduce a CNN for liver segmentation on abdominal computed tomography (CT) images that shows high generalization performance and accuracy. To improve generalization performance, we first propose an auto-context algorithm within a single CNN. The proposed auto-context neural network exploits effective high-level residual estimation to obtain the shape prior. Identical dual paths are effectively trained to represent mutually complementary features for an accurate posterior analysis of the liver. Furthermore, we extend our network with a self-supervised contour scheme. We train sparse contour features by penalizing against the ground-truth contour, focusing more contour attention on failures. The experimental results show that the proposed network achieves better accuracy than state-of-the-art networks, reducing the Hausdorff distance by 10.31%. We used 180 abdominal CT images for training and validation. Two-fold cross-validation is presented for comparison with state-of-the-art neural networks. Novel multiple N-fold cross-validations are conducted to verify generalization performance. The proposed network showed the best generalization performance among the networks. Additionally, we present a series of ablation experiments that comprehensively support the importance of the underlying concepts.
Tasks Computed Tomography (CT), Liver Segmentation, Semantic Segmentation
Published 2020-02-14
URL https://arxiv.org/abs/2002.05895v1
PDF https://arxiv.org/pdf/2002.05895v1.pdf
PWC https://paperswithcode.com/paper/liver-segmentation-in-abdominal-ct-images-via
Repo
Framework
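
The auto-context idea can be sketched with two passes: the second network receives the image together with the posterior produced by the first, letting it refine the prediction. This toy version omits the paper's dual paths and self-supervised contour attention.

```python
# Two-pass auto-context refinement (toy stand-in for the paper's network).
import torch
import torch.nn as nn

torch.manual_seed(0)
pass1 = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
# Second pass takes 2 channels: the image plus the first-pass posterior.
pass2 = nn.Sequential(nn.Conv2d(2, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())

ct_slice = torch.rand(4, 1, 64, 64)            # toy stand-in for CT data
posterior = pass1(ct_slice)                    # initial liver probability
refined = pass2(torch.cat([ct_slice, posterior], dim=1))
print("refined posterior shape:", tuple(refined.shape))
```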

Try Depth Instead of Weight Correlations: Mean-field is a Less Restrictive Assumption for Deeper Networks

Title Try Depth Instead of Weight Correlations: Mean-field is a Less Restrictive Assumption for Deeper Networks
Authors Sebastian Farquhar, Lewis Smith, Yarin Gal
Abstract We challenge the longstanding assumption that the mean-field approximation for variational inference in Bayesian neural networks is severely restrictive. We argue mathematically that full-covariance approximations only improve the ELBO if they improve the expected log-likelihood. We further show that deeper mean-field networks are able to express predictive distributions approximately equivalent to shallower full-covariance networks. We validate these observations empirically, demonstrating that deeper models decrease the divergence between diagonal- and full-covariance Gaussian fits to the true posterior.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.03704v1
PDF https://arxiv.org/pdf/2002.03704v1.pdf
PWC https://paperswithcode.com/paper/try-depth-instead-of-weight-correlations-mean
Repo
Framework
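
A mean-field variational layer, the object under discussion, keeps an independent Gaussian over every weight and samples via the reparameterization trick. The sketch below shows such a layer and a deeper stack of them; the sizes and initialization are placeholders, and the ELBO's KL term is omitted for brevity.

```python
# Mean-field (diagonal-Gaussian) Bayesian linear layers, stacked deeper.
import torch
import torch.nn as nn

class MeanFieldLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.log_sigma = nn.Parameter(torch.full((d_out, d_in), -3.0))

    def forward(self, x):
        # Sample weights with independent (diagonal) posterior noise.
        eps = torch.randn_like(self.mu)
        w = self.mu + self.log_sigma.exp() * eps
        return x @ w.t()

torch.manual_seed(0)
net = nn.Sequential(MeanFieldLinear(4, 16), nn.ReLU(),
                    MeanFieldLinear(16, 1))    # depth adds expressiveness
x = torch.rand(8, 4)
samples = torch.stack([net(x) for _ in range(100)])
print("predictive mean:", samples.mean().item(),
      "predictive std:", samples.std().item())
```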