April 2, 2020

3219 words 16 mins read

Paper Group ANR 338

A Functional EM Algorithm for Panel Count Data with Missing Counts. The Spectral Underpinning of word2vec. Human Action Recognition using Local Two-Stream Convolution Neural Network Features and Support Vector Machines. Why Do Line Drawings Work? A Realism Hypothesis. DymSLAM: 4D Dynamic Scene Reconstruction Based on Geometrical Motion Segmentation. …

A Functional EM Algorithm for Panel Count Data with Missing Counts

Title A Functional EM Algorithm for Panel Count Data with Missing Counts
Authors Alexander Moreno, Zhenke Wu, Jamie Yap, David Wetter, Cho Lam, Inbal Nahum-Shani, Walter Dempsey, James M. Rehg
Abstract Panel count data is recurrent events data where counts of events are observed at discrete time points. Panel counts naturally describe self-reported behavioral data, and the occurrence of missing or unreliable reports is common. Unfortunately, no prior work has tackled the problem of missingness in this setting. We address this gap in the literature by developing a novel functional EM algorithm that can be used as a wrapper around several popular panel count mean function inference methods when some counts are missing. We provide a novel theoretical analysis of our method showing strong consistency. Extending the methods in (Balakrishnan et al., 2017; Wu et al., 2016), we show that the functional EM algorithm recovers the true mean function of the counting process. We accomplish this by developing alternative regularity conditions for our objective function in order to show convergence of the population EM algorithm. We prove strong consistency of the M-step, thus giving strong consistency guarantees for the finite sample EM algorithm. We present experimental results for synthetic data, synthetic missingness on real data, and a smoking cessation study, where we find that participants may underestimate cigarettes smoked by approximately 18.6% over a 12-day period.
Tasks
Published 2020-03-02
URL https://arxiv.org/abs/2003.01169v2
PDF https://arxiv.org/pdf/2003.01169v2.pdf
PWC https://paperswithcode.com/paper/a-functional-em-algorithm-for-panel-count
Repo
Framework
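
A rough picture of the wrapper idea, under a toy nonhomogeneous Poisson model: the E-step imputes each missing count with its expected value under the current mean-function estimate, and the M-step re-fits the mean function. This is a hypothetical minimal sketch, not the authors' method, which wraps existing mean-function estimators and carries consistency guarantees.

```python
# Toy EM for panel counts with missing entries (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
T, n = 10, 200                               # panel times, subjects
true_inc = np.linspace(0.5, 2.0, T)          # true mean-function increments
counts = rng.poisson(true_inc, size=(n, T)).astype(float)
counts[rng.random((n, T)) < 0.3] = np.nan    # 30% of reports missing

inc = np.nanmean(counts, axis=0)             # initialize from observed data
for _ in range(50):
    # E-step: the expected value of a missing Poisson count is its rate.
    filled = np.where(np.isnan(counts), inc, counts)
    # M-step: re-fit the increments (a simple MLE here; the paper
    # instead plugs in existing mean-function inference methods).
    inc = filled.mean(axis=0)

print("estimated cumulative mean function:", np.cumsum(inc).round(2))
```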

The Spectral Underpinning of word2vec

Title The Spectral Underpinning of word2vec
Authors Ariel Jaffe, Yuval Kluger, Ofir Lindenbaum, Jonathan Patsenker, Erez Peterfreund, Stefan Steinerberger
Abstract word2vec, introduced by Mikolov et al. (2013), is a word embedding method that is widely used in natural language processing. Despite its great success and frequent use, a theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings by numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.
Tasks
Published 2020-02-27
URL https://arxiv.org/abs/2002.12317v1
PDF https://arxiv.org/pdf/2002.12317v1.pdf
PWC https://paperswithcode.com/paper/the-spectral-underpinning-of-word2vec
Repo
Framework
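
One way to make the spectral connection concrete is the classical observation (Levy and Goldberg, 2014) that skip-gram with negative sampling implicitly factorizes a PMI co-occurrence matrix. The toy corpus and rank-2 decomposition below are a hypothetical illustration of that spectral baseline, not the paper's analysis.

```python
# Spectral word embeddings from a positive-PMI matrix (toy example).
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

C = np.zeros((V, V))                          # symmetric co-occurrence counts
window = 2
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            C[idx[w], idx[corpus[j]]] += 1

P = C / C.sum()
pw = P.sum(axis=1, keepdims=True)
pmi = np.log(np.maximum(P, 1e-12) / (pw * pw.T))
pmi = np.maximum(pmi, 0)                      # positive PMI

U, S, _ = np.linalg.svd(pmi)
embeddings = U[:, :2] * np.sqrt(S[:2])        # rank-2 spectral embedding
for w in vocab:
    print(w, embeddings[idx[w]].round(3))
```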

Human Action Recognition using Local Two-Stream Convolution Neural Network Features and Support Vector Machines

Title Human Action Recognition using Local Two-Stream Convolution Neural Network Features and Support Vector Machines
Authors David Torpey, Turgay Celik
Abstract This paper proposes a simple yet effective method for human action recognition in video. The proposed method separately extracts local appearance and motion features from sampled snippets of a video using state-of-the-art three-dimensional convolutional neural networks. These local features are then concatenated to form global representations, which are used to train a linear SVM to perform action classification using the full context of the video, as opposed to the partial context used in previous works. The videos undergo two simple proposed preprocessing techniques: optical flow scaling and crop filling. We perform an extensive evaluation on three common benchmark datasets to empirically show the benefit of the SVM and the two preprocessing steps.
Tasks Action Classification, Optical Flow Estimation, Temporal Action Localization
Published 2020-02-19
URL https://arxiv.org/abs/2002.09423v1
PDF https://arxiv.org/pdf/2002.09423v1.pdf
PWC https://paperswithcode.com/paper/human-action-recognition-using-local-two
Repo
Framework
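
The classification stage the abstract describes can be sketched as follows: concatenate appearance and motion snippet features into one global video representation and train a linear SVM on it. Random arrays stand in for the 3D-CNN features here, and the feature dimensions and class count are made up.

```python
# Concatenated two-stream features + linear SVM (feature extractors stubbed).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_videos, d_app, d_mot = 300, 512, 512
app = rng.normal(size=(n_videos, d_app))      # stand-in appearance features
mot = rng.normal(size=(n_videos, d_mot))      # stand-in motion features
y = rng.integers(0, 5, size=n_videos)         # 5 action classes

X = np.concatenate([app, mot], axis=1)        # global video representation
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = LinearSVC(C=1.0).fit(Xtr, ytr)
print("held-out accuracy:", clf.score(Xte, yte))
```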

Why Do Line Drawings Work? A Realism Hypothesis

Title Why Do Line Drawings Work? A Realism Hypothesis
Authors Aaron Hertzmann
Abstract Why is it that we can recognize object identity and 3D shape from line drawings, even though they do not exist in the natural world? This paper hypothesizes that the human visual system perceives line drawings as if they were approximately realistic images. Moreover, the techniques of line drawing are chosen to accurately convey shape to a human observer. Several implications and variants of this hypothesis are explored.
Tasks
Published 2020-02-14
URL https://arxiv.org/abs/2002.06260v1
PDF https://arxiv.org/pdf/2002.06260v1.pdf
PWC https://paperswithcode.com/paper/why-do-line-drawings-work-a-realism
Repo
Framework

DymSLAM: 4D Dynamic Scene Reconstruction Based on Geometrical Motion Segmentation

Title DymSLAM: 4D Dynamic Scene Reconstruction Based on Geometrical Motion Segmentation
Authors Chenjie Wang, Bin Luo, Yun Zhang, Qing Zhao, Lu Yin, Wei Wang, Xin Su, Yajun Wang, Chengyuan Li
Abstract Most SLAM algorithms are based on the assumption that the scene is static. In practice, however, most scenes are dynamic and usually contain moving objects, so these methods are not suitable. In this paper, we introduce DymSLAM, a dynamic stereo visual SLAM system capable of reconstructing a 4D (3D + time) dynamic scene with rigid moving objects. The only input of DymSLAM is stereo video, and its output includes a dense map of the static environment, 3D models of the moving objects, and the trajectories of the camera and the moving objects. We first detect and match interest points between successive frames using traditional SLAM methods. The interest points belonging to different motion models (including ego-motion and the motion models of rigid moving objects) are then segmented by a multi-model fitting approach. Based on the interest points belonging to the ego-motion, we estimate the trajectory of the camera and reconstruct the static background. The interest points belonging to the motion models of rigid moving objects are then used to estimate their motion relative to the camera and to reconstruct 3D models of the objects. We transform the relative motion into trajectories of the moving objects in the global reference frame. Finally, we fuse the 3D models of the moving objects into the 3D map of the environment, taking their motion trajectories into account, to obtain a 4D (3D + time) sequence. DymSLAM obtains information about the dynamic objects instead of ignoring them and is suitable for unknown rigid objects. Hence, the proposed system allows the robot to be employed for high-level tasks, such as obstacle avoidance for dynamic objects. We conducted experiments in a real-world environment where both the camera and the objects moved over a wide range.
Tasks Motion Segmentation
Published 2020-03-10
URL https://arxiv.org/abs/2003.04569v1
PDF https://arxiv.org/pdf/2003.04569v1.pdf
PWC https://paperswithcode.com/paper/dymslam4d-dynamic-scene-reconstruction-based
Repo
Framework
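
The multi-model fitting step can be illustrated with sequential RANSAC that peels off one motion model at a time from point correspondences. Plain 2D translations stand in for the rigid-body motions DymSLAM estimates from stereo geometry; this is a toy sketch, not the paper's pipeline.

```python
# Sequential RANSAC for multi-model motion segmentation (toy version).
import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(0, 100, size=(200, 2))
# Two rigid "motions": background translation vs. one moving object.
motion = np.where(np.arange(200)[:, None] < 120, [5.0, 0.0], [0.0, -8.0])
matches = pts + motion + rng.normal(0, 0.2, size=pts.shape)

remaining = np.arange(len(pts))
models = []
while len(remaining) > 20:
    best_inliers = np.array([], dtype=int)
    for _ in range(100):                       # RANSAC trials
        s = rng.choice(remaining)
        t = matches[s] - pts[s]                # hypothesize a translation
        err = np.linalg.norm(matches[remaining] - (pts[remaining] + t), axis=1)
        inl = remaining[err < 1.0]
        if len(inl) > len(best_inliers):
            best_inliers, best_t = inl, t
    models.append(best_t)                      # keep model, peel off inliers
    remaining = np.setdiff1d(remaining, best_inliers)

print("recovered motion models:", [m.round(2) for m in models])
```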

Do CNNs Encode Data Augmentations?

Title Do CNNs Encode Data Augmentations?
Authors Eddie Yan, Yanping Huang
Abstract Data augmentations are an important ingredient in the recipe for training robust neural networks, especially in computer vision. A fundamental question is whether neural network features explicitly encode data augmentation transformations. To answer this question, we introduce a systematic approach to investigate which layers of neural networks are the most predictive of augmentation transformations. Our approach uses layer features in pre-trained vision models with minimal additional processing to predict common properties transformed by augmentation (scale, aspect ratio, hue, saturation, contrast, brightness). Surprisingly, neural network features not only predict data augmentation transformations, but they predict many transformations with high accuracy. After validating that neural networks encode features corresponding to augmentation transformations, we show that these features are primarily encoded in the early layers of modern CNNs.
Tasks Data Augmentation
Published 2020-02-29
URL https://arxiv.org/abs/2003.08773v1
PDF https://arxiv.org/pdf/2003.08773v1.pdf
PWC https://paperswithcode.com/paper/do-cnns-encode-data-augmentations
Repo
Framework
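
The probing setup is straightforward to sketch: transform inputs with a known augmentation parameter (brightness below), read out features from an intermediate layer, and fit a linear probe to predict the parameter. A tiny random-weight CNN stands in for the pretrained vision models used in the paper.

```python
# Linear probe predicting an augmentation parameter from layer features.
import torch
import torch.nn as nn
from sklearn.linear_model import Ridge

torch.manual_seed(0)
layer = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten())

imgs = torch.rand(500, 3, 32, 32)
brightness = torch.rand(500) * 0.8 + 0.6       # factors in [0.6, 1.4]
augmented = (imgs * brightness[:, None, None, None]).clamp(0, 1)

with torch.no_grad():
    feats = layer(augmented).numpy()           # intermediate-layer features

probe = Ridge().fit(feats[:400], brightness[:400].numpy())
print("probe R^2 on held-out images:",
      probe.score(feats[400:], brightness[400:].numpy()))
```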

Harnessing Explanations to Bridge AI and Humans

Title Harnessing Explanations to Bridge AI and Humans
Authors Vivian Lai, Samuel Carton, Chenhao Tan
Abstract Machine learning models are increasingly integrated into societally critical applications such as recidivism prediction and medical diagnosis, thanks to their superior predictive power. In these applications, however, full automation is often not desired due to ethical and legal concerns. The research community has thus ventured into developing interpretable methods that explain machine predictions. While these explanations are meant to assist humans in understanding machine predictions and thereby allowing humans to make better decisions, this hypothesis is not supported in many recent studies. To improve human decision-making with AI assistance, we propose future directions for closing the gap between the efficacy of explanations and improvement in human performance.
Tasks Decision Making, Medical Diagnosis
Published 2020-03-16
URL https://arxiv.org/abs/2003.07370v1
PDF https://arxiv.org/pdf/2003.07370v1.pdf
PWC https://paperswithcode.com/paper/harnessing-explanations-to-bridge-ai-and
Repo
Framework

Application of Deep Neural Networks to assess corporate Credit Rating

Title Application of Deep Neural Networks to assess corporate Credit Rating
Authors Parisa Golbayani, Dan Wang, Ionut Florescu
Abstract Recent literature implements machine learning techniques to assess corporate credit rating based on financial statement reports. In this work, we analyze the performance of four neural network architectures (MLP, CNN, CNN2D, LSTM) in predicting corporate credit rating as issued by Standard and Poor’s. We analyze companies from the energy, financial, and healthcare sectors in the US. The goal of the analysis is to improve the application of machine learning algorithms to credit assessment. To this end, we focus on three questions. First, we investigate whether the algorithms perform better when using a selected subset of features, or whether it is better to allow the algorithms to select features themselves. Second, is the temporal aspect inherent in financial data important for the results obtained by a machine learning algorithm? Third, is there a particular neural network architecture that consistently outperforms others with respect to input features, sectors, and holdout set? We create several case studies to answer these questions and analyze the results using ANOVA and a multiple-comparison testing procedure.
Tasks
Published 2020-03-04
URL https://arxiv.org/abs/2003.02334v1
PDF https://arxiv.org/pdf/2003.02334v1.pdf
PWC https://paperswithcode.com/paper/application-of-deep-neural-networks-to-assess
Repo
Framework
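
The prediction task itself can be sketched with a small MLP mapping statement-level features to a rating class. The synthetic features, bucket boundaries, and layer sizes below are placeholders; the paper compares MLP, CNN, CNN2D, and LSTM architectures on real financial reports.

```python
# MLP rating classifier on synthetic stand-in financial features.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 1000, 20                                # firms x statement features
X = rng.normal(size=(n, d))
# Five synthetic rating buckets driven by a subset of the features.
y = np.digitize(X[:, :5].sum(axis=1), [-3, -1, 1, 3])

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                    random_state=0).fit(Xtr, ytr)
print("held-out accuracy:", clf.score(Xte, yte))
```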

Scientific Image Tampering Detection Based On Noise Inconsistencies: A Method And Datasets

Title Scientific Image Tampering Detection Based On Noise Inconsistencies: A Method And Datasets
Authors Ziyue Xiang, Daniel E. Acuna
Abstract Scientific image tampering is a problem that affects not only authors but also the general perception of the research community. Although previous researchers have developed methods to identify tampering in natural images, these methods may not thrive in the scientific setting, as scientific images differ in statistics, format, quality, and intent. Therefore, we propose a scientific-image-specific tampering detection method based on noise inconsistencies, which is capable of learning and generalizing to different fields of science. We train and test our method on a new dataset of manipulated western blot and microscopy imagery, which aims at emulating problematic images in science. The test results show that our method can robustly detect various types of image manipulation in different scenarios, and that it outperforms existing general-purpose image tampering detection schemes. We discuss applications beyond these two types of images and suggest next steps for making detection of problematic images a systematic step in peer review and science in general.
Tasks
Published 2020-01-21
URL https://arxiv.org/abs/2001.07799v2
PDF https://arxiv.org/pdf/2001.07799v2.pdf
PWC https://paperswithcode.com/paper/scientific-image-tampering-detection-based-on
Repo
Framework
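
The noise-inconsistency intuition can be sketched without any learning: estimate a noise residual with a denoising filter, compute block-wise residual statistics, and flag blocks whose noise level deviates from the rest. The simple variance threshold below is illustrative; the paper instead learns the detector and generalizes across scientific imaging types.

```python
# Flag image blocks with inconsistent noise statistics (illustrative).
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
img = rng.normal(0.5, 0.02, size=(128, 128))   # uniform background noise
img[32:64, 32:64] += rng.normal(0, 0.08, size=(32, 32))  # noisier splice

residual = img - median_filter(img, size=3)    # high-frequency noise residual
B = 16
h, w = img.shape
var_map = residual.reshape(h // B, B, w // B, B).var(axis=(1, 3))

flags = var_map > var_map.mean() + 2 * var_map.std()
print("flagged blocks (row, col):", list(zip(*np.nonzero(flags))))
```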

RODNet: Object Detection under Severe Conditions Using Vision-Radio Cross-Modal Supervision

Title RODNet: Object Detection under Severe Conditions Using Vision-Radio Cross-Modal Supervision
Authors Yizhou Wang, Zhongyu Jiang, Xiangyu Gao, Jenq-Neng Hwang, Guanbin Xing, Hui Liu
Abstract Radar is usually more robust than the camera in severe autonomous driving scenarios, e.g., weak/strong lighting and bad weather. However, semantic information is difficult to extract from radio signals. In this paper, we propose a radio object detection network (RODNet) to detect objects purely from processed radar data in the format of range-azimuth frequency heatmaps (RAMaps). To train the RODNet, we introduce a cross-modal supervision framework, which utilizes the rich information extracted by a vision-based 3D object localization technique to teach object detection for the radar. In order to train and evaluate our method, we build a new dataset, CRUW, containing synchronized video sequences and RAMaps in various scenarios. In extensive experiments, our RODNet shows favorable object detection performance without the presence of the camera. To the best of our knowledge, this is the first work to achieve accurate multi-class object detection purely using radar data as input.
Tasks Autonomous Driving, Object Detection
Published 2020-03-03
URL https://arxiv.org/abs/2003.01816v1
PDF https://arxiv.org/pdf/2003.01816v1.pdf
PWC https://paperswithcode.com/paper/rodnet-object-detection-under-severe
Repo
Framework
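
The cross-modal supervision loop can be sketched as follows: camera-derived object locations are rendered as Gaussian targets on the range-azimuth grid, and a network is trained to predict those targets from RAMaps. All tensors below are synthetic stand-ins, and the tiny CNN is not the RODNet architecture.

```python
# Training a radar heatmap predictor against vision-derived targets.
import torch
import torch.nn as nn

torch.manual_seed(0)

def gaussian_map(cx, cy, size=32, sigma=2.0):
    """Render a Gaussian confidence blob at (cx, cy) on a size x size grid."""
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size),
                            indexing="ij")
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

ramaps = torch.rand(64, 1, 32, 32)             # synthetic radar heatmaps
# Pretend these centers came from a camera-based 3D localization step.
targets = torch.stack([gaussian_map(int(x), int(y)).unsqueeze(0)
                       for x, y in torch.randint(4, 28, (64, 2))])

net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for epoch in range(5):
    loss = nn.functional.mse_loss(net(ramaps), targets)
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```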

Self-Guided Adaptation: Progressive Representation Alignment for Domain Adaptive Object Detection

Title Self-Guided Adaptation: Progressive Representation Alignment for Domain Adaptive Object Detection
Authors Zongxian Li, Qixiang Ye, Chong Zhang, Jingjing Liu, Shijian Lu, Yonghong Tian
Abstract Unsupervised domain adaptation (UDA) has achieved unprecedented success in improving the cross-domain robustness of object detection models. However, existing UDA methods largely ignore the instantaneous data distribution during model learning, which could deteriorate the feature representation given large domain shift. In this work, we propose a Self-Guided Adaptation (SGA) model, targeted at aligning feature representations and transferring object detection models across domains while considering the instantaneous alignment difficulty. The core of SGA is to calculate “hardness” factors for sample pairs, indicating domain distance in a kernel space. With the hardness factor, the proposed SGA adaptively indicates the importance of samples and assigns them different constraints. Guided by the hardness factors, Self-Guided Progressive Sampling (SPS) is implemented in an “easy-to-hard” way during model adaptation. Using multi-stage convolutional features, SGA is further aggregated to fully align hierarchical representations of detection models. Extensive experiments on commonly used benchmarks show that SGA improves on state-of-the-art methods by significant margins while demonstrating its effectiveness under large domain shift.
Tasks Domain Adaptation, Object Detection, Unsupervised Domain Adaptation
Published 2020-03-19
URL https://arxiv.org/abs/2003.08777v2
PDF https://arxiv.org/pdf/2003.08777v2.pdf
PWC https://paperswithcode.com/paper/self-guided-adaptation-progressive
Repo
Framework
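
The hardness factor can be illustrated with an RBF kernel: the squared kernel-space distance between a source/target feature pair is 2 − 2k(s, t), which grows with domain distance. The features, bandwidth, and weighting scheme below are hypothetical; SGA embeds this idea inside a detection model.

```python
# Kernel-space "hardness" scores and an easy-to-hard curriculum order.
import numpy as np

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(100, 16))     # source-domain features
tgt = rng.normal(0.5, 1.0, size=(100, 16))     # shifted target features

gamma = 0.05
# Squared kernel-space distance: k(s,s) + k(t,t) - 2 k(s,t) = 2 - 2 k(s,t).
k_st = np.exp(-gamma * ((src - tgt) ** 2).sum(axis=1))
hardness = 2.0 - 2.0 * k_st

order = np.argsort(hardness)                   # easy-to-hard sampling order
weights = 1.0 / (1.0 + hardness)               # down-weight hard pairs
print("easiest pair:", order[0], "hardest pair:", order[-1])
print("mean weight:", weights.mean().round(3))
```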

3D medical image segmentation with labeled and unlabeled data using autoencoders at the example of liver segmentation in CT images

Title 3D medical image segmentation with labeled and unlabeled data using autoencoders at the example of liver segmentation in CT images
Authors Cheryl Sital, Tom Brosch, Dominique Tio, Alexander Raaijmakers, Jürgen Weese
Abstract Automatic segmentation of anatomical structures with convolutional neural networks (CNNs) constitutes a large portion of research in medical image analysis. The majority of CNN-based methods rely on an abundance of labeled data for proper training. Labeled medical data is often scarce, but unlabeled data is more widely available. This necessitates approaches that go beyond traditional supervised learning and leverage unlabeled data for segmentation tasks. This work investigates the potential of autoencoder-extracted features to improve segmentation with a CNN. Two strategies were considered. First, transfer learning, where pretrained autoencoder features were used as initialization for the convolutional layers in the segmentation network. Second, multi-task learning, where the tasks of segmentation and feature extraction, by means of input reconstruction, were learned and optimized simultaneously. A convolutional autoencoder was used to extract features from unlabeled data, and a multi-scale, fully convolutional CNN was used to perform the target task of 3D liver segmentation in CT images. For both strategies, experiments were conducted with varying amounts of labeled and unlabeled training data. The proposed learning strategies improved results in $75\%$ of the experiments compared to training from scratch and increased the Dice score by up to $0.040$ and $0.024$ for ratios of unlabeled to labeled training data of about $32:1$ and $12.5:1$, respectively. The results indicate that both training strategies are more effective with a large ratio of unlabeled to labeled training data.
Tasks Liver Segmentation, Medical Image Segmentation, Multi-Task Learning, Semantic Segmentation, Transfer Learning
Published 2020-03-17
URL https://arxiv.org/abs/2003.07923v1
PDF https://arxiv.org/pdf/2003.07923v1.pdf
PWC https://paperswithcode.com/paper/3d-medical-image-segmentation-with-labeled
Repo
Framework
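
The transfer-learning strategy can be sketched in a few lines: pretrain a convolutional autoencoder on unlabeled images, then reuse its encoder to initialize the segmentation network. Toy 2D tensors stand in for 3D CT volumes, and the architectures are placeholders.

```python
# Autoencoder pretraining, then encoder transfer to a segmentation net.
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())
decoder = nn.Conv2d(16, 1, 3, padding=1)

unlabeled = torch.rand(32, 1, 64, 64)          # stand-in unlabeled scans
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
for _ in range(20):                            # reconstruction pretraining
    loss = nn.functional.mse_loss(decoder(encoder(unlabeled)), unlabeled)
    opt.zero_grad(); loss.backward(); opt.step()

seg_head = nn.Conv2d(16, 2, 1)                 # background / liver logits
seg_net = nn.Sequential(encoder, seg_head)     # encoder weights carried over
print("pretrained reconstruction loss:", round(loss.item(), 4))
print("segmentation logits shape:", tuple(seg_net(unlabeled).shape))
```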

Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling

Title Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling
Authors Mu Yuan, Lan Zhang, Xiang-Yang Li, Hui Xiong
Abstract Labeling data (e.g., labeling the people, objects, actions, and scene in images) comprehensively and efficiently is a widely needed but challenging task. Numerous models have been proposed to label various data, and many approaches have been designed to enhance the ability of deep learning models or to accelerate them. Unfortunately, a single machine-learning model is not powerful enough to extract all the varied semantic information from data. For certain applications, such as image retrieval platforms and photo album management apps, it is often required to execute a collection of models to obtain sufficient labels. With limited computing resources and stringent delay, given a data stream and a collection of applicable resource-hungry deep-learning models, we design a novel approach to adaptively schedule a subset of these models to execute on each data item, aiming to maximize the value of the model output (e.g., the number of high-confidence labels). Achieving this lofty goal is nontrivial since a model’s output on any data item is content-dependent and unknown until we execute it. To tackle this, we propose an Adaptive Model Scheduling framework, consisting of 1) a deep reinforcement learning-based approach to predict the value of unexecuted models by mining semantic relationships among diverse models, and 2) two heuristic algorithms to adaptively schedule the model execution order under a deadline or deadline-memory constraints, respectively. The proposed framework doesn’t require any prior knowledge of the data, and it works as a powerful complement to existing model optimization technologies. We conduct extensive evaluations on five diverse image datasets and 30 popular image labeling models to demonstrate the effectiveness of our design: it saves around 53% of execution time without losing any valuable labels.
Tasks Image Retrieval
Published 2020-02-08
URL https://arxiv.org/abs/2002.05520v1
PDF https://arxiv.org/pdf/2002.05520v1.pdf
PWC https://paperswithcode.com/paper/comprehensive-and-efficient-data-labeling-via
Repo
Framework
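
The deadline-constrained heuristic can be sketched as a greedy pass over models ranked by predicted value per unit cost. The model names, value estimates, and costs below are fixed placeholders; in the paper the values come from a deep reinforcement learning predictor that mines semantic relationships among models.

```python
# Greedy value-per-cost scheduling under a per-item deadline (sketch).
models = [                                     # (name, est. value, cost in s)
    ("object_detector", 0.9, 0.30),
    ("scene_classifier", 0.6, 0.10),
    ("face_recognizer", 0.4, 0.25),
    ("action_recognizer", 0.7, 0.40),
]

def schedule(models, deadline):
    """Pick models in value-per-cost order until the budget is spent."""
    chosen, spent = [], 0.0
    for name, value, cost in sorted(models, key=lambda m: m[1] / m[2],
                                    reverse=True):
        if spent + cost <= deadline:           # fits in remaining budget
            chosen.append(name)
            spent += cost
    return chosen, spent

plan, used = schedule(models, deadline=0.6)
print(f"run {plan} using {used:.2f}s of 0.60s budget")
```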

Liver Segmentation in Abdominal CT Images via Auto-Context Neural Network and Self-Supervised Contour Attention

Title Liver Segmentation in Abdominal CT Images via Auto-Context Neural Network and Self-Supervised Contour Attention
Authors Minyoung Chung, Jingyu Lee, Jeongjin Lee, Yeong-Gil Shin
Abstract Accurate image segmentation of the liver is a challenging problem owing to its large shape variability and unclear boundaries. Although applications of fully convolutional neural networks (CNNs) have shown groundbreaking results, limited studies have focused on generalization performance. In this study, we introduce a CNN for liver segmentation on abdominal computed tomography (CT) images that shows high generalization performance and accuracy. To improve generalization performance, we first propose an auto-context algorithm within a single CNN. The proposed auto-context neural network exploits effective high-level residual estimation to obtain the shape prior. Identical dual paths are effectively trained to represent mutually complementary features for an accurate posterior analysis of the liver. Furthermore, we extend our network with a self-supervised contour scheme. We train sparse contour features by penalizing against the ground-truth contour, focusing more contour attention on failures. The experimental results show that the proposed network achieves better accuracy than state-of-the-art networks, reducing the Hausdorff distance by 10.31%. We used 180 abdominal CT images for training and validation. Two-fold cross-validation is presented for comparison with state-of-the-art neural networks. Novel multiple N-fold cross-validations are conducted to verify generalization performance. The proposed network showed the best generalization performance among the networks. Additionally, we present a series of ablation experiments that comprehensively support the importance of the underlying concepts.
Tasks Computed Tomography (CT), Liver Segmentation, Semantic Segmentation
Published 2020-02-14
URL https://arxiv.org/abs/2002.05895v1
PDF https://arxiv.org/pdf/2002.05895v1.pdf
PWC https://paperswithcode.com/paper/liver-segmentation-in-abdominal-ct-images-via
Repo
Framework
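
The auto-context idea can be sketched with two passes: the second network receives the image together with the posterior produced by the first, letting it refine the prediction. This toy version omits the paper's dual paths and self-supervised contour attention.

```python
# Two-pass auto-context refinement (toy stand-in for the paper's network).
import torch
import torch.nn as nn

torch.manual_seed(0)
pass1 = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
# Second pass takes 2 channels: the image plus the first-pass posterior.
pass2 = nn.Sequential(nn.Conv2d(2, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())

ct_slice = torch.rand(4, 1, 64, 64)            # toy stand-in for CT data
posterior = pass1(ct_slice)                    # initial liver probability
refined = pass2(torch.cat([ct_slice, posterior], dim=1))
print("refined posterior shape:", tuple(refined.shape))
```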

Try Depth Instead of Weight Correlations: Mean-field is a Less Restrictive Assumption for Deeper Networks

Title Try Depth Instead of Weight Correlations: Mean-field is a Less Restrictive Assumption for Deeper Networks
Authors Sebastian Farquhar, Lewis Smith, Yarin Gal
Abstract We challenge the longstanding assumption that the mean-field approximation for variational inference in Bayesian neural networks is severely restrictive. We argue mathematically that full-covariance approximations only improve the ELBO if they improve the expected log-likelihood. We further show that deeper mean-field networks are able to express predictive distributions approximately equivalent to shallower full-covariance networks. We validate these observations empirically, demonstrating that deeper models decrease the divergence between diagonal- and full-covariance Gaussian fits to the true posterior.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.03704v1
PDF https://arxiv.org/pdf/2002.03704v1.pdf
PWC https://paperswithcode.com/paper/try-depth-instead-of-weight-correlations-mean
Repo
Framework
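
A mean-field variational layer, the object under discussion, keeps an independent Gaussian over every weight and samples via the reparameterization trick. The sketch below shows such a layer and a deeper stack of them; the sizes and initialization are placeholders, and the ELBO's KL term is omitted for brevity.

```python
# Mean-field (diagonal-Gaussian) Bayesian linear layers, stacked deeper.
import torch
import torch.nn as nn

class MeanFieldLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(d_out, d_in))
        self.log_sigma = nn.Parameter(torch.full((d_out, d_in), -3.0))

    def forward(self, x):
        # Sample weights with independent (diagonal) posterior noise.
        eps = torch.randn_like(self.mu)
        w = self.mu + self.log_sigma.exp() * eps
        return x @ w.t()

torch.manual_seed(0)
net = nn.Sequential(MeanFieldLinear(4, 16), nn.ReLU(),
                    MeanFieldLinear(16, 1))    # depth adds expressiveness
x = torch.rand(8, 4)
samples = torch.stack([net(x) for _ in range(100)])
print("predictive mean:", samples.mean().item(),
      "predictive std:", samples.std().item())
```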