October 16, 2019

3316 words 16 mins read

Paper Group ANR 1156



Learning Self-Imitating Diverse Policies

Title Learning Self-Imitating Diverse Policies
Authors Tanmay Gangwani, Qiang Liu, Jian Peng
Abstract The success of popular algorithms for deep reinforcement learning, such as policy-gradients and Q-learning, relies heavily on the availability of an informative reward signal at each timestep of the sequential decision-making process. When rewards are only sparsely available during an episode, or a rewarding feedback is provided only after episode termination, these algorithms perform sub-optimally due to the difficulty in credit assignment. Alternatively, trajectory-based policy optimization methods, such as cross-entropy method and evolution strategies, do not require per-timestep rewards, but have been found to suffer from high sample complexity by completely forgoing the temporal nature of the problem. Improving the efficiency of RL algorithms in real-world problems with sparse or episodic rewards is therefore a pressing need. In this work, we introduce a self-imitation learning algorithm that exploits and explores well in the sparse and episodic reward settings. We view each policy as a state-action visitation distribution and formulate policy optimization as a divergence minimization problem. We show that, with the Jensen-Shannon divergence, this divergence minimization problem reduces to a policy-gradient algorithm with shaped rewards learned from experience replays. Experimental results indicate that our algorithm performs comparably to existing algorithms in environments with dense rewards, and significantly better in environments with sparse and episodic rewards. We then discuss limitations of self-imitation learning, and propose to solve them by using Stein variational policy gradient descent with the Jensen-Shannon kernel to learn multiple diverse policies. We demonstrate its effectiveness on a challenging variant of continuous-control MuJoCo locomotion tasks.
Tasks Continuous Control, Decision Making, Imitation Learning, Policy Gradient Methods, Q-Learning
Published 2018-05-25
URL http://arxiv.org/abs/1805.10309v2
PDF http://arxiv.org/pdf/1805.10309v2.pdf
PWC https://paperswithcode.com/paper/learning-self-imitating-diverse-policies
Repo
Framework
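The shaped-reward idea in the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: `d_score` stands in for a discriminator trained to distinguish (state, action) pairs drawn from the replay buffer of high-return trajectories, and `alpha` is a hypothetical scaling coefficient; the log-ratio form is the usual reward that arises from Jensen-Shannon divergence minimization.

```python
import numpy as np

def shaped_reward(env_reward, d_score, alpha=0.1):
    """Augment a sparse environment reward with a self-imitation bonus.

    d_score in (0, 1) is a (hypothetical) discriminator's estimate of how
    likely the (state, action) pair is under high-return replay data;
    log d - log(1 - d) rewards resembling good past experience.
    """
    d = np.clip(d_score, 1e-6, 1 - 1e-6)
    return env_reward + alpha * (np.log(d) - np.log(1.0 - d))

# A pair the discriminator finds familiar gets a positive bonus even when
# the environment reward is zero; an unfamiliar pair gets a penalty.
bonus_good = shaped_reward(0.0, 0.9)
bonus_bad = shaped_reward(0.0, 0.1)
```

With this shaping, a policy-gradient learner receives dense feedback between the sparse environment rewards.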

3D Pose Estimation and 3D Model Retrieval for Objects in the Wild

Title 3D Pose Estimation and 3D Model Retrieval for Objects in the Wild
Authors Alexander Grabner, Peter M. Roth, Vincent Lepetit
Abstract We propose a scalable, efficient and accurate approach to retrieve 3D models for objects in the wild. Our contribution is twofold. We first present a 3D pose estimation approach for object categories which significantly outperforms the state-of-the-art on Pascal3D+. Second, we use the estimated pose as a prior to retrieve 3D models which accurately represent the geometry of objects in RGB images. For this purpose, we render depth images from 3D models under our predicted pose and match learned image descriptors of RGB images against those of rendered depth images using a CNN-based multi-view metric learning approach. In this way, we are the first to report quantitative results for 3D model retrieval on Pascal3D+, where our method chooses the same models as human annotators for 50% of the validation images on average. In addition, we show that our method, which was trained purely on Pascal3D+, retrieves rich and accurate 3D models from ShapeNet given RGB images of objects in the wild.
Tasks 3D Pose Estimation, Metric Learning, Pose Estimation
Published 2018-03-30
URL http://arxiv.org/abs/1803.11493v1
PDF http://arxiv.org/pdf/1803.11493v1.pdf
PWC https://paperswithcode.com/paper/3d-pose-estimation-and-3d-model-retrieval-for
Repo
Framework
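The retrieval step described above reduces to nearest-neighbor matching in a shared descriptor space. A minimal sketch, with made-up three-dimensional vectors standing in for the CNN embeddings learned with multi-view metric learning:

```python
import numpy as np

def retrieve_model(rgb_desc, depth_descs):
    """Return the index of the 3D model whose rendered-depth descriptor is
    closest (by cosine similarity) to the RGB image descriptor."""
    rgb = rgb_desc / np.linalg.norm(rgb_desc)
    d = depth_descs / np.linalg.norm(depth_descs, axis=1, keepdims=True)
    return int(np.argmax(d @ rgb))

# Illustrative descriptors only: one rendered model nearly matches the query.
rgb = np.array([1.0, 0.0, 1.0])
renders = np.array([[0.9, 0.1, 1.1],
                    [0.0, 1.0, 0.0],
                    [-1.0, 0.0, 0.2]])
best = retrieve_model(rgb, renders)
```

In the paper, the depth descriptors come from models rendered under the *predicted* pose, which is what makes the matching pose-aware.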

SDCNet: Video Prediction Using Spatially-Displaced Convolution

Title SDCNet: Video Prediction Using Spatially-Displaced Convolution
Authors Fitsum A. Reda, Guilin Liu, Kevin J. Shih, Robert Kirby, Jon Barker, David Tarjan, Andrew Tao, Bryan Catanzaro
Abstract We present an approach for high-resolution video frame prediction by conditioning on both past frames and past optical flows. Previous approaches rely on resampling past frames, guided by a learned future optical flow, or on direct generation of pixels. Resampling based on flow is insufficient because it cannot deal with disocclusions. Generative models currently lead to blurry results. Recent approaches synthesize a pixel by convolving input patches with a predicted kernel. However, their memory requirement increases with kernel size. Here, we propose a spatially-displaced convolution (SDC) module for video frame prediction. We learn a motion vector and a kernel for each pixel and synthesize a pixel by applying the kernel at a displaced location in the source image, defined by the predicted motion vector. Our approach inherits the merits of both vector-based and kernel-based approaches, while ameliorating their respective disadvantages. We train our model on 428K unlabelled 1080p video game frames. Our approach produces state-of-the-art results, achieving an SSIM score of 0.904 on high-definition YouTube-8M videos and 0.918 on Caltech Pedestrian videos. Our model handles large motion effectively and synthesizes crisp frames with consistent motion.
Tasks Optical Flow Estimation, Video Prediction
Published 2018-11-02
URL http://arxiv.org/abs/1811.00684v1
PDF http://arxiv.org/pdf/1811.00684v1.pdf
PWC https://paperswithcode.com/paper/sdcnet-video-prediction-using-spatially
Repo
Framework
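The core SDC operation is concrete enough to sketch for a single pixel. A toy version under simplifying assumptions (nearest-neighbor displacement instead of bilinear sampling, an odd-sized kernel, zero padding); in the paper the flow and kernel are predicted per pixel by a network:

```python
import numpy as np

def sdc_pixel(src, x, y, flow, kernel):
    """Spatially-displaced convolution for one output pixel: apply the
    predicted kernel at the location displaced by the predicted motion
    vector (u, v), rather than at (x, y) itself."""
    u, v = flow
    cx, cy = int(round(x + u)), int(round(y + v))
    k = kernel.shape[0] // 2
    pad = np.pad(src, k)                     # zero-pad so edges are valid
    patch = pad[cy:cy + 2 * k + 1, cx:cx + 2 * k + 1]
    return float((patch * kernel).sum())

src = np.arange(16, dtype=float).reshape(4, 4)
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
# With an identity kernel, the output is just the source pixel at the
# displaced location, i.e. pure vector-based resampling:
sampled = sdc_pixel(src, 0, 0, (1.0, 0.0), identity)
```

A non-trivial kernel recovers the kernel-based behavior, which is why SDC subsumes both families of approaches.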

Modeling Varying Camera-IMU Time Offset in Optimization-Based Visual-Inertial Odometry

Title Modeling Varying Camera-IMU Time Offset in Optimization-Based Visual-Inertial Odometry
Authors Yonggen Ling, Linchao Bao, Zequn Jie, Fengming Zhu, Ziyang Li, Shanmin Tang, Yongsheng Liu, Wei Liu, Tong Zhang
Abstract Combining cameras and inertial measurement units (IMUs) has been proven effective in motion tracking, as these two sensing modalities offer complementary characteristics that are suitable for fusion. While most works focus on global-shutter cameras and synchronized sensor measurements, consumer-grade devices are mostly equipped with rolling-shutter cameras and suffer from imperfect sensor synchronization. In this work, we propose a nonlinear optimization-based monocular visual inertial odometry (VIO) with the varying camera-IMU time offset modeled as an unknown variable. Our approach is able to handle the rolling-shutter effects and imperfect sensor synchronization in a unified way. Additionally, we introduce an efficient algorithm based on dynamic programming and a red-black tree to speed up IMU integration over variable-length time intervals during the optimization. An uncertainty-aware initialization is also presented to launch the VIO robustly. Comparisons with state-of-the-art methods on the EuRoC dataset and mobile phone data are shown to validate the effectiveness of our approach.
Tasks
Published 2018-10-12
URL http://arxiv.org/abs/1810.05456v1
PDF http://arxiv.org/pdf/1810.05456v1.pdf
PWC https://paperswithcode.com/paper/modeling-varying-camera-imu-time-offset-in
Repo
Framework
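The key modeling idea, stripped of the full optimization, is that each camera measurement is evaluated against the IMU trajectory at a shifted timestamp. A 1-D toy sketch (linear interpolation of a scalar "pose"; in the paper the offset is an unknown variable solved jointly with the trajectory, not a fixed argument):

```python
import numpy as np

def imu_pose_at(t_cam, t_offset, imu_times, imu_poses):
    """Evaluate the (toy, scalar) IMU pose at the corrected camera
    timestamp t_cam + t_offset by linear interpolation between IMU
    samples. The optimizer would adjust t_offset to minimize the
    reprojection residuals computed from this pose."""
    return float(np.interp(t_cam + t_offset, imu_times, imu_poses))

times = np.array([0.0, 0.1, 0.2, 0.3])
poses = np.array([0.0, 1.0, 2.0, 3.0])   # pose grows linearly with time
shifted = imu_pose_at(0.1, 0.05, times, poses)
```

Rolling-shutter readout fits the same mold: each image row simply gets its own effective time offset.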

A Deep-Learning-Based Geological Parameterization for History Matching Complex Models

Title A Deep-Learning-Based Geological Parameterization for History Matching Complex Models
Authors Yimin Liu, Wenyue Sun, Louis J. Durlofsky
Abstract A new low-dimensional parameterization based on principal component analysis (PCA) and convolutional neural networks (CNN) is developed to represent complex geological models. The CNN-PCA method is inspired by recent developments in computer vision using deep learning. CNN-PCA can be viewed as a generalization of an existing optimization-based PCA (O-PCA) method. Both CNN-PCA and O-PCA entail post-processing a PCA model to better honor complex geological features. In CNN-PCA, rather than use a histogram-based regularization as in O-PCA, a new regularization involving a set of metrics for multipoint statistics is introduced. The metrics are based on summary statistics of the nonlinear filter responses of geological models to a pre-trained deep CNN. In addition, in the CNN-PCA formulation presented here, a convolutional neural network is trained as an explicit transform function that can post-process PCA models quickly. CNN-PCA is shown to provide both unconditional and conditional realizations that honor the geological features present in reference SGeMS geostatistical realizations for a binary channelized system. Flow statistics obtained through simulation of random CNN-PCA models closely match results for random SGeMS models for a demanding case in which O-PCA models lead to significant discrepancies. Results for history matching are also presented. In this assessment CNN-PCA is applied with derivative-free optimization, and a subspace randomized maximum likelihood method is used to provide multiple posterior models. Data assimilation and significant uncertainty reduction are achieved for existing wells, and physically reasonable predictions are also obtained for new wells. Finally, the CNN-PCA method is extended to a more complex non-stationary bimodal deltaic fan system, and is shown to provide high-quality realizations for this challenging example.
Tasks
Published 2018-07-07
URL http://arxiv.org/abs/1807.02716v1
PDF http://arxiv.org/pdf/1807.02716v1.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-based-geological
Repo
Framework
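The PCA half of CNN-PCA can be sketched directly; the CNN post-processing transform is omitted here. A toy, assuming an ensemble of flattened model realizations and a low-dimensional latent vector xi:

```python
import numpy as np

def pca_realization(models, xi):
    """Generate a new (toy) geological realization from PCA of an
    ensemble: m = mean + U_l (s_l * xi), with xi ~ N(0, I). In CNN-PCA
    this sample would then be post-processed by a trained CNN transform
    to restore multipoint statistics."""
    n = models.shape[0]
    mean = models.mean(axis=0)
    u, s, _ = np.linalg.svd((models - mean).T / np.sqrt(n - 1),
                            full_matrices=False)
    l = len(xi)
    return mean + u[:, :l] @ (s[:l] * xi)

rng = np.random.default_rng(0)
ensemble = rng.normal(size=(20, 5))       # 20 toy models, 5 cells each
new_model = pca_realization(ensemble, np.zeros(3))  # xi = 0 gives the mean
```

O-PCA and CNN-PCA differ only in how this raw PCA sample is post-processed to honor channelized features.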

Remote sensing image regression for heterogeneous change detection

Title Remote sensing image regression for heterogeneous change detection
Authors Luigi T. Luppino, Filippo M. Bianchi, Gabriele Moser, Stian N. Anfinsen
Abstract Change detection in heterogeneous multitemporal satellite images is an emerging topic in remote sensing. In this paper we propose a framework, based on image regression, to perform change detection in such images. Our method learns a transformation to map the first image to the domain of the other image, and vice versa. Four regression methods are selected to carry out the transformation: Gaussian processes, support vector machines, random forests, and a recently proposed kernel regression method called homogeneous pixel transformation. To evaluate not only the potentials and limitations of our framework, but also the pros and cons of each regression method, we perform experiments on two data sets. The results indicate that random forests achieve good performance and are fast and robust to hyperparameters, whereas the homogeneous pixel transformation method can achieve better accuracy at the cost of higher complexity.
Tasks Gaussian Processes
Published 2018-07-31
URL http://arxiv.org/abs/1807.11766v1
PDF http://arxiv.org/pdf/1807.11766v1.pdf
PWC https://paperswithcode.com/paper/remote-sensing-image-regression-for
Repo
Framework
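The framework is simple to sketch end to end with a stand-in regressor. A toy using ordinary least squares instead of the paper's four methods: fit the cross-domain map on pixels assumed unchanged, predict the second image everywhere, and take the residual magnitude as the change score.

```python
import numpy as np

def change_map(img1, img2, train_mask):
    """Heterogeneous change detection by image regression (toy version):
    fit a linear map from img1 bands to img2 bands on pixels marked
    unchanged (train_mask), then score every pixel by the prediction
    residual. Large residuals indicate change."""
    x = img1.reshape(-1, img1.shape[-1])
    y = img2.reshape(-1, img2.shape[-1])
    m = train_mask.ravel()
    w, *_ = np.linalg.lstsq(x[m], y[m], rcond=None)
    resid = np.linalg.norm(x @ w - y, axis=1)
    return resid.reshape(img1.shape[:2])

# 2x2 single-band toy: img2 = 2 * img1 everywhere except one changed pixel.
img1 = np.array([[[1.0], [2.0]], [[3.0], [4.0]]])
img2 = 2 * img1
img2[1, 1] = 20.0
mask = np.array([[True, True], [True, False]])
cm = change_map(img1, img2, mask)
```

Swapping the least-squares fit for a random forest or Gaussian process regressor changes only the middle line, which is exactly the modularity the paper exploits.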

Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures

Title Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures
Authors Prem Seetharaman, Gordon Wichern, Jonathan Le Roux, Bryan Pardo
Abstract Separating an audio scene into isolated sources is a fundamental problem in computer audition, analogous to image segmentation in visual scene analysis. Source separation systems based on deep learning are currently the most successful approaches for solving the underdetermined separation problem, where there are more sources than channels. Traditionally, such systems are trained on sound mixtures where the ground truth decomposition is already known. Since most real-world recordings do not have such a decomposition available, this limits the range of mixtures one can train on, and the range of mixtures the learned models may successfully separate. In this work, we use a simple blind spatial source separation algorithm to generate estimated decompositions of stereo mixtures. These estimates, together with a weighting scheme in the time-frequency domain based on confidence in the separation quality, are used to train a deep learning model that can be used for single-channel separation, where no source direction information is available. This demonstrates how a simple cue, such as a source's direction of origin, can be used to bootstrap a model for source separation that can be used in situations where that cue is not available.
Tasks Semantic Segmentation, Unsupervised Spatial Clustering
Published 2018-11-06
URL http://arxiv.org/abs/1811.02130v1
PDF http://arxiv.org/pdf/1811.02130v1.pdf
PWC https://paperswithcode.com/paper/bootstrapping-single-channel-source
Repo
Framework
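The confidence-weighting scheme can be sketched as a weighted loss over time-frequency bins. A toy on flat arrays; in the paper the "estimated" mask comes from the blind spatial algorithm and the confidence reflects how cleanly it separated each bin:

```python
import numpy as np

def weighted_separation_loss(pred_mask, est_mask, confidence):
    """Confidence-weighted training target (schematic): compare the
    model's predicted time-frequency mask against the estimated mask
    from blind spatial clustering, down-weighting bins where the spatial
    algorithm was unsure."""
    return float(np.sum(confidence * (pred_mask - est_mask) ** 2)
                 / np.sum(confidence))
```

A bin with zero confidence contributes nothing, so noisy pseudo-labels in ambiguous regions do not corrupt training.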

Assessing Shape Bias Property of Convolutional Neural Networks

Title Assessing Shape Bias Property of Convolutional Neural Networks
Authors Hossein Hosseini, Baicen Xiao, Mayoore Jaiswal, Radha Poovendran
Abstract It is known that humans display “shape bias” when classifying new items, i.e., they prefer to categorize objects based on their shape rather than color. Convolutional Neural Networks (CNNs) are also designed to take into account the spatial structure of image data. In fact, experiments on image datasets, consisting of triples of a probe image, a shape-match and a color-match, have shown that one-shot learning models display shape bias as well. In this paper, we examine the shape bias property of CNNs. In order to conduct large scale experiments, we propose using the model accuracy on images with reversed brightness as a metric to evaluate the shape bias property. Such images, called negative images, contain objects that have the same shape as original images, but with different colors. Through extensive systematic experiments, we investigate the role of different factors, such as training data, model architecture, initialization and regularization techniques, on the shape bias property of CNNs. We show that it is possible to design different CNNs that achieve similar accuracy on original images, but perform significantly different on negative images, suggesting that CNNs do not intrinsically display shape bias. We then show that CNNs are able to learn and generalize the structures, when the model is properly initialized or data is properly augmented, and if batch normalization is used.
Tasks One-Shot Learning
Published 2018-03-21
URL http://arxiv.org/abs/1803.07739v1
PDF http://arxiv.org/pdf/1803.07739v1.pdf
PWC https://paperswithcode.com/paper/assessing-shape-bias-property-of
Repo
Framework
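The negative-image metric is trivial to compute, which is what makes it attractive for large-scale experiments: brightness reversal preserves every shape and edge while changing all colors.

```python
import numpy as np

def negative_image(img, max_val=255):
    """Reverse brightness: same shapes and edges, different colors.
    Model accuracy on such negatives is the paper's proxy for shape bias."""
    return max_val - img

img = np.array([[0, 255], [128, 64]])
neg = negative_image(img)
```

Because spatial gradients only change sign, their magnitudes (and hence contours) are identical in the original and the negative, so a shape-biased classifier should be largely unaffected.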

A Deep Ensemble Framework for Fake News Detection and Classification

Title A Deep Ensemble Framework for Fake News Detection and Classification
Authors Arjun Roy, Kingshuk Basak, Asif Ekbal, Pushpak Bhattacharyya
Abstract Fake news, rumor, incorrect information, and misinformation detection are nowadays crucial issues as these might have serious consequences for our social fabrics. The rate of such information is increasing rapidly due to the availability of enormous web information sources including social media feeds, news blogs, online newspapers etc. In this paper, we develop various deep learning models for detecting fake news and classifying them into the pre-defined fine-grained categories. At first, we develop models based on Convolutional Neural Network (CNN) and Bi-directional Long Short Term Memory (Bi-LSTM) networks. The representations obtained from these two models are fed into a Multi-layer Perceptron Model (MLP) for the final classification. Our experiments on a benchmark dataset show promising results with an overall accuracy of 44.87%, which outperforms the current state of the art.
Tasks Fake News Detection
Published 2018-11-12
URL http://arxiv.org/abs/1811.04670v1
PDF http://arxiv.org/pdf/1811.04670v1.pdf
PWC https://paperswithcode.com/paper/a-deep-ensemble-framework-for-fake-news
Repo
Framework
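The fusion stage of the ensemble can be sketched schematically: concatenate the CNN and Bi-LSTM representations of an article and classify the result. A one-layer linear-plus-softmax stand-in for the MLP, with illustrative (untrained) weights:

```python
import numpy as np

def mlp_classify(cnn_feat, bilstm_feat, w, b):
    """Final stage of the ensemble (schematic): concatenate the two
    learned representations and score the fake-news classes with a
    linear layer followed by softmax."""
    x = np.concatenate([cnn_feat, bilstm_feat])
    logits = w @ x + b
    e = np.exp(logits - logits.max())   # stable softmax
    return e / e.sum()

cnn_feat = np.array([1.0, 0.0])
bilstm_feat = np.array([0.0, 1.0])
w = np.array([[2.0, 0.0, 0.0, 0.0],   # illustrative weights, class 0
              [0.0, 0.0, 0.0, 1.0]])  # illustrative weights, class 1
b = np.zeros(2)
probs = mlp_classify(cnn_feat, bilstm_feat, w, b)
```

In the paper this head is a multi-layer perceptron trained jointly on the fine-grained fake-news categories.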

Multi-Head Attention with Disagreement Regularization

Title Multi-Head Attention with Disagreement Regularization
Authors Jian Li, Zhaopeng Tu, Baosong Yang, Michael R. Lyu, Tong Zhang
Abstract Multi-head attention is appealing for the ability to jointly attend to information from different representation subspaces at different positions. In this work, we introduce a disagreement regularization to explicitly encourage the diversity among multiple attention heads. Specifically, we propose three types of disagreement regularization, which respectively encourage the subspace, the attended positions, and the output representation associated with each attention head to be different from other heads. Experimental results on widely-used WMT14 English-German and WMT17 Chinese-English translation tasks demonstrate the effectiveness and universality of the proposed approach.
Tasks
Published 2018-10-24
URL http://arxiv.org/abs/1810.10183v1
PDF http://arxiv.org/pdf/1810.10183v1.pdf
PWC https://paperswithcode.com/paper/multi-head-attention-with-disagreement
Repo
Framework
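The output-level variant of the regularization (one of the paper's three) can be sketched as negative mean pairwise cosine similarity between head outputs, so that more diverse heads score higher. A toy on row-vector head outputs:

```python
import numpy as np

def disagreement(head_outputs):
    """Output-representation disagreement (schematic): the negative mean
    pairwise cosine similarity across attention heads. Adding this term
    to the training objective encourages heads to differ."""
    h = head_outputs / np.linalg.norm(head_outputs, axis=1, keepdims=True)
    sim = h @ h.T
    n = len(h)
    mean_pairwise = (sim.sum() - n) / (n * (n - 1))  # exclude self-similarity
    return -mean_pairwise
```

Identical heads give the minimum value of -1; mutually orthogonal heads give 0, so maximizing this term pushes the heads apart.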

Image Processing on IOPA Radiographs: A comprehensive case study on Apical Periodontitis

Title Image Processing on IOPA Radiographs: A comprehensive case study on Apical Periodontitis
Authors Diganta Misra, Vanshika Arora
Abstract With the recent advancements in image processing techniques and the development of new robust computer vision algorithms, new areas of research within Medical Diagnosis and Biomedical Engineering are picking up pace. This paper provides a comprehensive, in-depth case study of image processing, feature extraction and analysis of Apical Periodontitis diagnostic cases in IOPA (Intra Oral Peri-Apical) radiographs, a common case in the oral diagnostic pipeline. It presents a detailed analytical approach to improving the diagnostic procedure, yielding faster and more accurate results, with the aim of eliminating True Negative and False Positive cases.
Tasks Medical Diagnosis
Published 2018-12-23
URL http://arxiv.org/abs/1812.09693v2
PDF http://arxiv.org/pdf/1812.09693v2.pdf
PWC https://paperswithcode.com/paper/image-processing-on-iopa-radiographs-a
Repo
Framework

Parity Queries for Binary Classification

Title Parity Queries for Binary Classification
Authors Hye Won Chung, Ji Oon Lee, Doyeon Kim, Alfred O. Hero
Abstract Consider a query-based data acquisition problem that aims to recover the values of $k$ binary variables from parity (XOR) measurements of chosen subsets of the variables. Assume the response model where only a randomly selected subset of the measurements is received. We propose a method for designing a sequence of queries so that the variables can be identified with high probability using as few ($n$) measurements as possible. We define the query difficulty $\bar{d}$ as the average size of the query subsets and the sample complexity $n$ as the minimum number of measurements required to attain a given recovery accuracy. We obtain fundamental trade-offs between recovery accuracy, query difficulty, and sample complexity. In particular, the necessary and sufficient sample complexity required for recovering all $k$ variables with high probability is $n = c_0 \max\{k, (k \log k)/\bar{d}\}$ and the sample complexity for recovering a fixed proportion $(1-\delta)k$ of the variables for $\delta=o(1)$ is $n = c_1 \max\{k, (k \log(1/\delta))/\bar{d}\}$, where $c_0, c_1>0$.
Tasks
Published 2018-09-04
URL https://arxiv.org/abs/1809.00901v2
PDF https://arxiv.org/pdf/1809.00901v2.pdf
PWC https://paperswithcode.com/paper/parity-crowdsourcing-for-cooperative-labeling
Repo
Framework
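Recovery from parity measurements is linear algebra over GF(2). A toy decoder by Gaussian elimination, assuming the received measurements happen to determine all variables (the paper's analysis is precisely about how many random measurements make this likely):

```python
import numpy as np

def solve_gf2(A, y):
    """Recover k binary variables x from parity measurements y = A x
    (mod 2) by Gaussian elimination over GF(2)."""
    A = A.copy() % 2
    y = y.copy() % 2
    k = A.shape[1]
    row = 0
    for col in range(k):
        piv = next((r for r in range(row, len(A)) if A[r, col]), None)
        if piv is None:
            continue                      # no pivot: column undetermined
        A[[row, piv]] = A[[piv, row]]     # swap pivot row into place
        y[[row, piv]] = y[[piv, row]]
        for r in range(len(A)):
            if r != row and A[r, col]:
                A[r] ^= A[row]            # XOR-eliminate above and below
                y[r] ^= y[row]
        row += 1
    x = np.zeros(k, dtype=int)
    for r in range(row):                  # read off reduced echelon form
        cols = np.flatnonzero(A[r])
        if len(cols):
            x[cols[0]] = y[r]
    return x

A = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 0]])  # three XOR queries
x_true = np.array([1, 0, 1])
y = (A @ x_true) % 2
```

The query difficulty $\bar{d}$ from the abstract is just the average number of ones per row of A.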

Visual Social Relationship Recognition

Title Visual Social Relationship Recognition
Authors Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli
Abstract Social relationships form the basis of the social structure of humans. Developing computational models to understand social relationships from visual data is essential for building intelligent machines that can better interact with humans in a social environment. In this work, we study the problem of visual social relationship recognition in images. We propose a Dual-Glance model for social relationship recognition, where the first glance fixates at the person of interest and the second glance deploys an attention mechanism to exploit contextual cues. To enable this study, we curated a large-scale People in Social Context (PISC) dataset, which comprises 23,311 images and 79,244 person pairs with annotated social relationships. Since visually identifying social relationships bears a certain degree of uncertainty, we further propose an Adaptive Focal Loss to leverage the ambiguous annotations for more effective learning. We conduct extensive experiments to quantitatively and qualitatively demonstrate the efficacy of our proposed method, which yields state-of-the-art performance on social relationship recognition.
Tasks Visual Social Relationship Recognition
Published 2018-12-13
URL http://arxiv.org/abs/1812.05917v1
PDF http://arxiv.org/pdf/1812.05917v1.pdf
PWC https://paperswithcode.com/paper/visual-social-relationship-recognition
Repo
Framework
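As background for the Adaptive Focal Loss, the standard focal loss it builds on is easy to state: confident correct predictions are down-weighted by a factor (1 - p_t)^gamma. A binary-label sketch (the paper's adaptive variant further modulates this using annotator agreement, which is omitted here):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Standard binary focal loss: p is the predicted probability of the
    positive class, y in {0, 1} the label. gamma > 0 shrinks the loss on
    easy, well-classified examples so hard examples dominate training."""
    pt = p if y == 1 else 1.0 - p
    return float(-((1.0 - pt) ** gamma) * np.log(pt))
```

The down-weighting is what lets ambiguous relationship labels be kept in training without overwhelming the clear-cut ones.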

Deep Residual Network for Off-Resonance Artifact Correction with Application to Pediatric Body Magnetic Resonance Angiography with 3D Cones

Title Deep Residual Network for Off-Resonance Artifact Correction with Application to Pediatric Body Magnetic Resonance Angiography with 3D Cones
Authors David Y Zeng, Jamil Shaikh, Dwight G Nishimura, Shreyas S Vasanawala, Joseph Y Cheng
Abstract Purpose: Off-resonance artifact correction by deep-learning, to facilitate rapid pediatric body imaging with a scan time efficient 3D cones trajectory. Methods: A residual convolutional neural network to correct off-resonance artifacts (Off-ResNet) was trained with a prospective study of 30 pediatric magnetic resonance angiography exams. Each exam acquired a short-readout scan (1.18 ± 0.38 ms) and a long-readout scan (3.35 ± 0.74 ms) at 3T. Short-readout scans, with longer scan times but negligible off-resonance blurring, were used as reference images and augmented with additional off-resonance for supervised training examples. Long-readout scans, with greater off-resonance artifacts but shorter scan time, were corrected by autofocus and Off-ResNet and compared to short-readout scans by normalized root-mean-square error (NRMSE), structural similarity index (SSIM), and peak signal-to-noise ratio (PSNR). Scans were also compared by scoring on eight anatomical features by two radiologists, using analysis of variance with post-hoc Tukey’s test. Reader agreement was determined with intraclass correlation. Results: Long-readout scans were on average 59.3% shorter than short-readout scans. Images from Off-ResNet had superior NRMSE, SSIM, and PSNR compared to uncorrected images across ±1 kHz off-resonance (P<0.01). The proposed method had superior NRMSE over -677 Hz to +1 kHz and superior SSIM and PSNR over ±1 kHz compared to autofocus (P<0.01). Radiologic scoring demonstrated that long-readout scans corrected with Off-ResNet were non-inferior to short-readout scans (P<0.01). Conclusion: The proposed method can correct off-resonance artifacts from rapid long-readout 3D cones scans to a non-inferior image quality compared to diagnostically standard short-readout scans.
Tasks
Published 2018-09-28
URL http://arxiv.org/abs/1810.00072v1
PDF http://arxiv.org/pdf/1810.00072v1.pdf
PWC https://paperswithcode.com/paper/deep-residual-network-for-off-resonance
Repo
Framework
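Of the three image-quality metrics in the abstract, NRMSE is the simplest to state. A sketch assuming the common range-normalization convention (RMSE divided by the dynamic range of the reference); the paper does not specify which normalization it uses:

```python
import numpy as np

def nrmse(test, ref):
    """Normalized root-mean-square error between a corrected long-readout
    image and the short-readout reference. Normalization convention
    assumed here: RMSE / (max - min) of the reference."""
    err = np.sqrt(np.mean((test - ref) ** 2))
    return float(err / (ref.max() - ref.min()))

ref = np.array([0.0, 1.0])
```

Lower NRMSE after correction is what "superior NRMSE" means in the results above.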

Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network

Title Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network
Authors Xinjing Cheng, Peng Wang, Ruigang Yang
Abstract Depth estimation from a single image is a fundamental problem in computer vision. In this paper, we propose a simple yet effective convolutional spatial propagation network (CSPN) to learn the affinity matrix for depth prediction. Specifically, we adopt an efficient linear propagation model, where the propagation is performed in the manner of a recurrent convolutional operation, and the affinity among neighboring pixels is learned through a deep convolutional neural network (CNN). We apply the designed CSPN to two depth estimation tasks given a single image: (1) to refine the depth output of existing state-of-the-art (SOTA) methods; and (2) to convert sparse depth samples to a dense depth map by embedding the depth samples within the propagation procedure. The second task is inspired by the availability of LIDARs, which provide sparse but accurate depth measurements. We evaluate the proposed CSPN on two popular benchmarks for depth estimation, i.e. NYU v2 and KITTI, where we show that our proposed approach improves over prior SOTA methods not only in quality (e.g., 30% more reduction in depth error) but also in speed (e.g., 2 to 5 times faster).
Tasks Depth Estimation
Published 2018-08-01
URL http://arxiv.org/abs/1808.00150v1
PDF http://arxiv.org/pdf/1808.00150v1.pdf
PWC https://paperswithcode.com/paper/depth-estimation-via-affinity-learned-with
Repo
Framework
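One step of the linear propagation model can be sketched in 1-D. A toy with a fixed two-neighbor affinity and circular boundaries; in CSPN the affinities are predicted per pixel by a CNN, and the self-weight is set so the weights sum to one, which keeps the recurrence stable:

```python
import numpy as np

def cspn_step(depth, affinity):
    """One linear propagation step of a (toy, 1-D) convolutional spatial
    propagation network: each position is updated as an affinity-weighted
    average of its two neighbors plus itself."""
    left = np.roll(depth, 1)     # neighbor to the left (circular)
    right = np.roll(depth, -1)   # neighbor to the right (circular)
    a_l, a_r = affinity
    a_self = 1.0 - a_l - a_r     # weights sum to one
    return a_self * depth + a_l * left + a_r * right
```

Iterating this step diffuses accurate sparse depths (e.g., LIDAR samples, which would be re-clamped each iteration) into their neighborhoods, which is the mechanism behind the sparse-to-dense task.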