February 1, 2020

3006 words 15 mins read

Paper Group AWR 306

Progressive-X: Efficient, Anytime, Multi-Model Fitting Algorithm. MIOpen: An Open Source Library For Deep Learning Primitives. Approval policies for modifications to Machine Learning-Based Software as a Medical Device: A study of bio-creep. LVIS: A Dataset for Large Vocabulary Instance Segmentation. Multimodal Transformer Networks for End-to-End Vi …

Progressive-X: Efficient, Anytime, Multi-Model Fitting Algorithm

Title Progressive-X: Efficient, Anytime, Multi-Model Fitting Algorithm
Authors Daniel Barath, Jiri Matas
Abstract The Progressive-X algorithm, Prog-X in short, is proposed for geometric multi-model fitting. The method interleaves sampling and consolidation of the current data interpretation via repetitive hypothesis proposal, fast rejection, and integration of the new hypothesis into the kept instance set by labeling energy minimization. Due to exploring the data progressively, the method has several beneficial properties compared with the state-of-the-art. First, a clear criterion, adopted from RANSAC, controls the termination and stops the algorithm when the probability of finding a new model with a reasonable number of inliers falls below a threshold. Second, Prog-X is an anytime algorithm: whenever it is interrupted, e.g. due to a time limit, the returned instances correspond to real models and are likely the most dominant ones. The method is superior to the state-of-the-art in terms of accuracy in both synthetic experiments and on publicly available real-world datasets for homography, two-view motion, and motion segmentation.
Tasks Motion Segmentation
Published 2019-06-05
URL https://arxiv.org/abs/1906.02290v1
PDF https://arxiv.org/pdf/1906.02290v1.pdf
PWC https://paperswithcode.com/paper/progressive-x-efficient-anytime-multi-model
Repo https://github.com/danini/progressive-x
Framework none
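
The abstract mentions a termination criterion adopted from RANSAC. Prog-X's actual criterion concerns the probability of still finding a new, yet-unseen model instance; the textbook single-model RANSAC bound below only illustrates the underlying idea, and the symbols (confidence, inlier ratio, minimal sample size) follow that standard formulation rather than the paper's exact variant.

```python
import math

def ransac_iteration_bound(inlier_ratio, sample_size, confidence=0.99):
    """Textbook RANSAC bound: smallest k such that the probability of never
    drawing an all-inlier minimal sample of size `sample_size` drops below
    1 - confidence. Prog-X adapts this idea to decide when to stop proposing
    new model hypotheses."""
    p_good_sample = inlier_ratio ** sample_size
    if p_good_sample <= 0.0:
        return float("inf")
    if p_good_sample >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))

# e.g. homography fitting (4-point minimal sample) with a 30% inlier ratio
print(ransac_iteration_bound(0.3, 4))   # ~567 iterations
```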

MIOpen: An Open Source Library For Deep Learning Primitives

Title MIOpen: An Open Source Library For Deep Learning Primitives
Authors Jehandad Khan, Paul Fultz, Artem Tamazov, Daniel Lowell, Chao Liu, Michael Melesse, Murali Nandhimandalam, Kamil Nasyrov, Ilya Perminov, Tejash Shah, Vasilii Filippov, Jing Zhang, Jing Zhou, Bragadeesh Natarajan, Mayank Daga
Abstract Deep Learning has established itself as a common term in the business lexicon. The unprecedented success of deep learning in recent years can be attributed to: abundance of data, availability of gargantuan compute capabilities offered by GPUs, and adoption of an open-source philosophy by researchers and industry. Deep neural networks can be decomposed into a series of different operators. MIOpen, AMD’s open-source deep learning primitives library for GPUs, provides highly optimized implementations of such operators, shielding researchers from internal implementation details and hence accelerating the time to discovery. This paper introduces MIOpen and provides details about the internal workings of the library and its supported features. MIOpen innovates on several fronts, such as implementing fusion to optimize for memory bandwidth and GPU launch overheads, providing an auto-tuning infrastructure to overcome the large design space of problem configurations, and implementing different algorithms to optimize convolutions for different filter and input sizes. MIOpen is one of the first libraries to publicly support the bfloat16 data type for convolutions, allowing efficient training at lower precision without loss of accuracy.
Tasks
Published 2019-09-30
URL https://arxiv.org/abs/1910.00078v1
PDF https://arxiv.org/pdf/1910.00078v1.pdf
PWC https://paperswithcode.com/paper/miopen-an-open-source-library-for-deep
Repo https://github.com/ROCmSoftwarePlatform/MIOpen
Framework none
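
The abstract highlights bfloat16 support. As a quick illustration of the format itself (independent of MIOpen's API, which is not shown here), bfloat16 keeps float32's 8-bit exponent and truncates the mantissa to 7 bits, so a float32 can be converted by keeping only its upper 16 bits:

```python
import struct

def float32_to_bfloat16_bits(x):
    """Truncate a float32 to bfloat16 by keeping its upper 16 bits
    (round-to-nearest-even is omitted for brevity)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return (bits >> 16) & 0xFFFF

def bfloat16_bits_to_float32(b):
    """Expand 16 bfloat16 bits back to a float32 by zero-filling the mantissa."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x

x = 3.1415926
b = float32_to_bfloat16_bits(x)
print(hex(b), bfloat16_bits_to_float32(b))   # same exponent range, ~2-3 decimal digits of precision
```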

Approval policies for modifications to Machine Learning-Based Software as a Medical Device: A study of bio-creep

Title Approval policies for modifications to Machine Learning-Based Software as a Medical Device: A study of bio-creep
Authors Jean Feng, Scott Emerson, Noah Simon
Abstract Successful deployment of machine learning algorithms in healthcare requires careful assessments of their performance and safety. To date, the FDA approves locked algorithms prior to marketing and requires future updates to undergo separate premarket reviews. However, this negates a key feature of machine learning: the ability to learn from a growing dataset and improve over time. This paper frames the design of an approval policy, which we refer to as an automatic algorithmic change protocol (aACP), as an online hypothesis testing problem. As this process has an obvious analogy with noninferiority testing of new drugs, we investigate how repeated testing and adoption of modifications might lead to gradual deterioration in prediction accuracy, also known as “bio-creep” in the drug development literature. We consider simple policies that one might adopt but that do not necessarily offer any error-rate guarantees, as well as policies that do provide error-rate control. For the latter, we define two online error-rates appropriate for this context: Bad Approval Count (BAC) and Bad Approval and Benchmark Ratios (BABR). We control these rates in the simple setting of a constant population and data source using policies aACP-BAC and aACP-BABR, which combine alpha-investing, group-sequential, and gate-keeping methods. In simulation studies, bio-creep regularly occurred when using policies with no error-rate guarantees, whereas aACP-BAC and -BABR controlled the rate of bio-creep without substantially impacting our ability to approve beneficial modifications.
Tasks
Published 2019-12-28
URL https://arxiv.org/abs/1912.12413v1
PDF https://arxiv.org/pdf/1912.12413v1.pdf
PWC https://paperswithcode.com/paper/approval-policies-for-modifications-to
Repo https://github.com/jjfeng/aACP
Framework none
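
To make the online-testing framing concrete, below is a deliberately naive approval loop in the spirit of the "simple policies with no error-rate guarantees" that the abstract contrasts against: each proposed modification is accepted if it passes a fixed-level one-sided noninferiority z-test against the currently approved model. The function names, the margin, and the test are illustrative assumptions, not the paper's aACP-BAC/BABR procedures.

```python
import math
from scipy.stats import norm

def noninferiority_z(acc_new, acc_cur, n, margin):
    """One-sided z statistic for H0: acc_new <= acc_cur - margin, with n samples per model."""
    se = math.sqrt(acc_new * (1 - acc_new) / n + acc_cur * (1 - acc_cur) / n)
    return (acc_new - acc_cur + margin) / se

def naive_approval_policy(accuracies, n=1000, margin=0.02, alpha=0.05):
    """Approve every modification that passes a fixed-level noninferiority test
    against the last approved model. Repeated testing like this is exactly the
    setting in which the paper shows bio-creep can occur."""
    approved = [0]
    for j in range(1, len(accuracies)):
        z = noninferiority_z(accuracies[j], accuracies[approved[-1]], n, margin)
        if z > norm.ppf(1 - alpha):
            approved.append(j)
    return approved

print(naive_approval_policy([0.80, 0.79, 0.78, 0.83]))   # -> [0, 3]
```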

LVIS: A Dataset for Large Vocabulary Instance Segmentation

Title LVIS: A Dataset for Large Vocabulary Instance Segmentation
Authors Agrim Gupta, Piotr Dollár, Ross Girshick
Abstract Progress on object detection is enabled by datasets that focus the research community’s attention on open challenges. This process led us from simple images to complex scenes and from bounding boxes to segmentation masks. In this work, we introduce LVIS (pronounced ‘el-vis’): a new dataset for Large Vocabulary Instance Segmentation. We plan to collect ~2 million high-quality instance segmentation masks for over 1000 entry-level object categories in 164k images. Due to the Zipfian distribution of categories in natural images, LVIS naturally has a long tail of categories with few training samples. Given that state-of-the-art deep learning methods for object detection perform poorly in the low-sample regime, we believe that our dataset poses an important and exciting new scientific challenge. LVIS is available at http://www.lvisdataset.org.
Tasks Instance Segmentation, Object Detection, Semantic Segmentation
Published 2019-08-08
URL https://arxiv.org/abs/1908.03195v2
PDF https://arxiv.org/pdf/1908.03195v2.pdf
PWC https://paperswithcode.com/paper/lvis-a-dataset-for-large-vocabulary-instance-1
Repo https://github.com/lvis-dataset/lvis-api
Framework none
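
The long tail the abstract describes can be inspected directly from the annotation file. A small script, assuming the COCO-style JSON layout (an "annotations" list whose entries carry a "category_id") and a placeholder filename, counts instances per category without depending on the lvis-api package:

```python
import json
from collections import Counter

# Placeholder path; download the annotation JSON from lvisdataset.org first.
with open("lvis_train_annotations.json") as f:
    data = json.load(f)

counts = Counter(ann["category_id"] for ann in data["annotations"])
ranked = counts.most_common()
print("categories:", len(ranked))
print("head (top 5):", ranked[:5])
print("tail (bottom 5):", ranked[-5:])   # long-tail categories with very few masks
```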

Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems

Title Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems
Authors Hung Le, Doyen Sahoo, Nancy F. Chen, Steven C. H. Hoi
Abstract Developing Video-Grounded Dialogue Systems (VGDS), where a dialogue is conducted based on visual and audio aspects of a given video, is significantly more challenging than traditional image or text-grounded dialogue systems because (1) the feature space of videos spans multiple picture frames, making it difficult to obtain semantic information; and (2) a dialogue agent must perceive and process information from different modalities (audio, video, caption, etc.) to obtain a comprehensive understanding. Most existing work is based on RNNs and sequence-to-sequence architectures, which are not very effective for capturing complex long-term dependencies (like in videos). To overcome this, we propose Multimodal Transformer Networks (MTN) to encode videos and incorporate information from different modalities. We also propose query-aware attention through an auto-encoder to extract query-aware features from non-text modalities. We develop a training procedure to simulate token-level decoding to improve the quality of generated responses during inference. We achieve state-of-the-art performance on the Dialogue System Technology Challenge 7 (DSTC7). Our model also generalizes to another multimodal visual-grounded dialogue task and obtains promising performance. We implemented our models using PyTorch and the code is released at https://github.com/henryhungle/MTN.
Tasks
Published 2019-07-02
URL https://arxiv.org/abs/1907.01166v1
PDF https://arxiv.org/pdf/1907.01166v1.pdf
PWC https://paperswithcode.com/paper/multimodal-transformer-networks-for-end-to
Repo https://github.com/henryhungle/MTN
Framework pytorch
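
A minimal sketch of cross-modal, query-aware attention in the sense described above: encoded dialogue-query tokens attend over per-frame video features using a standard multi-head attention layer. This is generic PyTorch, not the authors' MTN module, and the auto-encoder component is omitted.

```python
import torch
import torch.nn as nn

class QueryAwareVideoAttention(nn.Module):
    """Text query attends over per-frame video features (illustrative only)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_tokens, video_frames):
        # query_tokens: (batch, n_query_tokens, dim); video_frames: (batch, n_frames, dim)
        attended, _ = self.attn(query_tokens, video_frames, video_frames)
        return self.norm(query_tokens + attended)   # residual connection

q = torch.randn(2, 12, 256)
v = torch.randn(2, 40, 256)
print(QueryAwareVideoAttention()(q, v).shape)   # torch.Size([2, 12, 256])
```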

Gaussian implementation of the multi-Bernoulli mixture filter

Title Gaussian implementation of the multi-Bernoulli mixture filter
Authors Ángel F. García-Fernández, Yuxuan Xia, Karl Granström, Lennart Svensson, Jason L. Williams
Abstract This paper presents the Gaussian implementation of the multi-Bernoulli mixture (MBM) filter. The MBM filter provides the filtering (multi-target) density for the standard dynamic and radar measurement models when the birth model is multi-Bernoulli or multi-Bernoulli mixture. Under linear/Gaussian models, the single-target densities of the MBM admit Gaussian closed-form expressions. Murty’s algorithm is used to select the global hypotheses with the highest weights. The MBM filter is compared with other algorithms in the literature via numerical simulations.
Tasks
Published 2019-08-23
URL https://arxiv.org/abs/1908.08819v1
PDF https://arxiv.org/pdf/1908.08819v1.pdf
PWC https://paperswithcode.com/paper/gaussian-implementation-of-the-multi
Repo https://github.com/Agarciafernandez/MTT
Framework none
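
Under linear/Gaussian models, each single-target density is propagated with the standard Kalman predict/update steps; a minimal numpy version of those two steps is sketched below (the MBM hypothesis management and Murty's algorithm are not shown).

```python
import numpy as np

def kalman_predict(m, P, F, Q):
    """Propagate a Gaussian single-target density through linear dynamics."""
    return F @ m, F @ P @ F.T + Q

def kalman_update(m, P, z, H, R):
    """Condition the predicted Gaussian on a linear/Gaussian measurement z."""
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    m_upd = m + K @ (z - H @ m)
    P_upd = (np.eye(P.shape[0]) - K @ H) @ P
    return m_upd, P_upd

# constant-velocity target in 1D: state = [position, velocity]
F = np.array([[1.0, 1.0], [0.0, 1.0]]); Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]]);             R = np.array([[0.5]])
m, P = np.zeros(2), np.eye(2)
m, P = kalman_predict(m, P, F, Q)
m, P = kalman_update(m, P, np.array([1.2]), H, R)
print(m)
```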

Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices

Title Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices
Authors Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii
Abstract This paper describes a versatile method that accelerates multichannel source separation methods based on full-rank spatial modeling. A popular approach to multichannel source separation is to integrate a spatial model with a source model for estimating the spatial covariance matrices (SCMs) and power spectral densities (PSDs) of each sound source in the time-frequency domain. One of the most successful examples of this approach is multichannel nonnegative matrix factorization (MNMF) based on a full-rank spatial model and a low-rank source model. MNMF, however, is computationally expensive and often works poorly due to the difficulty of estimating the unconstrained full-rank SCMs. Instead of restricting the SCMs to rank-1 matrices with the severe loss of the spatial modeling ability as in independent low-rank matrix analysis (ILRMA), we restrict the SCMs of each frequency bin to jointly-diagonalizable but still full-rank matrices. For such a fast version of MNMF, we propose a computationally-efficient and convergence-guaranteed algorithm that is similar in form to that of ILRMA. Similarly, we propose a fast version of a state-of-the-art speech enhancement method based on a deep speech model and a low-rank noise model. Experimental results showed that the fast versions of MNMF and the deep speech enhancement method were several times faster and performed even better than the original versions of those methods, respectively.
Tasks Speech Enhancement
Published 2019-03-08
URL http://arxiv.org/abs/1903.03237v1
PDF http://arxiv.org/pdf/1903.03237v1.pdf
PWC https://paperswithcode.com/paper/fast-multichannel-source-separation-based-on
Repo https://github.com/sekiguchi92/eusipco2019
Framework none
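
The key modeling restriction in the abstract above is that all sources' SCMs at a frequency bin share one diagonalizer Q, so each SCM has the form R_n = Q^{-1} diag(λ_n) Q^{-H}: jointly diagonalizable yet still full-rank. A small numpy illustration of constructing such matrices (not the authors' estimation algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics, n_srcs = 4, 3

# One shared (invertible) diagonalizer per frequency bin ...
Q = rng.normal(size=(n_mics, n_mics)) + 1j * rng.normal(size=(n_mics, n_mics))
Q_inv = np.linalg.inv(Q)

# ... and per-source nonnegative diagonal parameters.
lam = rng.gamma(shape=2.0, size=(n_srcs, n_mics))

# Jointly diagonalizable but full-rank SCMs: R_n = Q^{-1} diag(lam_n) Q^{-H}
R = np.stack([Q_inv @ np.diag(l) @ Q_inv.conj().T for l in lam])

# Sanity check: Q R_n Q^H is (numerically) diagonal for every source n
off_diag = [np.abs(Q @ R_n @ Q.conj().T - np.diag(np.diag(Q @ R_n @ Q.conj().T))).max()
            for R_n in R]
print(max(off_diag))   # close to machine precision
```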

Planar Prior Assisted PatchMatch Multi-View Stereo

Title Planar Prior Assisted PatchMatch Multi-View Stereo
Authors Qingshan Xu, Wenbing Tao
Abstract The completeness of 3D models is still a challenging problem in multi-view stereo (MVS) due to the unreliable photometric consistency in low-textured areas. Since low-textured areas usually exhibit strong planarity, planar models are advantageous to the depth estimation of low-textured areas. On the other hand, PatchMatch multi-view stereo is very efficient for its sampling and propagation scheme. By taking advantage of planar models and PatchMatch multi-view stereo, we propose a planar prior assisted PatchMatch multi-view stereo framework in this paper. In detail, we utilize a probabilistic graphical model to embed planar models into PatchMatch multi-view stereo and contribute a novel multi-view aggregated matching cost. This novel cost takes both photometric consistency and planar compatibility into consideration, making it suited for the depth estimation of both non-planar and planar regions. Experimental results demonstrate that our method can efficiently recover the depth information of extremely low-textured areas, thus obtaining highly complete 3D models and achieving state-of-the-art performance.
Tasks Depth Estimation
Published 2019-12-26
URL https://arxiv.org/abs/1912.11744v1
PDF https://arxiv.org/pdf/1912.11744v1.pdf
PWC https://paperswithcode.com/paper/planar-prior-assisted-patchmatch-multi-view
Repo https://github.com/GhiXu/ACMP
Framework none
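
One illustrative reading of a cost that blends photometric consistency with planar compatibility: average the photometric cost over source views and add a penalty for deviating from the depth implied by a fitted plane. The soft combination below is a toy sketch, not the paper's probabilistic graphical model; the weights and the Gaussian penalty are assumptions.

```python
import numpy as np

def aggregated_cost(photometric_costs, depth, plane_depth, sigma=0.1, weight=0.5):
    """Toy combination: mean photometric cost across source views (e.g. 1 - NCC),
    plus a penalty for deviating from the planar-prior depth."""
    photo = float(np.mean(photometric_costs))
    planar_penalty = 1.0 - np.exp(-((depth - plane_depth) ** 2) / (2 * sigma ** 2))
    return (1 - weight) * photo + weight * planar_penalty

# when texture is weak (high photometric cost), a hypothesis near the plane wins
print(aggregated_cost([0.8, 0.7, 0.9], depth=2.05, plane_depth=2.0))   # lower cost
print(aggregated_cost([0.8, 0.7, 0.9], depth=3.00, plane_depth=2.0))   # higher cost
```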

Neural Point-Based Graphics

Title Neural Point-Based Graphics
Authors Kara-Ali Aliev, Dmitry Ulyanov, Victor Lempitsky
Abstract We present a new point-based approach for modeling complex scenes. The approach uses a raw point cloud as the geometric representation of a scene, and augments each point with a learnable neural descriptor that encodes local geometry and appearance. A deep rendering network is learned in parallel with the descriptors, so that new views of the scene can be obtained by passing the rasterizations of a point cloud from new viewpoints through this network. The input rasterizations use the learned descriptors as point pseudo-colors. We show that the proposed approach can be used for modeling complex scenes and obtaining their photorealistic views, while avoiding explicit surface estimation and meshing. In particular, compelling results are obtained for scenes scanned using hand-held commodity RGB-D sensors as well as standard RGB cameras, even in the presence of objects that are challenging for standard mesh-based modeling.
Tasks
Published 2019-06-19
URL https://arxiv.org/abs/1906.08240v2
PDF https://arxiv.org/pdf/1906.08240v2.pdf
PWC https://paperswithcode.com/paper/neural-point-based-graphics
Repo https://github.com/duburlan/npbg_eval
Framework none
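
A minimal sketch of the descriptor-rasterization idea described above: one learnable descriptor per point is scattered into an image-sized grid at the point's projected pixel location and decoded by a small convolutional renderer. Camera projection and z-buffering are omitted, and this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class NeuralPointRenderer(nn.Module):
    """One learnable descriptor per point, rasterized as a pseudo-color image
    and decoded by a small conv network (illustrative only)."""
    def __init__(self, num_points, desc_dim=8, out_channels=3):
        super().__init__()
        self.descriptors = nn.Parameter(0.01 * torch.randn(num_points, desc_dim))
        self.render_net = nn.Sequential(
            nn.Conv2d(desc_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_channels, 3, padding=1),
        )

    def forward(self, pixel_xy, height, width):
        # pixel_xy: (num_points, 2) integer pixel coordinates of the projected
        # points for the target viewpoint.
        canvas = torch.zeros(self.descriptors.shape[1], height, width)
        canvas[:, pixel_xy[:, 1], pixel_xy[:, 0]] = self.descriptors.t()
        return self.render_net(canvas.unsqueeze(0))

renderer = NeuralPointRenderer(num_points=500)
image = renderer(torch.randint(0, 64, (500, 2)), 64, 64)
print(image.shape)   # torch.Size([1, 3, 64, 64])
```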

A Neural Network for Detailed Human Depth Estimation from a Single Image

Title A Neural Network for Detailed Human Depth Estimation from a Single Image
Authors Sicong Tang, Feitong Tan, Kelvin Cheng, Zhaoyang Li, Siyu Zhu, Ping Tan
Abstract This paper presents a neural network to estimate a detailed depth map of the foreground human in a single RGB image. The result captures geometry details such as cloth wrinkles, which are important in visualization applications. To achieve this goal, we separate the depth map into a smooth base shape and a residual detail shape and design a network with two branches to regress them respectively. We design a training strategy to ensure both base and detail shapes can be faithfully learned by the corresponding network branches. Furthermore, we introduce a novel network layer to fuse a rough depth map and surface normals to further improve the final result. Quantitative comparison with fused ‘ground truth’ captured by real depth cameras and qualitative examples on unconstrained Internet images demonstrate the strength of the proposed method. The code is available at https://github.com/sfu-gruvi-3dv/deep_human.
Tasks Depth Estimation
Published 2019-10-03
URL https://arxiv.org/abs/1910.01275v2
PDF https://arxiv.org/pdf/1910.01275v2.pdf
PWC https://paperswithcode.com/paper/a-neural-network-for-detailed-human-depth
Repo https://github.com/sfu-gruvi-3dv/deep_human
Framework tf
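
The base-plus-detail decomposition can be expressed as two decoder branches over a shared encoder, with the final depth being their sum. A PyTorch-flavored sketch of that structure follows; the released code is TensorFlow, and the depth/normal fusion layer is omitted.

```python
import torch
import torch.nn as nn

class BaseDetailDepthNet(nn.Module):
    """Shared encoder with two branches: smooth base depth plus residual detail depth."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.base_branch = nn.Conv2d(64, 1, 3, padding=1)     # low-frequency shape
        self.detail_branch = nn.Conv2d(64, 1, 3, padding=1)   # wrinkles / residual

    def forward(self, rgb):
        feats = self.encoder(rgb)
        base = self.base_branch(feats)
        detail = self.detail_branch(feats)
        return base + detail, base, detail

depth, base, detail = BaseDetailDepthNet()(torch.randn(1, 3, 128, 128))
print(depth.shape)   # torch.Size([1, 1, 128, 128])
```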

Positional Normalization

Title Positional Normalization
Authors Boyi Li, Felix Wu, Kilian Q. Weinberger, Serge Belongie
Abstract A popular method to reduce the training time of deep neural networks is to normalize activations at each layer. Although various normalization schemes have been proposed, they all follow a common theme: normalize across spatial dimensions and discard the extracted statistics. In this paper, we propose an alternative normalization method that noticeably departs from this convention and normalizes exclusively across channels. We argue that the channel dimension is naturally appealing as it allows us to extract the first and second moments of features extracted at a particular image position. These moments capture structural information about the input image and extracted features, which opens a new avenue along which a network can benefit from feature normalization: instead of disregarding the normalization constants, we propose to re-inject them into later layers to preserve or transfer structural information in generative networks. Code is available at https://github.com/Boyiliee/PONO.
Tasks
Published 2019-07-09
URL https://arxiv.org/abs/1907.04312v2
PDF https://arxiv.org/pdf/1907.04312v2.pdf
PWC https://paperswithcode.com/paper/positional-normalization
Repo https://github.com/Boyiliee/PONO
Framework pytorch
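
Normalizing exclusively across channels and keeping the extracted moments is only a few lines; below is a sketch of the normalization and the described re-injection of the moments into a later layer, written from the abstract's description (see the PONO repo for the authors' implementation).

```python
import torch

def pono(x, eps=1e-5):
    """Normalize across channels at every spatial position; return the moments."""
    mean = x.mean(dim=1, keepdim=True)                              # (B, 1, H, W)
    std = (x.var(dim=1, keepdim=True, unbiased=False) + eps).sqrt()
    return (x - mean) / std, mean, std

def moment_shortcut(x, mean, std):
    """Re-inject previously extracted first/second moments into later features."""
    return x * std + mean

x = torch.randn(2, 64, 16, 16)
normalized, mu, sigma = pono(x)
later = moment_shortcut(torch.randn(2, 64, 16, 16), mu, sigma)
print(normalized.mean(dim=1).abs().max())   # ~0: per-position channel mean removed
```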

Fully Unsupervised Probabilistic Noise2Void

Title Fully Unsupervised Probabilistic Noise2Void
Authors Mangal Prakash, Manan Lalit, Pavel Tomancak, Alexander Krull, Florian Jug
Abstract Image denoising is the first step in many biomedical image analysis pipelines, and Deep Learning (DL) based methods are currently the best performing. A new category of DL methods such as Noise2Void or Noise2Self can be used fully unsupervised, requiring nothing but the noisy data. However, this comes at the price of reduced reconstruction quality. The recently proposed Probabilistic Noise2Void (PN2V) improves results, but requires an additional noise model for which calibration data needs to be acquired. Here, we present improvements to PN2V that (i) replace histogram-based noise models by parametric noise models, and (ii) show how suitable noise models can be created even in the absence of calibration data. This is a major step since it actually renders PN2V fully unsupervised. We demonstrate that all proposed improvements are not only academic but indeed relevant.
Tasks Calibration, Denoising, Image Denoising
Published 2019-11-27
URL https://arxiv.org/abs/1911.12291v2
PDF https://arxiv.org/pdf/1911.12291v2.pdf
PWC https://paperswithcode.com/paper/fully-unsupervised-probabilistic-noise2void
Repo https://github.com/juglab/ppn2v
Framework pytorch
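
The parametric noise-model idea can be illustrated with a simple signal-dependent Gaussian whose variance grows linearly with the clean signal (a common camera-noise approximation). The paper itself fits Gaussian-mixture noise models, so the form and the gain/offset parameter names below are assumptions for illustration only.

```python
import numpy as np

def signal_dependent_gaussian_loglik(observed, signal, gain=0.5, offset=1.0):
    """log p(observed | signal) under N(signal, gain*signal + offset).
    gain/offset would normally be fitted from calibration data or bootstrapped."""
    var = np.maximum(gain * signal + offset, 1e-6)
    return -0.5 * (np.log(2 * np.pi * var) + (observed - signal) ** 2 / var)

signals = np.array([10.0, 100.0, 1000.0])
observed = signals + 2.0   # same absolute deviation at every brightness level
print(signal_dependent_gaussian_loglik(observed, signals))
# the squared-error term is down-weighted where the predicted variance is larger
```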

Geometry-Aware Neural Rendering

Title Geometry-Aware Neural Rendering
Authors Josh Tobin, OpenAI Robotics, Pieter Abbeel
Abstract Understanding the 3-dimensional structure of the world is a core challenge in computer vision and robotics. Neural rendering approaches learn an implicit 3D model by predicting what a camera would see from an arbitrary viewpoint. We extend existing neural rendering to more complex, higher dimensional scenes than previously possible. We propose Epipolar Cross Attention (ECA), an attention mechanism that leverages the geometry of the scene to perform efficient non-local operations, requiring only $O(n)$ comparisons per spatial dimension instead of $O(n^2)$. We introduce three new simulated datasets inspired by real-world robotics and demonstrate that ECA significantly improves the quantitative and qualitative performance of Generative Query Networks (GQN).
Tasks
Published 2019-10-28
URL https://arxiv.org/abs/1911.04554v1
PDF https://arxiv.org/pdf/1911.04554v1.pdf
PWC https://paperswithcode.com/paper/191104554
Repo https://github.com/josh-tobin/egqn-datasets
Framework tf
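
The O(n)-per-spatial-dimension idea: restrict each query position's attention to a single line of the context feature map instead of all H x W positions. The sketch below attends along rows as a stand-in for epipolar lines; true epipolar geometry depends on the camera poses and is not computed here, and this is not the authors' ECA module.

```python
import torch
import torch.nn.functional as F

def line_restricted_attention(query_feats, context_feats):
    """query_feats, context_feats: (B, C, H, W). Each query pixel attends only to
    the W positions in the corresponding row of the context map: O(W) per pixel."""
    b, c, h, w = query_feats.shape
    q = query_feats.permute(0, 2, 3, 1)                    # (B, H, W, C)
    k = v = context_feats.permute(0, 2, 3, 1)
    scores = torch.einsum("bhic,bhjc->bhij", q, k) / c ** 0.5
    attn = F.softmax(scores, dim=-1)
    out = torch.einsum("bhij,bhjc->bhic", attn, v)
    return out.permute(0, 3, 1, 2)                         # back to (B, C, H, W)

out = line_restricted_attention(torch.randn(1, 32, 16, 16), torch.randn(1, 32, 16, 16))
print(out.shape)   # torch.Size([1, 32, 16, 16])
```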

Depth Estimation in Nighttime using Stereo-Consistent Cyclic Translations

Title Depth Estimation in Nighttime using Stereo-Consistent Cyclic Translations
Authors Aashish Sharma, Robby T. Tan, Loong-Fah Cheong
Abstract Most existing methods of depth from stereo are designed for daytime scenes, where the lighting can be assumed to be sufficiently bright and more or less uniform. Unfortunately, this assumption does not hold for nighttime scenes, causing the existing methods to be erroneous when deployed at nighttime. Nighttime is not only about low light, but also about glow, glare, non-uniform distribution of light, etc. One possible solution is to train a network on nighttime images in a fully supervised manner. However, obtaining proper disparity ground truths that are dense, independent of glare/glow, and that cover sufficiently far depth ranges is extremely difficult. In this paper, to address the problem of depth from stereo in nighttime, we introduce a joint translation and stereo network that is robust to nighttime conditions. Our method uses no direct supervision and does not require ground-truth disparities of the nighttime training images. First, we utilize a translation network that can render realistic nighttime stereo images from given daytime stereo images. Second, we train a stereo network on the rendered nighttime images using the available disparity supervision from the daytime images, and simultaneously also train the translation network to gradually improve the rendered nighttime images. We introduce a stereo-consistency constraint into our translation network to ensure that the translated pairs are stereo-consistent. Our experiments show that our joint translation-stereo network outperforms the state-of-the-art methods.
Tasks Depth Estimation
Published 2019-09-30
URL https://arxiv.org/abs/1909.13701v1
PDF https://arxiv.org/pdf/1909.13701v1.pdf
PWC https://paperswithcode.com/paper/depth-estimation-in-nighttime-using-stereo
Repo https://github.com/aasharma90/CycleStereoGAN_NighttimeDepth
Framework pytorch
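
The stereo-consistency idea can be illustrated with the standard disparity-warping photometric check: warp the (translated) right image to the left view using a disparity map and compare against the left image. This is a generic PyTorch sketch, not the paper's full translation-stereo objective.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disparity):
    """Sample the right image at (x - d, y) to synthesize the left view.
    right: (B, 3, H, W); disparity: (B, 1, H, W) in pixels."""
    b, _, h, w = right.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    xs = xs.unsqueeze(0) - disparity.squeeze(1)            # shift columns by disparity
    ys = ys.unsqueeze(0).expand(b, -1, -1)
    grid = torch.stack((2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1), dim=-1)
    return F.grid_sample(right, grid, align_corners=True)

def stereo_consistency_loss(left, right, disparity):
    """L1 photometric difference between the left image and the warped right image."""
    return (warp_right_to_left(right, disparity) - left).abs().mean()

left, right = torch.rand(1, 3, 64, 128), torch.rand(1, 3, 64, 128)
print(stereo_consistency_loss(left, right, torch.full((1, 1, 64, 128), 4.0)))
```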

LCSCNet: Linear Compressing Based Skip-Connecting Network for Image Super-Resolution

Title LCSCNet: Linear Compressing Based Skip-Connecting Network for Image Super-Resolution
Authors Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue, Qingmin Liao
Abstract In this paper, we develop a concise but efficient network architecture called the linear compressing based skip-connecting network (LCSCNet) for image super-resolution. Compared with two representative network architectures with skip connections, ResNet and DenseNet, a linear compressing layer is designed in LCSCNet for skip connection, which connects former feature maps and distinguishes them from newly-explored feature maps. In this way, the proposed LCSCNet enjoys both the distinct feature treatment of DenseNet and the parameter-economic form of ResNet. Moreover, to better exploit hierarchical information from both low and high levels of various receptive fields in deep models, inspired by gate units in LSTM, we also propose an adaptive element-wise fusion strategy with multi-supervised training. Experimental results in comparison with state-of-the-art algorithms validate the effectiveness of LCSCNet.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-09-09
URL https://arxiv.org/abs/1909.03573v1
PDF https://arxiv.org/pdf/1909.03573v1.pdf
PWC https://paperswithcode.com/paper/lcscnet-linear-compressing-based-skip
Repo https://github.com/XuechenZhang123/LCSC
Framework pytorch
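
One reading of the "linear compressing based skip connection" described above: previous feature maps pass through a purely linear 1x1 convolution (no nonlinearity) before being concatenated with newly extracted features, so old and new features stay distinguishable at a controlled parameter cost. A hedged sketch of such a block follows; channel sizes are arbitrary and this is not the released code.

```python
import torch
import torch.nn as nn

class LCSCBlock(nn.Module):
    """Concatenate linearly compressed old features with newly extracted ones."""
    def __init__(self, in_channels, new_channels=32, compressed_channels=16):
        super().__init__()
        # "linear compressing layer": 1x1 conv with no activation
        self.compress = nn.Conv2d(in_channels, compressed_channels, kernel_size=1)
        self.extract = nn.Sequential(
            nn.Conv2d(in_channels, new_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return torch.cat((self.compress(x), self.extract(x)), dim=1)

x = torch.randn(1, 64, 48, 48)
print(LCSCBlock(64)(x).shape)   # torch.Size([1, 48, 48, 48])
```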