October 20, 2019

Paper Group AWR 172

CFUN: Combining Faster R-CNN and U-net Network for Efficient Whole Heart Segmentation

Title CFUN: Combining Faster R-CNN and U-net Network for Efficient Whole Heart Segmentation
Authors Zhanwei Xu, Ziyi Wu, Jianjiang Feng
Abstract In this paper, we propose a novel heart segmentation pipeline Combining Faster R-CNN and U-net Network (CFUN). Thanks to Faster R-CNN’s precise localization ability and U-net’s powerful segmentation ability, CFUN needs only a single detection-and-segmentation inference pass to obtain the whole-heart segmentation result, achieving good results at significantly reduced computational cost. In addition, CFUN adopts a new edge-based loss function, named 3D Edge-loss, as an auxiliary loss to accelerate training convergence and improve segmentation results. Extensive experiments on a public dataset show that CFUN exhibits competitive segmentation performance at a sharply reduced inference time. Our source code and model are publicly available at https://github.com/Wuziyi616/CFUN.
Tasks
Published 2018-12-12
URL http://arxiv.org/abs/1812.04914v1
PDF http://arxiv.org/pdf/1812.04914v1.pdf
PWC https://paperswithcode.com/paper/cfun-combining-faster-r-cnn-and-u-net-network
Repo https://github.com/Wuziyi616/CFUN
Framework pytorch
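
The abstract does not spell out the form of the 3D Edge-loss. As a rough illustration only, an edge-based auxiliary term could penalize disagreement between finite-difference edge maps of the prediction and the ground truth; everything below is an assumption, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def edge_loss_3d(pred_probs, target_onehot):
    """Illustrative 3D edge-aware auxiliary loss (not CFUN's exact formulation).

    pred_probs, target_onehot: (N, C, D, H, W) tensors with values in [0, 1].
    Edges are approximated by first-order finite differences along each axis.
    """
    loss = 0.0
    for dim in (2, 3, 4):  # the three spatial axes D, H, W
        pred_edge = torch.diff(pred_probs, dim=dim).abs()
        tgt_edge = torch.diff(target_onehot, dim=dim).abs()
        loss = loss + F.l1_loss(pred_edge, tgt_edge)
    return loss
```

Such a term would typically be added with a small weight to the main Dice or cross-entropy segmentation loss.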

Generating Multi-Categorical Samples with Generative Adversarial Networks

Title Generating Multi-Categorical Samples with Generative Adversarial Networks
Authors Ramiro Camino, Christian Hammerschmidt, Radu State
Abstract We propose a method to train generative adversarial networks on multivariate feature vectors representing multiple categorical values. In contrast to the continuous domain, where GAN-based methods have delivered considerable results, GANs struggle to perform equally well on discrete data. We propose and compare several architectures based on multiple (Gumbel) softmax output layers, taking into account the structure of the data. We evaluate the performance of our architectures on datasets with different sparsity, number of features, ranges of categorical values, and dependencies among the features. Our proposed architecture and method outperform existing models.
Tasks
Published 2018-07-03
URL http://arxiv.org/abs/1807.01202v2
PDF http://arxiv.org/pdf/1807.01202v2.pdf
PWC https://paperswithcode.com/paper/generating-multi-categorical-samples-with
Repo https://github.com/rcamino/multi-categorical-gans
Framework pytorch
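
The core architectural idea, one (Gumbel-)softmax output head per categorical variable, might be sketched as follows; the hidden size, temperature, and head layout are assumptions rather than the authors' configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiCategoricalHead(nn.Module):
    """One (Gumbel-)softmax output layer per categorical variable (illustrative)."""

    def __init__(self, hidden_dim, cardinalities, tau=0.66):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, k) for k in cardinalities)
        self.tau = tau

    def forward(self, h, hard=False):
        # One relaxed one-hot sample per variable, concatenated so the generator
        # output matches the flattened one-hot encoding of a data row.
        outs = [F.gumbel_softmax(head(h), tau=self.tau, hard=hard) for head in self.heads]
        return torch.cat(outs, dim=-1)
```

For example, `MultiCategoricalHead(128, [3, 5, 2])` maps a 128-dimensional generator state to a 10-dimensional concatenation of three (nearly) one-hot vectors.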

Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images

Title Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images
Authors Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, Yu-Gang Jiang
Abstract We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image. Limited by the nature of deep neural networks, previous methods usually represent a 3D shape as a volume or point cloud, and it is non-trivial to convert these to the more ready-to-use mesh model. Unlike existing methods, our network represents the 3D mesh in a graph-based convolutional neural network and produces correct geometry by progressively deforming an ellipsoid, leveraging perceptual features extracted from the input image. We adopt a coarse-to-fine strategy to make the whole deformation procedure stable, and define various mesh-related losses that capture properties at different levels to guarantee visually appealing and physically accurate 3D geometry. Extensive experiments show that our method not only qualitatively produces mesh models with better details, but also achieves higher 3D shape estimation accuracy compared to the state of the art.
Tasks 3D Object Reconstruction
Published 2018-04-05
URL http://arxiv.org/abs/1804.01654v2
PDF http://arxiv.org/pdf/1804.01654v2.pdf
PWC https://paperswithcode.com/paper/pixel2mesh-generating-3d-mesh-models-from
Repo https://github.com/nywang16/Pixel2Mesh
Framework tf
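
The key building block is a graph convolution over mesh vertices. A minimal PyTorch sketch of such a layer (the authors' code is in TensorFlow, and their exact layer differs):

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Minimal graph convolution over mesh vertices (illustrative)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)    # transform of the vertex itself
        self.w_neigh = nn.Linear(in_dim, out_dim)   # transform of aggregated neighbors

    def forward(self, x, adj):
        # x: (V, in_dim) per-vertex features (coordinates + perceptual features);
        # adj: (V, V) row-normalized mesh adjacency with self-loops.
        return torch.relu(self.w_self(x) + self.w_neigh(adj @ x))
```

Stacking such layers and predicting per-vertex coordinate offsets is what progressively deforms the initial ellipsoid.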

3D human pose estimation in video with temporal convolutions and semi-supervised training

Title 3D human pose estimation in video with temporal convolutions and semi-supervised training
Authors Dario Pavllo, Christoph Feichtenhofer, David Grangier, Michael Auli
Abstract In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data. We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints. In the supervised setting, our fully-convolutional model outperforms the previous best result from the literature by 6 mm mean per-joint position error on Human3.6M, corresponding to an error reduction of 11%, and the model also shows significant improvements on HumanEva-I. Moreover, experiments with back-projection show that it comfortably outperforms previous state-of-the-art results in semi-supervised settings where labeled data is scarce. Code and models are available at https://github.com/facebookresearch/VideoPose3D
Tasks 3D Human Pose Estimation, Pose Estimation
Published 2018-11-28
URL http://arxiv.org/abs/1811.11742v2
PDF http://arxiv.org/pdf/1811.11742v2.pdf
PWC https://paperswithcode.com/paper/3d-human-pose-estimation-in-video-with
Repo https://github.com/garyzhao/SemGCN
Framework pytorch
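
A hedged sketch of the model's building block, a residual block of dilated temporal convolutions over flattened 2D keypoint channels; kernel size, normalization, and the cropping scheme are assumptions:

```python
import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    """Residual dilated temporal convolution over keypoint sequences (illustrative).

    Input: (N, J*2, T) -- J keypoints with (x, y) flattened into channels.
    """

    def __init__(self, channels, dilation):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, dilation=dilation),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
        )

    def forward(self, x):
        out = self.net(x)
        # Residual connection: crop the input to the (shorter) conv output length.
        crop = (x.shape[-1] - out.shape[-1]) // 2
        return x[..., crop:x.shape[-1] - crop] + out
```

Increasing the dilation per block grows the temporal receptive field exponentially with depth.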

ReCoNet: Real-time Coherent Video Style Transfer Network

Title ReCoNet: Real-time Coherent Video Style Transfer Network
Authors Chang Gao, Derun Gu, Fangjun Zhang, Yizhou Yu
Abstract Image style transfer models based on convolutional neural networks usually suffer from high temporal inconsistency when applied to videos. Some video style transfer models have been proposed to improve temporal consistency, yet they fail to guarantee fast processing speed, nice perceptual style quality and high temporal consistency at the same time. In this paper, we propose a novel real-time video style transfer model, ReCoNet, which can generate temporally coherent style transfer videos while maintaining favorable perceptual styles. A novel luminance warping constraint is added to the temporal loss at the output level to capture luminance changes between consecutive frames and increase stylization stability under illumination effects. We also propose a novel feature-map-level temporal loss to further enhance temporal consistency on traceable objects. Experimental results indicate that our model exhibits outstanding performance both qualitatively and quantitatively.
Tasks Style Transfer, Video Style Transfer
Published 2018-07-03
URL http://arxiv.org/abs/1807.01197v2
PDF http://arxiv.org/pdf/1807.01197v2.pdf
PWC https://paperswithcode.com/paper/reconet-real-time-coherent-video-style
Repo https://github.com/irsisyphus/reconet
Framework pytorch
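
One rough reading of the luminance warping constraint, sketched as an output-level temporal loss (the exact weighting and formulation in the paper may differ; warped frames and the occlusion mask are assumed to come from precomputed optical flow):

```python
import torch

def temporal_loss(styled_t, warped_styled_prev, input_t, warped_input_prev, mask):
    """Illustrative output-level temporal loss with a luminance term.

    Rather than forcing warped consecutive stylized frames to match exactly,
    let them differ by the luminance change observed between the input frames.
    Frame tensors: (N, 3, H, W); mask: (N, 1, H, W) occlusion/validity mask.
    """
    # Relative luminance of the inputs (Rec. 601 weights).
    w = torch.tensor([0.299, 0.587, 0.114], device=input_t.device).view(1, 3, 1, 1)
    lum_change = ((input_t - warped_input_prev) * w).sum(dim=1, keepdim=True)
    diff = styled_t - warped_styled_prev - lum_change
    return (mask * diff.pow(2)).mean()
```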

Iterative Joint Image Demosaicking and Denoising using a Residual Denoising Network

Title Iterative Joint Image Demosaicking and Denoising using a Residual Denoising Network
Authors Filippos Kokkinos, Stamatios Lefkimmiatis
Abstract Modern digital cameras rely on the sequential execution of separate image processing steps to produce realistic images. The first two steps are usually denoising and demosaicking, where the former aims to reduce noise from the sensor and the latter converts a series of light intensity readings to color images. Modern approaches try to solve these problems jointly, i.e., joint denoising-demosaicking, which is an inherently ill-posed problem given that two-thirds of the intensity information is missing and the rest is perturbed by noise. While several machine learning systems have recently been introduced to solve this problem, the majority of them rely on generic network architectures which do not explicitly take into account the physical image model. In this work we propose a novel algorithm inspired by powerful classical image regularization methods, large-scale optimization, and deep learning techniques. Consequently, our derived iterative optimization algorithm, which involves a trainable denoising network, has a transparent and clear interpretation compared to other black-box data-driven approaches. Our extensive experiments demonstrate that our proposed method outperforms previous approaches for both noisy and noise-free data across many different datasets. This improvement in reconstruction quality is attributed to the rigorous derivation of an iterative solution and the principled way we design our denoising network architecture, which as a result requires fewer trainable parameters than the current state-of-the-art solution and, furthermore, can be efficiently trained using significantly less training data than existing deep demosaicking networks. Code and results can be found at https://github.com/cig-skoltech/deep_demosaick
Tasks Demosaicking, Denoising
Published 2018-07-16
URL http://arxiv.org/abs/1807.06403v3
PDF http://arxiv.org/pdf/1807.06403v3.pdf
PWC https://paperswithcode.com/paper/iterative-residual-network-for-deep-joint
Repo https://github.com/cig-skoltech/deep_demosaick
Framework pytorch
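
The abstract describes an iterative optimization scheme with a trainable denoising network inside. A generic proximal-gradient-style sketch of that pattern, with the forward operator reduced to a CFA mask (this is not the authors' derivation):

```python
import torch
import torch.nn as nn

class UnrolledDemosaick(nn.Module):
    """Illustrative unrolled iterative reconstruction with a learned denoiser."""

    def __init__(self, denoiser, steps=5, step_size=1.0):
        super().__init__()
        self.denoiser = denoiser                       # learned regularization step
        self.steps = steps
        self.step_size = nn.Parameter(torch.tensor(step_size))

    def forward(self, y, mask):
        # y: mosaicked/noisy observation; mask: binary CFA sampling operator.
        x = y.clone()
        for _ in range(self.steps):
            grad = mask * (mask * x - y)               # grad of 0.5 * ||M x - y||^2
            x = self.denoiser(x - self.step_size * grad)
        return x
```

The transparency the authors highlight comes from this structure: each step is a data-consistency gradient update followed by a learned denoising (proximal-like) step.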

Adaptive feature recombination and recalibration for semantic segmentation: application to brain tumor segmentation in MRI

Title Adaptive feature recombination and recalibration for semantic segmentation: application to brain tumor segmentation in MRI
Authors Sérgio Pereira, Victor Alves, Carlos A. Silva
Abstract Convolutional neural networks (CNNs) have been successfully used for brain tumor segmentation, specifically fully convolutional networks (FCNs). FCNs can segment a set of voxels at once, with a direct spatial correspondence between units in feature maps (FMs) at a given location and the corresponding classified voxels. In convolutional layers, FMs are merged to create new FMs, so channel combination is crucial. However, not all FMs have the same relevance for a given class. Recently, in classification problems, Squeeze-and-Excitation (SE) blocks have been proposed to re-calibrate FMs as a whole and suppress the less informative ones. However, this is not optimal in FCNs due to the spatial correspondence between units and voxels. In this article, we propose feature recombination through linear expansion and compression to create more complex features for semantic segmentation. Additionally, we propose a segmentation SE (SegSE) block for feature recalibration that collects contextual information while maintaining the spatial meaning. Finally, we evaluate the proposed methods on brain tumor segmentation, using publicly available data.
Tasks Brain Tumor Segmentation, Semantic Segmentation
Published 2018-06-06
URL http://arxiv.org/abs/1806.02318v1
PDF http://arxiv.org/pdf/1806.02318v1.pdf
PWC https://paperswithcode.com/paper/adaptive-feature-recombination-and
Repo https://github.com/sergiormpereira/rr_segse
Framework none
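
A hedged sketch of the contrast with standard SE: instead of collapsing feature maps with global pooling, gather context with a dilated convolution so the recalibration gate stays spatial. Shown in 2D for brevity (the paper works on 3D MRI volumes, and its exact block differs):

```python
import torch
import torch.nn as nn

class SegSE(nn.Module):
    """Illustrative segmentation squeeze-and-excitation block (spatial gate)."""

    def __init__(self, channels, reduction=4, dilation=2):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 3,
                      padding=dilation, dilation=dilation),  # context, not pooling
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Each spatial location gets its own per-channel weights.
        return x * self.gate(x)
```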

Part-Level Convolutional Neural Networks for Pedestrian Detection Using Saliency and Boundary Box Alignment

Title Part-Level Convolutional Neural Networks for Pedestrian Detection Using Saliency and Boundary Box Alignment
Authors Inyong Yun, Cheolkon Jung, Xinran Wang, Alfred O Hero, Joongkyu Kim
Abstract Pedestrians in videos have a wide range of appearances, such as body poses, occlusions, and complex backgrounds, and pedestrian detection suffers from the proposal shift problem, which causes the loss of body parts such as the head and legs. To address this, we propose part-level convolutional neural networks (CNNs) for pedestrian detection using saliency and bounding box alignment. The proposed network consists of two sub-networks: detection and alignment. We use saliency in the detection sub-network to remove false positives such as lamp posts and trees. We apply bounding box alignment to detection proposals in the alignment sub-network to address the proposal shift problem. First, we combine FCN and CAM to extract deep features for pedestrian detection. Then, we perform part-level CNN to recall the lost body parts. Experimental results on various datasets demonstrate that the proposed method remarkably improves pedestrian detection accuracy and outperforms existing state-of-the-art methods in terms of log-average miss rate against false positives per image (FPPI).
Tasks Pedestrian Detection
Published 2018-10-01
URL http://arxiv.org/abs/1810.00689v1
PDF http://arxiv.org/pdf/1810.00689v1.pdf
PWC https://paperswithcode.com/paper/part-level-convolutional-neural-networks-for
Repo https://github.com/iyyun/Part-CNN
Framework none
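
As a toy illustration of using saliency to suppress false positives such as lamp posts and trees, one could down-weight proposal scores by the mean saliency inside each box; the paper's detection sub-network is far more involved than this:

```python
import torch

def rescore_by_saliency(scores, boxes, saliency, alpha=0.5):
    """Down-weight proposals that fall on non-salient regions (illustrative).

    scores: (N,) detection scores; boxes: (N, 4) integer (x1, y1, x2, y2);
    saliency: (H, W) map with values in [0, 1]; alpha: floor on the rescaling.
    """
    sal = torch.stack([
        saliency[y1:y2 + 1, x1:x2 + 1].mean()
        for x1, y1, x2, y2 in boxes.long().tolist()
    ])
    return scores * (alpha + (1 - alpha) * sal)
```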

Evaluating the Utility of Hand-crafted Features in Sequence Labelling

Title Evaluating the Utility of Hand-crafted Features in Sequence Labelling
Authors Minghao Wu, Fei Liu, Trevor Cohn
Abstract Conventional wisdom is that hand-crafted features are redundant for deep learning models, as they already learn adequate representations of text automatically from corpora. In this work, we test this claim by proposing a new method for exploiting hand-crafted features as part of a novel hybrid learning approach, incorporating a feature auto-encoder loss component. We evaluate on the task of named entity recognition (NER), where we show that including manual features for part-of-speech, word shapes and gazetteers can improve the performance of a neural CRF model. We obtain an $F_1$ of 91.89 for the CoNLL-2003 English shared task, which significantly outperforms a collection of highly competitive baseline models. We also present an ablation study showing the importance of auto-encoding over using features as either inputs or outputs alone, and moreover show that including the auto-encoder components reduces training requirements to 60%, while retaining the same predictive accuracy.
Tasks Named Entity Recognition
Published 2018-08-28
URL http://arxiv.org/abs/1808.09075v1
PDF http://arxiv.org/pdf/1808.09075v1.pdf
PWC https://paperswithcode.com/paper/evaluating-the-utility-of-hand-crafted
Repo https://github.com/minghao-wu/CRF-AE
Framework pytorch
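
A minimal sketch of the hybrid objective: a labelling loss plus an auto-encoder loss that reconstructs the hand-crafted features from the shared hidden states. Dimensions are assumptions, and plain per-token cross-entropy stands in here for the paper's CRF layer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaggerWithFeatureAE(nn.Module):
    """Illustrative tagger with a feature auto-encoder loss component."""

    def __init__(self, encoder, hidden_dim, n_tags, feat_dim, ae_weight=1.0):
        super().__init__()
        self.encoder = encoder                        # e.g. a BiLSTM over embeddings
        self.tag_head = nn.Linear(hidden_dim, n_tags)
        self.feat_decoder = nn.Linear(hidden_dim, feat_dim)
        self.ae_weight = ae_weight

    def forward(self, inputs, gold_tags, handcrafted_feats):
        h = self.encoder(inputs)                      # (N, T, hidden_dim)
        tag_loss = F.cross_entropy(self.tag_head(h).flatten(0, 1),
                                   gold_tags.flatten())
        # Reconstruct POS / word-shape / gazetteer features from the hidden states.
        ae_loss = F.mse_loss(self.feat_decoder(h), handcrafted_feats)
        return tag_loss + self.ae_weight * ae_loss
```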

Parsing Tweets into Universal Dependencies

Title Parsing Tweets into Universal Dependencies
Authors Yijia Liu, Yi Zhu, Wanxiang Che, Bing Qin, Nathan Schneider, Noah A. Smith
Abstract We study the problem of analyzing tweets with Universal Dependencies. We extend the UD guidelines to cover special constructions in tweets that affect tokenization, part-of-speech tagging, and labeled dependencies. Using the extended guidelines, we create a new tweet treebank for English (Tweebank v2) that is four times larger than the (unlabeled) Tweebank v1 introduced by Kong et al. (2014). We characterize the disagreements between our annotators and show that it is challenging to deliver consistent annotation due to ambiguity in understanding and explaining tweets. Nonetheless, using the new treebank, we build a pipeline system to parse raw tweets into UD. To overcome annotation noise without sacrificing computational efficiency, we propose a new method to distill an ensemble of 20 transition-based parsers into a single one. Our parser achieves an improvement of 2.2 in LAS over the un-ensembled baseline and outperforms parsers that are state-of-the-art on other treebanks in both accuracy and speed.
Tasks Part-Of-Speech Tagging, Tokenization
Published 2018-04-23
URL http://arxiv.org/abs/1804.08228v1
PDF http://arxiv.org/pdf/1804.08228v1.pdf
PWC https://paperswithcode.com/paper/parsing-tweets-into-universal-dependencies
Repo https://github.com/Oneplus/Tweebank
Framework none
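
The distillation step can be sketched with the generic knowledge-distillation recipe, matching the student parser's action distribution to the averaged distribution of the 20-parser ensemble (the paper's exact objective may differ):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, ensemble_probs, T=1.0):
    """Generic KD loss over parser actions (illustrative).

    student_logits: (N, A) student scores over transition actions;
    ensemble_probs: (N, A) action probabilities averaged over the ensemble.
    """
    log_q = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_q, ensemble_probs, reduction="batchmean") * (T * T)
```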

Universal Language Model Fine-Tuning with Subword Tokenization for Polish

Title Universal Language Model Fine-Tuning with Subword Tokenization for Polish
Authors Piotr Czapla, Jeremy Howard, Marcin Kardas
Abstract Universal Language Model Fine-tuning (ULMFiT) [arXiv:1801.06146] is one of the first NLP methods for efficient inductive transfer learning. Unsupervised pretraining results in improvements on many NLP tasks for English. In this paper, we describe a new method that uses subword tokenization to adapt ULMFiT to languages with high inflection. Our approach sets a new state of the art for the Polish language, taking first place in Task 3 of PolEval’18. After further training, our final model outperformed the second-best model by 35%. We have open-sourced our pretrained models and code.
Tasks Language Modelling, Tokenization, Transfer Learning
Published 2018-10-24
URL http://arxiv.org/abs/1810.10222v1
PDF http://arxiv.org/pdf/1810.10222v1.pdf
PWC https://paperswithcode.com/paper/universal-language-model-fine-tuning-with
Repo https://github.com/n-waves/poleval2018
Framework none
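
For the subword-tokenization step, a typical recipe uses SentencePiece; the file names and vocabulary size below are assumptions, not the authors' released configuration:

```python
import sentencepiece as spm

# Train a unigram subword model on a Polish corpus (hypothetical paths/sizes).
spm.SentencePieceTrainer.train(
    input="polish_corpus.txt",
    model_prefix="pl_sp",
    vocab_size=25000,
    model_type="unigram",
)

sp = spm.SentencePieceProcessor(model_file="pl_sp.model")
pieces = sp.encode("Zażółć gęślą jaźń", out_type=str)  # subword pieces
ids = sp.encode("Zażółć gęślą jaźń")                    # integer ids for the LM
```

Subwords keep the vocabulary small while still covering the many inflected forms of Polish words.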

Deep Neural Machine Translation with Weakly-Recurrent Units

Title Deep Neural Machine Translation with Weakly-Recurrent Units
Authors Mattia Antonino Di Gangi, Marcello Federico
Abstract Recurrent neural networks (RNNs) have represented for years the state of the art in neural machine translation. Recently, new architectures have been proposed, which can leverage parallel computation on GPUs better than classical RNNs. Faster training and inference combined with different sequence-to-sequence modeling also lead to performance improvements. While the new models completely depart from the original recurrent architecture, we decided to investigate how to make RNNs more efficient. In this work, we propose a new recurrent NMT architecture, called Simple Recurrent NMT, built on a class of fast and weakly-recurrent units that use layer normalization and multiple attentions. Our experiments on the WMT14 English-to-German and WMT16 English-Romanian benchmarks show that our model represents a valid alternative to LSTMs, as it can achieve better results at a significantly lower computational cost.
Tasks Machine Translation
Published 2018-05-10
URL http://arxiv.org/abs/1805.04185v1
PDF http://arxiv.org/pdf/1805.04185v1.pdf
PWC https://paperswithcode.com/paper/deep-neural-machine-translation-with-weakly
Repo https://github.com/mattiadg/SR-NMT
Framework pytorch
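
A hedged sketch of a fast weakly-recurrent cell with layer normalization, in the spirit of SRU-style units (not the paper's exact equations): the heavy matrix products run in parallel across time, leaving only a cheap elementwise recurrence:

```python
import torch
import torch.nn as nn

class WeaklyRecurrentUnit(nn.Module):
    """Illustrative weakly-recurrent cell with layer normalization."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, 2 * dim)  # candidate + forget gate per step
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (T, N, dim). This matmul is parallel across all time steps.
        cand, gate = self.proj(x).chunk(2, dim=-1)
        gate = torch.sigmoid(gate)
        h, outs = torch.zeros_like(x[0]), []
        for t in range(x.shape[0]):          # elementwise-only recurrence
            h = gate[t] * h + (1 - gate[t]) * cand[t]
            outs.append(self.norm(h))
        return torch.stack(outs)
```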

Synthesizing Images of Humans in Unseen Poses

Title Synthesizing Images of Humans in Unseen Poses
Authors Guha Balakrishnan, Amy Zhao, Adrian V. Dalca, Fredo Durand, John Guttag
Abstract We address the computational problem of novel human pose synthesis. Given an image of a person and a desired pose, we produce a depiction of that person in that pose, retaining the appearance of both the person and background. We present a modular generative neural network that synthesizes unseen poses using training pairs of images and poses taken from human action videos. Our network separates a scene into different body part and background layers, moves body parts to new locations and refines their appearances, and composites the new foreground with a hole-filled background. These subtasks, implemented with separate modules, are trained jointly using only a single target image as a supervised label. We use an adversarial discriminator to force our network to synthesize realistic details conditioned on pose. We demonstrate image synthesis results on three action classes: golf, yoga/workouts and tennis, and show that our method produces accurate results within action classes as well as across action classes. Given a sequence of desired poses, we also produce coherent videos of actions.
Tasks Image Generation
Published 2018-04-20
URL http://arxiv.org/abs/1804.07739v1
PDF http://arxiv.org/pdf/1804.07739v1.pdf
PWC https://paperswithcode.com/paper/synthesizing-images-of-humans-in-unseen-poses
Repo https://github.com/balakg/posewarp-cvpr2018
Framework tf
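
The final compositing step can be sketched as alpha-blending refined foreground body-part layers over a hole-filled background; shapes and layer ordering here are assumptions (the authors' model is in TensorFlow):

```python
import torch

def composite(fg_layers, fg_masks, background):
    """Paste refined body-part layers over a hole-filled background (illustrative).

    fg_layers: list of (N, 3, H, W) RGB layers; fg_masks: list of (N, 1, H, W)
    soft masks in [0, 1]; background: (N, 3, H, W) hole-filled background.
    """
    out = background
    for layer, mask in zip(fg_layers, fg_masks):
        out = mask * layer + (1 - mask) * out  # alpha-blend each part layer
    return out
```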

Soft Sampling for Robust Object Detection

Title Soft Sampling for Robust Object Detection
Authors Zhe Wu, Navaneeth Bodla, Bharat Singh, Mahyar Najibi, Rama Chellappa, Larry S. Davis
Abstract We study the robustness of object detection in the presence of missing annotations. In this setting, unlabeled object instances are treated as background, which generates an incorrect training signal for the detector. Interestingly, we observe that after dropping 30% of the annotations (and labeling them as background), the performance of CNN-based object detectors like Faster R-CNN only drops by 5% on the PASCAL VOC dataset. We provide a detailed explanation for this result. To further bridge the performance gap, we propose a simple yet effective solution, called Soft Sampling. Soft Sampling re-weights the gradients of RoIs as a function of overlap with positive instances. This ensures that uncertain background regions are given a smaller weight compared to hard negatives. Extensive experiments on curated PASCAL VOC datasets demonstrate the effectiveness of the proposed Soft Sampling method at different annotation drop rates. Finally, we show that on OpenImagesV3, a real-world dataset with missing annotations, Soft Sampling outperforms standard detection baselines by over 3%.
Tasks Object Detection, Robust Object Detection
Published 2018-06-18
URL https://arxiv.org/abs/1806.06986v2
PDF https://arxiv.org/pdf/1806.06986v2.pdf
PWC https://paperswithcode.com/paper/soft-sampling-for-robust-object-detection
Repo https://github.com/starimpact/arm_SNIPER
Framework tf
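
A stand-in for the Soft Sampling weight: background RoIs with little overlap with any positive instance, which may well be unlabeled objects, receive a reduced loss weight. The paper uses a specific decaying curve; this linear ramp with a floor is only illustrative:

```python
import torch

def soft_sampling_weights(max_iou_with_pos, floor=0.25):
    """Per-RoI loss weight as a function of overlap with positives (illustrative).

    max_iou_with_pos: (N,) max IoU of each background RoI with any annotated
    positive instance. Hard negatives (high IoU) keep weight ~1; isolated
    background regions are down-weighted toward `floor`.
    """
    return floor + (1.0 - floor) * max_iou_with_pos.clamp(0.0, 1.0)
```

The resulting weights multiply the per-RoI classification loss before backpropagation.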

Content Authentication for Neural Imaging Pipelines: End-to-end Optimization of Photo Provenance in Complex Distribution Channels

Title Content Authentication for Neural Imaging Pipelines: End-to-end Optimization of Photo Provenance in Complex Distribution Channels
Authors Pawel Korus, Nasir Memon
Abstract Forensic analysis of digital photo provenance relies on intrinsic traces left in the photograph at the time of its acquisition. Such analysis becomes unreliable after heavy post-processing, such as down-sampling and re-compression applied when images are distributed on the Web. This paper explores end-to-end optimization of the entire image acquisition and distribution workflow to facilitate reliable forensic analysis at the end of the distribution channel. We demonstrate that neural imaging pipelines can be trained to replace the internals of digital cameras and jointly optimized for high-fidelity photo development and reliable provenance analysis. In our experiments, the proposed approach increased image manipulation detection accuracy from 45% to over 90%. The findings encourage further research towards building more reliable imaging pipelines with explicit provenance-guaranteeing properties.
Tasks Image Manipulation Detection
Published 2018-12-04
URL http://arxiv.org/abs/1812.01516v2
PDF http://arxiv.org/pdf/1812.01516v2.pdf
PWC https://paperswithcode.com/paper/content-authentication-for-neural-imaging
Repo https://github.com/pkorus/neural-imaging
Framework tf
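
A hedged sketch of the joint training objective: the learned ISP must render faithful photos while keeping forensic traces detectable after distribution-channel distortions. Module names and the equal weighting are assumptions; the paper's full setup has more components:

```python
import torch
import torch.nn.functional as F

def joint_step(raw, target_rgb, true_label, develop, distort, forensics):
    """One illustrative training step for a provenance-aware imaging pipeline.

    develop: neural ISP (raw -> RGB); distort: differentiable channel model
    (e.g. resize + JPEG approximation); forensics: manipulation classifier.
    """
    photo = develop(raw)                             # high-fidelity development
    fidelity = F.mse_loss(photo, target_rgb)
    logits = forensics(distort(photo))               # analysis after the channel
    detection = F.cross_entropy(logits, true_label)
    return fidelity + detection
```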