October 20, 2019

Paper Group AWR 172

CFUN: Combining Faster R-CNN and U-net Network for Efficient Whole Heart Segmentation

Title CFUN: Combining Faster R-CNN and U-net Network for Efficient Whole Heart Segmentation
Authors Zhanwei Xu, Ziyi Wu, Jianjiang Feng
Abstract In this paper, we propose a novel heart segmentation pipeline Combining Faster R-CNN and U-net Network (CFUN). Thanks to Faster R-CNN’s precise localization ability and U-net’s powerful segmentation ability, CFUN needs only a single detection-and-segmentation inference pass to obtain the whole-heart segmentation result, achieving good results at significantly reduced computational cost. In addition, CFUN adopts a new edge-based loss function, named 3D Edge-loss, as an auxiliary loss to accelerate training convergence and improve segmentation results. Extensive experiments on a public dataset show that CFUN exhibits competitive segmentation performance at a sharply reduced inference time. Our source code and model are publicly available at https://github.com/Wuziyi616/CFUN.
Tasks
Published 2018-12-12
URL http://arxiv.org/abs/1812.04914v1
PDF http://arxiv.org/pdf/1812.04914v1.pdf
PWC https://paperswithcode.com/paper/cfun-combining-faster-r-cnn-and-u-net-network
Repo https://github.com/Wuziyi616/CFUN
Framework pytorch
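
The abstract does not spell out the form of the 3D Edge-loss. As a rough illustration only, an edge-based auxiliary term could penalize disagreement between finite-difference edge maps of the prediction and the ground truth; everything below is an assumption, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def edge_loss_3d(pred_probs, target_onehot):
    """Illustrative 3D edge-aware auxiliary loss (not CFUN's exact formulation).

    pred_probs, target_onehot: (N, C, D, H, W) tensors with values in [0, 1].
    Edges are approximated by first-order finite differences along each axis.
    """
    loss = 0.0
    for dim in (2, 3, 4):  # the three spatial axes D, H, W
        pred_edge = torch.diff(pred_probs, dim=dim).abs()
        tgt_edge = torch.diff(target_onehot, dim=dim).abs()
        loss = loss + F.l1_loss(pred_edge, tgt_edge)
    return loss
```

Such a term would typically be added with a small weight to the main Dice or cross-entropy segmentation loss.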

Generating Multi-Categorical Samples with Generative Adversarial Networks

Title Generating Multi-Categorical Samples with Generative Adversarial Networks
Authors Ramiro Camino, Christian Hammerschmidt, Radu State
Abstract We propose a method to train generative adversarial networks on multivariate feature vectors representing multiple categorical values. In contrast to the continuous domain, where GAN-based methods have delivered considerable results, GANs struggle to perform equally well on discrete data. We propose and compare several architectures based on multiple (Gumbel) softmax output layers, taking into account the structure of the data. We evaluate the performance of our architectures on datasets with different sparsity, number of features, ranges of categorical values, and dependencies among the features. Our proposed architecture and method outperform existing models.
Tasks
Published 2018-07-03
URL http://arxiv.org/abs/1807.01202v2
PDF http://arxiv.org/pdf/1807.01202v2.pdf
PWC https://paperswithcode.com/paper/generating-multi-categorical-samples-with
Repo https://github.com/rcamino/multi-categorical-gans
Framework pytorch
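
The core architectural idea, one (Gumbel-)softmax output head per categorical variable, might be sketched as follows; the hidden size, temperature, and head layout are assumptions rather than the authors' configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiCategoricalHead(nn.Module):
    """One (Gumbel-)softmax output layer per categorical variable (illustrative)."""

    def __init__(self, hidden_dim, cardinalities, tau=0.66):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, k) for k in cardinalities)
        self.tau = tau

    def forward(self, h, hard=False):
        # One relaxed one-hot sample per variable, concatenated so the generator
        # output matches the flattened one-hot encoding of a data row.
        outs = [F.gumbel_softmax(head(h), tau=self.tau, hard=hard) for head in self.heads]
        return torch.cat(outs, dim=-1)
```

For example, `MultiCategoricalHead(128, [3, 5, 2])` maps a 128-dimensional generator state to a 10-dimensional concatenation of three (nearly) one-hot vectors.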

Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images

Title Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images
Authors Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, Yu-Gang Jiang
Abstract We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image. Limited by the nature of deep neural networks, previous methods usually represent a 3D shape as a volume or point cloud, and it is non-trivial to convert these to the more ready-to-use mesh model. Unlike existing methods, our network represents the 3D mesh in a graph-based convolutional neural network and produces correct geometry by progressively deforming an ellipsoid, leveraging perceptual features extracted from the input image. We adopt a coarse-to-fine strategy to make the whole deformation procedure stable, and define various mesh-related losses that capture properties at different levels to guarantee visually appealing and physically accurate 3D geometry. Extensive experiments show that our method not only qualitatively produces mesh models with better details, but also achieves higher 3D shape estimation accuracy compared to the state of the art.
Tasks 3D Object Reconstruction
Published 2018-04-05
URL http://arxiv.org/abs/1804.01654v2
PDF http://arxiv.org/pdf/1804.01654v2.pdf
PWC https://paperswithcode.com/paper/pixel2mesh-generating-3d-mesh-models-from
Repo https://github.com/nywang16/Pixel2Mesh
Framework tf
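
The key building block is a graph convolution over mesh vertices. A minimal PyTorch sketch of such a layer (the authors' code is in TensorFlow, and their exact layer differs):

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Minimal graph convolution over mesh vertices (illustrative)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)    # transform of the vertex itself
        self.w_neigh = nn.Linear(in_dim, out_dim)   # transform of aggregated neighbors

    def forward(self, x, adj):
        # x: (V, in_dim) per-vertex features (coordinates + perceptual features);
        # adj: (V, V) row-normalized mesh adjacency with self-loops.
        return torch.relu(self.w_self(x) + self.w_neigh(adj @ x))
```

Stacking such layers and predicting per-vertex coordinate offsets is what progressively deforms the initial ellipsoid.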

3D human pose estimation in video with temporal convolutions and semi-supervised training

Title 3D human pose estimation in video with temporal convolutions and semi-supervised training
Authors Dario Pavllo, Christoph Feichtenhofer, David Grangier, Michael Auli
Abstract In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data. We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints. In the supervised setting, our fully-convolutional model outperforms the previous best result from the literature by 6 mm mean per-joint position error on Human3.6M, corresponding to an error reduction of 11%, and the model also shows significant improvements on HumanEva-I. Moreover, experiments with back-projection show that it comfortably outperforms previous state-of-the-art results in semi-supervised settings where labeled data is scarce. Code and models are available at https://github.com/facebookresearch/VideoPose3D
Tasks 3D Human Pose Estimation, Pose Estimation
Published 2018-11-28
URL http://arxiv.org/abs/1811.11742v2
PDF http://arxiv.org/pdf/1811.11742v2.pdf
PWC https://paperswithcode.com/paper/3d-human-pose-estimation-in-video-with
Repo https://github.com/garyzhao/SemGCN
Framework pytorch
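
A hedged sketch of the model's building block, a residual block of dilated temporal convolutions over flattened 2D keypoint channels; kernel size, normalization, and the cropping scheme are assumptions:

```python
import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    """Residual dilated temporal convolution over keypoint sequences (illustrative).

    Input: (N, J*2, T) -- J keypoints with (x, y) flattened into channels.
    """

    def __init__(self, channels, dilation):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, dilation=dilation),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
        )

    def forward(self, x):
        out = self.net(x)
        # Residual connection: crop the input to the (shorter) conv output length.
        crop = (x.shape[-1] - out.shape[-1]) // 2
        return x[..., crop:x.shape[-1] - crop] + out
```

Increasing the dilation per block grows the temporal receptive field exponentially with depth.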

ReCoNet: Real-time Coherent Video Style Transfer Network

Title ReCoNet: Real-time Coherent Video Style Transfer Network
Authors Chang Gao, Derun Gu, Fangjun Zhang, Yizhou Yu
Abstract Image style transfer models based on convolutional neural networks usually suffer from high temporal inconsistency when applied to videos. Some video style transfer models have been proposed to improve temporal consistency, yet they fail to guarantee fast processing speed, nice perceptual style quality and high temporal consistency at the same time. In this paper, we propose a novel real-time video style transfer model, ReCoNet, which can generate temporally coherent style transfer videos while maintaining favorable perceptual styles. A novel luminance warping constraint is added to the temporal loss at the output level to capture luminance changes between consecutive frames and increase stylization stability under illumination effects. We also propose a novel feature-map-level temporal loss to further enhance temporal consistency on traceable objects. Experimental results indicate that our model exhibits outstanding performance both qualitatively and quantitatively.
Tasks Style Transfer, Video Style Transfer
Published 2018-07-03
URL http://arxiv.org/abs/1807.01197v2
PDF http://arxiv.org/pdf/1807.01197v2.pdf
PWC https://paperswithcode.com/paper/reconet-real-time-coherent-video-style
Repo https://github.com/irsisyphus/reconet
Framework pytorch
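
One rough reading of the luminance warping constraint, sketched as an output-level temporal loss (the exact weighting and formulation in the paper may differ; warped frames and the occlusion mask are assumed to come from precomputed optical flow):

```python
import torch

def temporal_loss(styled_t, warped_styled_prev, input_t, warped_input_prev, mask):
    """Illustrative output-level temporal loss with a luminance term.

    Rather than forcing warped consecutive stylized frames to match exactly,
    let them differ by the luminance change observed between the input frames.
    Frame tensors: (N, 3, H, W); mask: (N, 1, H, W) occlusion/validity mask.
    """
    # Relative luminance of the inputs (Rec. 601 weights).
    w = torch.tensor([0.299, 0.587, 0.114], device=input_t.device).view(1, 3, 1, 1)
    lum_change = ((input_t - warped_input_prev) * w).sum(dim=1, keepdim=True)
    diff = styled_t - warped_styled_prev - lum_change
    return (mask * diff.pow(2)).mean()
```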

Iterative Joint Image Demosaicking and Denoising using a Residual Denoising Network

Title Iterative Joint Image Demosaicking and Denoising using a Residual Denoising Network
Authors Filippos Kokkinos, Stamatios Lefkimmiatis
Abstract Modern digital cameras rely on the sequential execution of separate image processing steps to produce realistic images. The first two steps are usually denoising and demosaicking, where the former aims to reduce noise from the sensor and the latter converts a series of light intensity readings to color images. Modern approaches try to solve these problems jointly, i.e., joint denoising-demosaicking, which is an inherently ill-posed problem given that two-thirds of the intensity information is missing and the rest is perturbed by noise. While several machine learning systems have recently been introduced to solve this problem, the majority of them rely on generic network architectures which do not explicitly take into account the physical image model. In this work we propose a novel algorithm inspired by powerful classical image regularization methods, large-scale optimization, and deep learning techniques. Consequently, our derived iterative optimization algorithm, which involves a trainable denoising network, has a transparent and clear interpretation compared to other black-box data-driven approaches. Our extensive experiments demonstrate that our proposed method outperforms previous approaches for both noisy and noise-free data across many different datasets. This improvement in reconstruction quality is attributed to the rigorous derivation of an iterative solution and the principled way we design our denoising network architecture, which as a result requires fewer trainable parameters than the current state-of-the-art solution and, furthermore, can be efficiently trained using significantly less training data than existing deep demosaicking networks. Code and results can be found at https://github.com/cig-skoltech/deep_demosaick
Tasks Demosaicking, Denoising
Published 2018-07-16
URL http://arxiv.org/abs/1807.06403v3
PDF http://arxiv.org/pdf/1807.06403v3.pdf
PWC https://paperswithcode.com/paper/iterative-residual-network-for-deep-joint
Repo https://github.com/cig-skoltech/deep_demosaick
Framework pytorch
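
The abstract describes an iterative optimization scheme with a trainable denoising network inside. A generic proximal-gradient-style sketch of that pattern, with the forward operator reduced to a CFA mask (this is not the authors' derivation):

```python
import torch
import torch.nn as nn

class UnrolledDemosaick(nn.Module):
    """Illustrative unrolled iterative reconstruction with a learned denoiser."""

    def __init__(self, denoiser, steps=5, step_size=1.0):
        super().__init__()
        self.denoiser = denoiser                       # learned regularization step
        self.steps = steps
        self.step_size = nn.Parameter(torch.tensor(step_size))

    def forward(self, y, mask):
        # y: mosaicked/noisy observation; mask: binary CFA sampling operator.
        x = y.clone()
        for _ in range(self.steps):
            grad = mask * (mask * x - y)               # grad of 0.5 * ||M x - y||^2
            x = self.denoiser(x - self.step_size * grad)
        return x
```

The transparency the authors highlight comes from this structure: each step is a data-consistency gradient update followed by a learned denoising (proximal-like) step.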

Adaptive feature recombination and recalibration for semantic segmentation: application to brain tumor segmentation in MRI

Title Adaptive feature recombination and recalibration for semantic segmentation: application to brain tumor segmentation in MRI
Authors Sérgio Pereira, Victor Alves, Carlos A. Silva
Abstract Convolutional neural networks (CNNs) have been successfully used for brain tumor segmentation, specifically fully convolutional networks (FCNs). FCNs can segment a set of voxels at once, with a direct spatial correspondence between units in feature maps (FMs) at a given location and the corresponding classified voxels. In convolutional layers, FMs are merged to create new FMs, so channel combination is crucial. However, not all FMs have the same relevance for a given class. Recently, in classification problems, Squeeze-and-Excitation (SE) blocks have been proposed to re-calibrate FMs as a whole and suppress the less informative ones. However, this is not optimal in FCNs due to the spatial correspondence between units and voxels. In this article, we propose feature recombination through linear expansion and compression to create more complex features for semantic segmentation. Additionally, we propose a segmentation SE (SegSE) block for feature recalibration that collects contextual information while maintaining the spatial meaning. Finally, we evaluate the proposed methods on brain tumor segmentation, using publicly available data.
Tasks Brain Tumor Segmentation, Semantic Segmentation
Published 2018-06-06
URL http://arxiv.org/abs/1806.02318v1
PDF http://arxiv.org/pdf/1806.02318v1.pdf
PWC https://paperswithcode.com/paper/adaptive-feature-recombination-and
Repo https://github.com/sergiormpereira/rr_segse
Framework none
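
A hedged sketch of the contrast with standard SE: instead of collapsing feature maps with global pooling, gather context with a dilated convolution so the recalibration gate stays spatial. Shown in 2D for brevity (the paper works on 3D MRI volumes, and its exact block differs):

```python
import torch
import torch.nn as nn

class SegSE(nn.Module):
    """Illustrative segmentation squeeze-and-excitation block (spatial gate)."""

    def __init__(self, channels, reduction=4, dilation=2):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 3,
                      padding=dilation, dilation=dilation),  # context, not pooling
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Each spatial location gets its own per-channel weights.
        return x * self.gate(x)
```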

Part-Level Convolutional Neural Networks for Pedestrian Detection Using Saliency and Boundary Box Alignment

Title Part-Level Convolutional Neural Networks for Pedestrian Detection Using Saliency and Boundary Box Alignment
Authors Inyong Yun, Cheolkon Jung, Xinran Wang, Alfred O Hero, Joongkyu Kim
Abstract Pedestrians in videos have a wide range of appearances, such as body poses, occlusions, and complex backgrounds, and pedestrian detection suffers from the proposal shift problem, which causes the loss of body parts such as the head and legs. To address this, we propose part-level convolutional neural networks (CNNs) for pedestrian detection using saliency and bounding box alignment. The proposed network consists of two sub-networks: detection and alignment. We use saliency in the detection sub-network to remove false positives such as lamp posts and trees. We apply bounding box alignment to detection proposals in the alignment sub-network to address the proposal shift problem. First, we combine FCN and CAM to extract deep features for pedestrian detection. Then, we perform part-level CNN to recall the lost body parts. Experimental results on various datasets demonstrate that the proposed method remarkably improves pedestrian detection accuracy and outperforms existing state-of-the-art methods in terms of log-average miss rate against false positives per image (FPPI).
Tasks Pedestrian Detection
Published 2018-10-01
URL http://arxiv.org/abs/1810.00689v1
PDF http://arxiv.org/pdf/1810.00689v1.pdf
PWC https://paperswithcode.com/paper/part-level-convolutional-neural-networks-for
Repo https://github.com/iyyun/Part-CNN
Framework none
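
As a toy illustration of using saliency to suppress false positives such as lamp posts and trees, one could down-weight proposal scores by the mean saliency inside each box; the paper's detection sub-network is far more involved than this:

```python
import torch

def rescore_by_saliency(scores, boxes, saliency, alpha=0.5):
    """Down-weight proposals that fall on non-salient regions (illustrative).

    scores: (N,) detection scores; boxes: (N, 4) integer (x1, y1, x2, y2);
    saliency: (H, W) map with values in [0, 1]; alpha: floor on the rescaling.
    """
    sal = torch.stack([
        saliency[y1:y2 + 1, x1:x2 + 1].mean()
        for x1, y1, x2, y2 in boxes.long().tolist()
    ])
    return scores * (alpha + (1 - alpha) * sal)
```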

Evaluating the Utility of Hand-crafted Features in Sequence Labelling

Title Evaluating the Utility of Hand-crafted Features in Sequence Labelling
Authors Minghao Wu, Fei Liu, Trevor Cohn
Abstract Conventional wisdom is that hand-crafted features are redundant for deep learning models, as they already learn adequate representations of text automatically from corpora. In this work, we test this claim by proposing a new method for exploiting hand-crafted features as part of a novel hybrid learning approach, incorporating a feature auto-encoder loss component. We evaluate on the task of named entity recognition (NER), where we show that including manual features for part-of-speech, word shapes and gazetteers can improve the performance of a neural CRF model. We obtain an $F_1$ of 91.89 for the CoNLL-2003 English shared task, which significantly outperforms a collection of highly competitive baseline models. We also present an ablation study showing the importance of auto-encoding over using features as either inputs or outputs alone, and moreover show that including the auto-encoder components reduces training requirements to 60%, while retaining the same predictive accuracy.
Tasks Named Entity Recognition
Published 2018-08-28
URL http://arxiv.org/abs/1808.09075v1
PDF http://arxiv.org/pdf/1808.09075v1.pdf
PWC https://paperswithcode.com/paper/evaluating-the-utility-of-hand-crafted
Repo https://github.com/minghao-wu/CRF-AE
Framework pytorch
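
A minimal sketch of the hybrid objective: a labelling loss plus an auto-encoder loss that reconstructs the hand-crafted features from the shared hidden states. Dimensions are assumptions, and plain per-token cross-entropy stands in here for the paper's CRF layer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaggerWithFeatureAE(nn.Module):
    """Illustrative tagger with a feature auto-encoder loss component."""

    def __init__(self, encoder, hidden_dim, n_tags, feat_dim, ae_weight=1.0):
        super().__init__()
        self.encoder = encoder                        # e.g. a BiLSTM over embeddings
        self.tag_head = nn.Linear(hidden_dim, n_tags)
        self.feat_decoder = nn.Linear(hidden_dim, feat_dim)
        self.ae_weight = ae_weight

    def forward(self, inputs, gold_tags, handcrafted_feats):
        h = self.encoder(inputs)                      # (N, T, hidden_dim)
        tag_loss = F.cross_entropy(self.tag_head(h).flatten(0, 1),
                                   gold_tags.flatten())
        # Reconstruct POS / word-shape / gazetteer features from the hidden states.
        ae_loss = F.mse_loss(self.feat_decoder(h), handcrafted_feats)
        return tag_loss + self.ae_weight * ae_loss
```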

Parsing Tweets into Universal Dependencies

Title Parsing Tweets into Universal Dependencies
Authors Yijia Liu, Yi Zhu, Wanxiang Che, Bing Qin, Nathan Schneider, Noah A. Smith
Abstract We study the problem of analyzing tweets with Universal Dependencies. We extend the UD guidelines to cover special constructions in tweets that affect tokenization, part-of-speech tagging, and labeled dependencies. Using the extended guidelines, we create a new tweet treebank for English (Tweebank v2) that is four times larger than the (unlabeled) Tweebank v1 introduced by Kong et al. (2014). We characterize the disagreements between our annotators and show that it is challenging to deliver consistent annotation due to ambiguity in understanding and explaining tweets. Nonetheless, using the new treebank, we build a pipeline system to parse raw tweets into UD. To overcome annotation noise without sacrificing computational efficiency, we propose a new method to distill an ensemble of 20 transition-based parsers into a single one. Our parser achieves an improvement of 2.2 in LAS over the un-ensembled baseline and outperforms parsers that are state-of-the-art on other treebanks in both accuracy and speed.
Tasks Part-Of-Speech Tagging, Tokenization
Published 2018-04-23
URL http://arxiv.org/abs/1804.08228v1
PDF http://arxiv.org/pdf/1804.08228v1.pdf
PWC https://paperswithcode.com/paper/parsing-tweets-into-universal-dependencies
Repo https://github.com/Oneplus/Tweebank
Framework none
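
The distillation step can be sketched with the generic knowledge-distillation recipe, matching the student parser's action distribution to the averaged distribution of the 20-parser ensemble (the paper's exact objective may differ):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, ensemble_probs, T=1.0):
    """Generic KD loss over parser actions (illustrative).

    student_logits: (N, A) student scores over transition actions;
    ensemble_probs: (N, A) action probabilities averaged over the ensemble.
    """
    log_q = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_q, ensemble_probs, reduction="batchmean") * (T * T)
```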

Universal Language Model Fine-Tuning with Subword Tokenization for Polish

Title Universal Language Model Fine-Tuning with Subword Tokenization for Polish
Authors Piotr Czapla, Jeremy Howard, Marcin Kardas
Abstract Universal Language Model Fine-tuning (ULMFiT) [arXiv:1801.06146] is one of the first NLP methods for efficient inductive transfer learning. Unsupervised pretraining results in improvements on many NLP tasks for English. In this paper, we describe a new method that uses subword tokenization to adapt ULMFiT to languages with high inflection. Our approach sets a new state of the art for the Polish language, taking first place in Task 3 of PolEval’18. After further training, our final model outperformed the second-best model by 35%. We have open-sourced our pretrained models and code.
Tasks Language Modelling, Tokenization, Transfer Learning
Published 2018-10-24
URL http://arxiv.org/abs/1810.10222v1
PDF http://arxiv.org/pdf/1810.10222v1.pdf
PWC https://paperswithcode.com/paper/universal-language-model-fine-tuning-with
Repo https://github.com/n-waves/poleval2018
Framework none
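
For the subword-tokenization step, a typical recipe uses SentencePiece; the file names and vocabulary size below are assumptions, not the authors' released configuration:

```python
import sentencepiece as spm

# Train a unigram subword model on a Polish corpus (hypothetical paths/sizes).
spm.SentencePieceTrainer.train(
    input="polish_corpus.txt",
    model_prefix="pl_sp",
    vocab_size=25000,
    model_type="unigram",
)

sp = spm.SentencePieceProcessor(model_file="pl_sp.model")
pieces = sp.encode("Zażółć gęślą jaźń", out_type=str)  # subword pieces
ids = sp.encode("Zażółć gęślą jaźń")                    # integer ids for the LM
```

Subwords keep the vocabulary small while still covering the many inflected forms of Polish words.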

Deep Neural Machine Translation with Weakly-Recurrent Units

Title Deep Neural Machine Translation with Weakly-Recurrent Units
Authors Mattia Antonino Di Gangi, Marcello Federico
Abstract Recurrent neural networks (RNNs) have represented for years the state of the art in neural machine translation. Recently, new architectures have been proposed, which can leverage parallel computation on GPUs better than classical RNNs. Faster training and inference combined with different sequence-to-sequence modeling also lead to performance improvements. While the new models completely depart from the original recurrent architecture, we decided to investigate how to make RNNs more efficient. In this work, we propose a new recurrent NMT architecture, called Simple Recurrent NMT, built on a class of fast and weakly-recurrent units that use layer normalization and multiple attentions. Our experiments on the WMT14 English-to-German and WMT16 English-Romanian benchmarks show that our model represents a valid alternative to LSTMs, as it can achieve better results at a significantly lower computational cost.
Tasks Machine Translation
Published 2018-05-10
URL http://arxiv.org/abs/1805.04185v1
PDF http://arxiv.org/pdf/1805.04185v1.pdf
PWC https://paperswithcode.com/paper/deep-neural-machine-translation-with-weakly
Repo https://github.com/mattiadg/SR-NMT
Framework pytorch
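
A hedged sketch of a fast weakly-recurrent cell with layer normalization, in the spirit of SRU-style units (not the paper's exact equations): the heavy matrix products run in parallel across time, leaving only a cheap elementwise recurrence:

```python
import torch
import torch.nn as nn

class WeaklyRecurrentUnit(nn.Module):
    """Illustrative weakly-recurrent cell with layer normalization."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, 2 * dim)  # candidate + forget gate per step
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (T, N, dim). This matmul is parallel across all time steps.
        cand, gate = self.proj(x).chunk(2, dim=-1)
        gate = torch.sigmoid(gate)
        h, outs = torch.zeros_like(x[0]), []
        for t in range(x.shape[0]):          # elementwise-only recurrence
            h = gate[t] * h + (1 - gate[t]) * cand[t]
            outs.append(self.norm(h))
        return torch.stack(outs)
```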

Synthesizing Images of Humans in Unseen Poses

Title Synthesizing Images of Humans in Unseen Poses
Authors Guha Balakrishnan, Amy Zhao, Adrian V. Dalca, Fredo Durand, John Guttag
Abstract We address the computational problem of novel human pose synthesis. Given an image of a person and a desired pose, we produce a depiction of that person in that pose, retaining the appearance of both the person and background. We present a modular generative neural network that synthesizes unseen poses using training pairs of images and poses taken from human action videos. Our network separates a scene into different body part and background layers, moves body parts to new locations and refines their appearances, and composites the new foreground with a hole-filled background. These subtasks, implemented with separate modules, are trained jointly using only a single target image as a supervised label. We use an adversarial discriminator to force our network to synthesize realistic details conditioned on pose. We demonstrate image synthesis results on three action classes: golf, yoga/workouts and tennis, and show that our method produces accurate results within action classes as well as across action classes. Given a sequence of desired poses, we also produce coherent videos of actions.
Tasks Image Generation
Published 2018-04-20
URL http://arxiv.org/abs/1804.07739v1
PDF http://arxiv.org/pdf/1804.07739v1.pdf
PWC https://paperswithcode.com/paper/synthesizing-images-of-humans-in-unseen-poses
Repo https://github.com/balakg/posewarp-cvpr2018
Framework tf
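
The final compositing step can be sketched as alpha-blending refined foreground body-part layers over a hole-filled background; shapes and layer ordering here are assumptions (the authors' model is in TensorFlow):

```python
import torch

def composite(fg_layers, fg_masks, background):
    """Paste refined body-part layers over a hole-filled background (illustrative).

    fg_layers: list of (N, 3, H, W) RGB layers; fg_masks: list of (N, 1, H, W)
    soft masks in [0, 1]; background: (N, 3, H, W) hole-filled background.
    """
    out = background
    for layer, mask in zip(fg_layers, fg_masks):
        out = mask * layer + (1 - mask) * out  # alpha-blend each part layer
    return out
```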

Soft Sampling for Robust Object Detection

Title Soft Sampling for Robust Object Detection
Authors Zhe Wu, Navaneeth Bodla, Bharat Singh, Mahyar Najibi, Rama Chellappa, Larry S. Davis
Abstract We study the robustness of object detection in the presence of missing annotations. In this setting, unlabeled object instances are treated as background, which generates an incorrect training signal for the detector. Interestingly, we observe that after dropping 30% of the annotations (and labeling them as background), the performance of CNN-based object detectors like Faster R-CNN only drops by 5% on the PASCAL VOC dataset. We provide a detailed explanation for this result. To further bridge the performance gap, we propose a simple yet effective solution, called Soft Sampling. Soft Sampling re-weights the gradients of RoIs as a function of overlap with positive instances. This ensures that uncertain background regions are given a smaller weight compared to hard negatives. Extensive experiments on curated PASCAL VOC datasets demonstrate the effectiveness of the proposed Soft Sampling method at different annotation drop rates. Finally, we show that on OpenImagesV3, a real-world dataset with missing annotations, Soft Sampling outperforms standard detection baselines by over 3%.
Tasks Object Detection, Robust Object Detection
Published 2018-06-18
URL https://arxiv.org/abs/1806.06986v2
PDF https://arxiv.org/pdf/1806.06986v2.pdf
PWC https://paperswithcode.com/paper/soft-sampling-for-robust-object-detection
Repo https://github.com/starimpact/arm_SNIPER
Framework tf
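
A stand-in for the Soft Sampling weight: background RoIs with little overlap with any positive instance, which may well be unlabeled objects, receive a reduced loss weight. The paper uses a specific decaying curve; this linear ramp with a floor is only illustrative:

```python
import torch

def soft_sampling_weights(max_iou_with_pos, floor=0.25):
    """Per-RoI loss weight as a function of overlap with positives (illustrative).

    max_iou_with_pos: (N,) max IoU of each background RoI with any annotated
    positive instance. Hard negatives (high IoU) keep weight ~1; isolated
    background regions are down-weighted toward `floor`.
    """
    return floor + (1.0 - floor) * max_iou_with_pos.clamp(0.0, 1.0)
```

The resulting weights multiply the per-RoI classification loss before backpropagation.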

Content Authentication for Neural Imaging Pipelines: End-to-end Optimization of Photo Provenance in Complex Distribution Channels

Title Content Authentication for Neural Imaging Pipelines: End-to-end Optimization of Photo Provenance in Complex Distribution Channels
Authors Pawel Korus, Nasir Memon
Abstract Forensic analysis of digital photo provenance relies on intrinsic traces left in the photograph at the time of its acquisition. Such analysis becomes unreliable after heavy post-processing, such as down-sampling and re-compression applied when images are distributed on the Web. This paper explores end-to-end optimization of the entire image acquisition and distribution workflow to facilitate reliable forensic analysis at the end of the distribution channel. We demonstrate that neural imaging pipelines can be trained to replace the internals of digital cameras and jointly optimized for high-fidelity photo development and reliable provenance analysis. In our experiments, the proposed approach increased image manipulation detection accuracy from 45% to over 90%. The findings encourage further research towards building more reliable imaging pipelines with explicit provenance-guaranteeing properties.
Tasks Image Manipulation Detection
Published 2018-12-04
URL http://arxiv.org/abs/1812.01516v2
PDF http://arxiv.org/pdf/1812.01516v2.pdf
PWC https://paperswithcode.com/paper/content-authentication-for-neural-imaging
Repo https://github.com/pkorus/neural-imaging
Framework tf
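
A hedged sketch of the joint training objective: the learned ISP must render faithful photos while keeping forensic traces detectable after distribution-channel distortions. Module names and the equal weighting are assumptions; the paper's full setup has more components:

```python
import torch
import torch.nn.functional as F

def joint_step(raw, target_rgb, true_label, develop, distort, forensics):
    """One illustrative training step for a provenance-aware imaging pipeline.

    develop: neural ISP (raw -> RGB); distort: differentiable channel model
    (e.g. resize + JPEG approximation); forensics: manipulation classifier.
    """
    photo = develop(raw)                             # high-fidelity development
    fidelity = F.mse_loss(photo, target_rgb)
    logits = forensics(distort(photo))               # analysis after the channel
    detection = F.cross_entropy(logits, true_label)
    return fidelity + detection
```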