Paper Group AWR 172
CFUN: Combining Faster R-CNN and U-net Network for Efficient Whole Heart Segmentation
Title | CFUN: Combining Faster R-CNN and U-net Network for Efficient Whole Heart Segmentation |
Authors | Zhanwei Xu, Ziyi Wu, Jianjiang Feng |
Abstract | In this paper, we propose a novel heart segmentation pipeline Combining Faster R-CNN and U-net Network (CFUN). Thanks to Faster R-CNN’s precise localization ability and U-net’s powerful segmentation ability, CFUN needs only a single detection-and-segmentation inference pass to obtain the whole heart segmentation, achieving good results at significantly reduced computational cost. In addition, CFUN adopts a new loss function based on edge information, named 3D Edge-loss, as an auxiliary loss to accelerate the convergence of training and improve the segmentation results. Extensive experiments on the public dataset show that CFUN exhibits competitive segmentation performance in a sharply reduced inference time. Our source code and the model are publicly available at https://github.com/Wuziyi616/CFUN. |
Tasks | |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.04914v1 |
http://arxiv.org/pdf/1812.04914v1.pdf | |
PWC | https://paperswithcode.com/paper/cfun-combining-faster-r-cnn-and-u-net-network |
Repo | https://github.com/Wuziyi616/CFUN |
Framework | pytorch |
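The abstract names a 3D Edge-loss but does not define it; below is only a minimal numpy sketch of an edge-weighted auxiliary cross-entropy in that spirit. The one-voxel boundary test and the weight value `w_edge` are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def edge_weight_map(label, w_edge=4.0):
    """Weight voxels adjacent to a label transition more heavily (toy edge test)."""
    edges = np.zeros(label.shape, dtype=bool)
    for axis in range(label.ndim):
        diff = np.diff(label, axis=axis) != 0
        sl_lo = [slice(None)] * label.ndim; sl_lo[axis] = slice(0, -1)
        sl_hi = [slice(None)] * label.ndim; sl_hi[axis] = slice(1, None)
        # mark both sides of every label transition as edge voxels
        edges[tuple(sl_lo)] |= diff
        edges[tuple(sl_hi)] |= diff
    return np.where(edges, w_edge, 1.0)

def edge_weighted_ce(prob_fg, label, w_edge=4.0, eps=1e-7):
    """Binary cross-entropy with extra weight on boundary voxels."""
    w = edge_weight_map(label, w_edge)
    p = np.clip(prob_fg, eps, 1 - eps)
    ce = -(label * np.log(p) + (1 - label) * np.log(1 - p))
    return float((w * ce).sum() / w.sum())
```

In a training loop this term would be added to the main segmentation loss, so gradients near organ boundaries dominate early convergence.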
Generating Multi-Categorical Samples with Generative Adversarial Networks
Title | Generating Multi-Categorical Samples with Generative Adversarial Networks |
Authors | Ramiro Camino, Christian Hammerschmidt, Radu State |
Abstract | We propose a method to train generative adversarial networks on multivariate feature vectors representing multiple categorical values. In contrast to the continuous domain, where GAN-based methods have delivered considerable results, GANs struggle to perform equally well on discrete data. We propose and compare several architectures based on multiple (Gumbel) softmax output layers, taking into account the structure of the data. We evaluate the performance of our architectures on datasets with different sparsity, number of features, ranges of categorical values, and dependencies among the features. Our proposed architecture and method outperform existing models. |
Tasks | |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.01202v2 |
http://arxiv.org/pdf/1807.01202v2.pdf | |
PWC | https://paperswithcode.com/paper/generating-multi-categorical-samples-with |
Repo | https://github.com/rcamino/multi-categorical-gans |
Framework | pytorch |
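The key building block the abstract describes is one (Gumbel) softmax output head per categorical variable. A minimal numpy sketch of that output layer (block sizes and temperature are illustrative; the paper's generator architecture around it is not shown):

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=None):
    """Relaxed sample from a categorical: perturb logits with Gumbel noise,
    then apply a temperature-scaled softmax."""
    rng = rng or np.random.default_rng()
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    return y / y.sum(axis=-1, keepdims=True)

def sample_multi_categorical(logit_blocks, tau=0.5, rng=None):
    """One Gumbel-softmax head per categorical variable, concatenated into
    one multi-categorical output vector."""
    return np.concatenate([gumbel_softmax(b, tau, rng) for b in logit_blocks], axis=-1)
```

Each block of the output is a near-one-hot distribution over one variable's categories, which is what lets the discriminator receive (almost) discrete samples while gradients still flow.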
Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images
Title | Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images |
Authors | Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, Yu-Gang Jiang |
Abstract | We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image. Limited by the nature of deep neural networks, previous methods usually represent a 3D shape as a volume or point cloud, and it is non-trivial to convert them to the more ready-to-use mesh model. Unlike the existing methods, our network represents the 3D mesh in a graph-based convolutional neural network and produces correct geometry by progressively deforming an ellipsoid, leveraging perceptual features extracted from the input image. We adopt a coarse-to-fine strategy to make the whole deformation procedure stable, and define various mesh-related losses to capture properties of different levels to guarantee visually appealing and physically accurate 3D geometry. Extensive experiments show that our method not only qualitatively produces mesh models with better details, but also achieves higher 3D shape estimation accuracy compared to the state-of-the-art. |
Tasks | 3D Object Reconstruction |
Published | 2018-04-05 |
URL | http://arxiv.org/abs/1804.01654v2 |
http://arxiv.org/pdf/1804.01654v2.pdf | |
PWC | https://paperswithcode.com/paper/pixel2mesh-generating-3d-mesh-models-from |
Repo | https://github.com/nywang16/Pixel2Mesh |
Framework | tf |
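The graph-based convolution the abstract mentions operates on mesh vertices over their 1-ring neighborhoods. A minimal numpy sketch of one such layer (the separate self/neighbor weights and ReLU are common GCN conventions, not necessarily the paper's exact layer):

```python
import numpy as np

def graph_conv(verts, neighbors, W_self, W_nbr):
    """One graph-convolution step over mesh vertex features: combine each
    vertex with the mean of its 1-ring neighbors, then apply ReLU."""
    nbr_mean = np.stack([verts[list(n)].mean(axis=0) for n in neighbors])
    return np.maximum(verts @ W_self + nbr_mean @ W_nbr, 0.0)
```

Stacked layers of this form, fed with perceptual image features attached to each vertex, predict per-vertex offsets that progressively deform the initial ellipsoid.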
3D human pose estimation in video with temporal convolutions and semi-supervised training
Title | 3D human pose estimation in video with temporal convolutions and semi-supervised training |
Authors | Dario Pavllo, Christoph Feichtenhofer, David Grangier, Michael Auli |
Abstract | In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data. We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints. In the supervised setting, our fully-convolutional model outperforms the previous best result from the literature by 6 mm mean per-joint position error on Human3.6M, corresponding to an error reduction of 11%, and the model also shows significant improvements on HumanEva-I. Moreover, experiments with back-projection show that it comfortably outperforms previous state-of-the-art results in semi-supervised settings where labeled data is scarce. Code and models are available at https://github.com/facebookresearch/VideoPose3D |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11742v2 |
http://arxiv.org/pdf/1811.11742v2.pdf | |
PWC | https://paperswithcode.com/paper/3d-human-pose-estimation-in-video-with |
Repo | https://github.com/garyzhao/SemGCN |
Framework | pytorch |
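The model's core operation is a dilated 1D convolution over the time axis of a 2D-keypoint sequence. A minimal numpy sketch (a toy depthwise form with "valid" padding; the real model uses learned multi-channel filters):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Valid' 1D dilated convolution along the time axis.
    x: (T, C) sequence; kernel: (K, C) per-channel taps."""
    K, T = kernel.shape[0], x.shape[0]
    span = (K - 1) * dilation  # temporal extent covered by the kernel
    out = np.zeros((T - span, x.shape[1]))
    for k in range(K):
        out += kernel[k] * x[k * dilation : k * dilation + T - span]
    return out
```

Stacking such layers with growing dilations (e.g. 1, 3, 9 with kernel size 3) makes the temporal receptive field grow exponentially with depth, which is how the model aggregates long 2D-keypoint context for each 3D pose.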
ReCoNet: Real-time Coherent Video Style Transfer Network
Title | ReCoNet: Real-time Coherent Video Style Transfer Network |
Authors | Chang Gao, Derun Gu, Fangjun Zhang, Yizhou Yu |
Abstract | Image style transfer models based on convolutional neural networks usually suffer from high temporal inconsistency when applied to videos. Some video style transfer models have been proposed to improve temporal consistency, yet they fail to guarantee fast processing speed, nice perceptual style quality and high temporal consistency at the same time. In this paper, we propose a novel real-time video style transfer model, ReCoNet, which can generate temporally coherent style transfer videos while maintaining favorable perceptual styles. A novel luminance warping constraint is added to the temporal loss at the output level to capture luminance changes between consecutive frames and increase stylization stability under illumination effects. We also propose a novel feature-map-level temporal loss to further enhance temporal consistency on traceable objects. Experimental results indicate that our model exhibits outstanding performance both qualitatively and quantitatively. |
Tasks | Style Transfer, Video Style Transfer |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.01197v2 |
http://arxiv.org/pdf/1807.01197v2.pdf | |
PWC | https://paperswithcode.com/paper/reconet-real-time-coherent-video-style |
Repo | https://github.com/irsisyphus/reconet |
Framework | pytorch |
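The abstract's luminance warping constraint adds the input frames' luminance change to the output-level temporal loss. A simplified numpy sketch of that idea (the Rec. 709 luminance weights and the exact penalty form are assumptions; the real loss uses optical-flow warping and occlusion masks from the data):

```python
import numpy as np

REL_LUM = np.array([0.2126, 0.7152, 0.0722])  # Rec. 709 relative luminance

def temporal_loss(out_t, warped_out_prev, in_t, warped_in_prev, mask):
    """Penalize changes between consecutive stylized frames beyond the
    luminance change already present in the input frames."""
    lum_change = ((in_t - warped_in_prev) @ REL_LUM)[..., None]
    diff = (out_t - warped_out_prev) - lum_change
    return float((mask[..., None] * diff ** 2).mean())
```

With this term, illumination changes in the source video are not punished as temporal inconsistency, which is what stabilizes stylization under lighting effects.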
Iterative Joint Image Demosaicking and Denoising using a Residual Denoising Network
Title | Iterative Joint Image Demosaicking and Denoising using a Residual Denoising Network |
Authors | Filippos Kokkinos, Stamatios Lefkimmiatis |
Abstract | Modern digital cameras rely on the sequential execution of separate image processing steps to produce realistic images. The first two steps are usually related to denoising and demosaicking, where the former aims to reduce noise from the sensor and the latter converts a series of light intensity readings to color images. Modern approaches try to jointly solve these problems, i.e., joint denoising-demosaicking, which is an inherently ill-posed problem given that two-thirds of the intensity information is missing and the rest is perturbed by noise. While several machine learning systems have been recently introduced to solve this problem, the majority of them rely on generic network architectures which do not explicitly take into account the physical image model. In this work we propose a novel algorithm which is inspired by powerful classical image regularization methods, large-scale optimization, and deep learning techniques. Consequently, our derived iterative optimization algorithm, which involves a trainable denoising network, has a transparent and clear interpretation compared to other black-box data-driven approaches. Our extensive experiments demonstrate that our proposed method outperforms previous approaches for both noisy and noise-free data across many different datasets. This improvement in reconstruction quality is attributed to the rigorous derivation of an iterative solution and the principled way we design our denoising network architecture, which as a result requires fewer trainable parameters than the current state-of-the-art solution and can be efficiently trained with significantly less training data than existing deep demosaicking networks. Code and results can be found at https://github.com/cig-skoltech/deep_demosaick |
Tasks | Demosaicking, Denoising |
Published | 2018-07-16 |
URL | http://arxiv.org/abs/1807.06403v3 |
http://arxiv.org/pdf/1807.06403v3.pdf | |
PWC | https://paperswithcode.com/paper/iterative-residual-network-for-deep-joint |
Repo | https://github.com/cig-skoltech/deep_demosaick |
Framework | pytorch |
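The iterative scheme the abstract describes alternates a data-fidelity step on the mosaicked observations with a denoising step. A toy numpy sketch of that loop, with a 3x3 box filter standing in for the trainable residual denoiser (the step size, iteration count, and denoiser are all placeholder assumptions):

```python
import numpy as np

def box_denoise(x):
    """Stand-in for the learned denoiser: a 3x3 box filter per channel."""
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
    return sum(p[i:i + x.shape[0], j:j + x.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def iterative_demosaick(y, mask, steps=30, lr=1.0):
    """Proximal-gradient style loop: gradient step on 0.5*||M*x - y||^2
    (M = binary CFA sampling mask), then a denoising step."""
    x = y.copy()
    for _ in range(steps):
        x = x - lr * mask * (mask * x - y)  # data-fidelity gradient step
        x = box_denoise(x)                  # regularization via denoising
    return x
```

Because the denoiser is the only learned component, the whole iteration stays interpretable: each pass re-imposes consistency with the sensor readings, then cleans up the interpolated pixels.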
Adaptive feature recombination and recalibration for semantic segmentation: application to brain tumor segmentation in MRI
Title | Adaptive feature recombination and recalibration for semantic segmentation: application to brain tumor segmentation in MRI |
Authors | Sérgio Pereira, Victor Alves, Carlos A. Silva |
Abstract | Convolutional neural networks (CNNs) have been successfully used for brain tumor segmentation, specifically, fully convolutional networks (FCNs). FCNs can segment a set of voxels at once, having a direct spatial correspondence between units in feature maps (FMs) at a given location and the corresponding classified voxels. In convolutional layers, FMs are merged to create new FMs, so, channel combination is crucial. However, not all FMs have the same relevance for a given class. Recently, in classification problems, Squeeze-and-Excitation (SE) blocks have been proposed to re-calibrate FMs as a whole, and suppress the less informative ones. However, this is not optimal in FCN due to the spatial correspondence between units and voxels. In this article, we propose feature recombination through linear expansion and compression to create more complex features for semantic segmentation. Additionally, we propose a segmentation SE (SegSE) block for feature recalibration that collects contextual information, while maintaining the spatial meaning. Finally, we evaluate the proposed methods in brain tumor segmentation, using publicly available data. |
Tasks | Brain Tumor Segmentation, Semantic Segmentation |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02318v1 |
http://arxiv.org/pdf/1806.02318v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-feature-recombination-and |
Repo | https://github.com/sergiormpereira/rr_segse |
Framework | none |
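For contrast with the paper's SegSE block, here is a minimal numpy sketch of the classic Squeeze-and-Excitation recalibration the abstract argues against for FCNs: the global average pool collapses all spatial information into one scalar per channel, which is exactly the spatial meaning SegSE is designed to keep. (Bottleneck sizes are illustrative.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(fmaps, W1, W2):
    """Classic SE: global-average-pool each feature map (squeeze), pass the
    channel descriptor through a small bottleneck (excitation), and rescale
    every channel by its resulting (0, 1) gate."""
    z = fmaps.mean(axis=(1, 2))                # squeeze: (C,)
    s = sigmoid(np.maximum(z @ W1, 0.0) @ W2)  # excitation: (C,) gates
    return fmaps * s[:, None, None]
```

In the paper's SegSE variant the descriptor is instead gathered with spatial context (so different voxels can receive different recalibration), which this classic form cannot express.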
Part-Level Convolutional Neural Networks for Pedestrian Detection Using Saliency and Boundary Box Alignment
Title | Part-Level Convolutional Neural Networks for Pedestrian Detection Using Saliency and Boundary Box Alignment |
Authors | Inyong Yun, Cheolkon Jung, Xinran Wang, Alfred O Hero, Joongkyu Kim |
Abstract | Pedestrians in videos have a wide range of appearances such as body poses, occlusions, and complex backgrounds, and there exists the proposal shift problem in pedestrian detection that causes the loss of body parts such as head and legs. To address it, we propose part-level convolutional neural networks (CNNs) for pedestrian detection using saliency and boundary box alignment in this paper. The proposed network consists of two sub-networks: detection and alignment. We use saliency in the detection sub-network to remove false positives such as lamp posts and trees. We adopt bounding box alignment on detection proposals in the alignment sub-network to address the proposal shift problem. First, we combine FCN and CAM to extract deep features for pedestrian detection. Then, we perform part-level CNN to recall the lost body parts. Experimental results on various datasets demonstrate that the proposed method remarkably improves accuracy in pedestrian detection and outperforms existing state-of-the-art methods in terms of log-average miss rate against false positives per image (FPPI). |
Tasks | Pedestrian Detection |
Published | 2018-10-01 |
URL | http://arxiv.org/abs/1810.00689v1 |
http://arxiv.org/pdf/1810.00689v1.pdf | |
PWC | https://paperswithcode.com/paper/part-level-convolutional-neural-networks-for |
Repo | https://github.com/iyyun/Part-CNN |
Framework | none |
Evaluating the Utility of Hand-crafted Features in Sequence Labelling
Title | Evaluating the Utility of Hand-crafted Features in Sequence Labelling |
Authors | Minghao Wu, Fei Liu, Trevor Cohn |
Abstract | Conventional wisdom is that hand-crafted features are redundant for deep learning models, as they already learn adequate representations of text automatically from corpora. In this work, we test this claim by proposing a new method for exploiting hand-crafted features as part of a novel hybrid learning approach, incorporating a feature auto-encoder loss component. We evaluate on the task of named entity recognition (NER), where we show that including manual features for part-of-speech, word shapes and gazetteers can improve the performance of a neural CRF model. We obtain an $F_1$ of 91.89 for the CoNLL-2003 English shared task, which significantly outperforms a collection of highly competitive baseline models. We also present an ablation study showing the importance of auto-encoding over using features as either inputs or outputs alone, and moreover show that including the auto-encoder components reduces training requirements to 60%, while retaining the same predictive accuracy. |
Tasks | Named Entity Recognition |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09075v1 |
http://arxiv.org/pdf/1808.09075v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-the-utility-of-hand-crafted |
Repo | https://github.com/minghao-wu/CRF-AE |
Framework | pytorch |
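Word shape is one of the hand-crafted features the abstract lists. A minimal sketch of a standard word-shape extractor (the exact character classes and collapsing rule vary between NER systems; this is one common convention, not necessarily the paper's):

```python
def word_shape(token):
    """Map characters to shape classes (uppercase -> 'X', lowercase -> 'x',
    digit -> 'd', anything else kept as-is), collapsing repeats."""
    classes = []
    for ch in token:
        c = ("X" if ch.isupper() else
             "x" if ch.islower() else
             "d" if ch.isdigit() else ch)
        if not classes or classes[-1] != c:
            classes.append(c)
    return "".join(classes)
```

In the paper's hybrid setup, features like this are not only fed in but also reconstructed by the auto-encoder loss, which is what forces the model to retain them in its representations.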
Parsing Tweets into Universal Dependencies
Title | Parsing Tweets into Universal Dependencies |
Authors | Yijia Liu, Yi Zhu, Wanxiang Che, Bing Qin, Nathan Schneider, Noah A. Smith |
Abstract | We study the problem of analyzing tweets with Universal Dependencies. We extend the UD guidelines to cover special constructions in tweets that affect tokenization, part-of-speech tagging, and labeled dependencies. Using the extended guidelines, we create a new tweet treebank for English (Tweebank v2) that is four times larger than the (unlabeled) Tweebank v1 introduced by Kong et al. (2014). We characterize the disagreements between our annotators and show that it is challenging to deliver consistent annotation due to ambiguity in understanding and explaining tweets. Nonetheless, using the new treebank, we build a pipeline system to parse raw tweets into UD. To overcome annotation noise without sacrificing computational efficiency, we propose a new method to distill an ensemble of 20 transition-based parsers into a single one. Our parser achieves an improvement of 2.2 in LAS over the un-ensembled baseline and outperforms parsers that are state-of-the-art on other treebanks in both accuracy and speed. |
Tasks | Part-Of-Speech Tagging, Tokenization |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08228v1 |
http://arxiv.org/pdf/1804.08228v1.pdf | |
PWC | https://paperswithcode.com/paper/parsing-tweets-into-universal-dependencies |
Repo | https://github.com/Oneplus/Tweebank |
Framework | none |
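The distillation step the abstract describes compresses an ensemble of 20 parsers into one model. A minimal numpy sketch of the generic recipe: average the ensemble's action distributions into a soft target, then train the single parser against it with cross-entropy (a plain-averaging form; the paper's exact distillation objective may differ):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distill_targets(ensemble_logits):
    """Average the ensemble members' action distributions into one soft
    target, shape (n_members, n_actions) -> (n_actions,)."""
    return softmax(ensemble_logits, axis=-1).mean(axis=0)

def distill_loss(student_logits, soft_target, eps=1e-9):
    """Cross-entropy of the single (student) parser against the soft target."""
    return float(-(soft_target * np.log(softmax(student_logits) + eps)).sum())
```

The soft targets smooth over the annotation noise the individual parsers disagree on, while inference cost stays that of a single transition-based parser.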
Universal Language Model Fine-Tuning with Subword Tokenization for Polish
Title | Universal Language Model Fine-Tuning with Subword Tokenization for Polish |
Authors | Piotr Czapla, Jeremy Howard, Marcin Kardas |
Abstract | Universal Language Model Fine-tuning [arXiv:1801.06146] (ULMFiT) is one of the first NLP methods for efficient inductive transfer learning. Unsupervised pretraining results in improvements on many NLP tasks for English. In this paper, we describe a new method that uses subword tokenization to adapt ULMFiT to languages with high inflection. Our approach results in a new state-of-the-art for the Polish language, taking first place in Task 3 of PolEval’18. After further training, our final model outperformed the second best model by 35%. We have open-sourced our pretrained models and code. |
Tasks | Language Modelling, Tokenization, Transfer Learning |
Published | 2018-10-24 |
URL | http://arxiv.org/abs/1810.10222v1 |
http://arxiv.org/pdf/1810.10222v1.pdf | |
PWC | https://paperswithcode.com/paper/universal-language-model-fine-tuning-with |
Repo | https://github.com/n-waves/poleval2018 |
Framework | none |
Deep Neural Machine Translation with Weakly-Recurrent Units
Title | Deep Neural Machine Translation with Weakly-Recurrent Units |
Authors | Mattia Antonino Di Gangi, Marcello Federico |
Abstract | Recurrent neural networks (RNNs) have represented for years the state of the art in neural machine translation. Recently, new architectures have been proposed, which can leverage parallel computation on GPUs better than classical RNNs. Faster training and inference combined with different sequence-to-sequence modeling also lead to performance improvements. While the new models completely depart from the original recurrent architecture, we decided to investigate how to make RNNs more efficient. In this work, we propose a new recurrent NMT architecture, called Simple Recurrent NMT, built on a class of fast and weakly-recurrent units that use layer normalization and multiple attentions. Our experiments on the WMT14 English-to-German and WMT16 English-Romanian benchmarks show that our model represents a valid alternative to LSTMs, as it can achieve better results at a significantly lower computational cost. |
Tasks | Machine Translation |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.04185v1 |
http://arxiv.org/pdf/1805.04185v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-machine-translation-with-weakly |
Repo | https://github.com/mattiadg/SR-NMT |
Framework | pytorch |
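The abstract does not spell out the weakly-recurrent unit, so the following is a hypothetical numpy sketch of the general idea behind such cells: all matrix products depend only on the input (and can run in parallel over time), while the sequential part is reduced to a cheap elementwise gate, here combined with layer normalization as the abstract mentions. This is an illustration of the design principle, not the paper's SR-NMT cell.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def weakly_recurrent(X, Wx, Wf, bf):
    """Hypothetical weakly-recurrent layer over a (T, d) sequence."""
    U = X @ Wx                             # input transform, parallel over t
    F = 1.0 / (1.0 + np.exp(-(X @ Wf + bf)))  # forget gates, also parallel
    h, out = np.zeros(U.shape[1]), []
    for t in range(X.shape[0]):            # only elementwise work is serial
        h = F[t] * h + (1.0 - F[t]) * U[t]
        out.append(layer_norm(h))
    return np.stack(out)
```

Because the serial loop contains no matrix multiplication, the per-timestep cost is O(d) rather than the O(d^2) of an LSTM step, which is the source of the speedup such units target.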
Synthesizing Images of Humans in Unseen Poses
Title | Synthesizing Images of Humans in Unseen Poses |
Authors | Guha Balakrishnan, Amy Zhao, Adrian V. Dalca, Fredo Durand, John Guttag |
Abstract | We address the computational problem of novel human pose synthesis. Given an image of a person and a desired pose, we produce a depiction of that person in that pose, retaining the appearance of both the person and background. We present a modular generative neural network that synthesizes unseen poses using training pairs of images and poses taken from human action videos. Our network separates a scene into different body part and background layers, moves body parts to new locations and refines their appearances, and composites the new foreground with a hole-filled background. These subtasks, implemented with separate modules, are trained jointly using only a single target image as a supervised label. We use an adversarial discriminator to force our network to synthesize realistic details conditioned on pose. We demonstrate image synthesis results on three action classes: golf, yoga/workouts and tennis, and show that our method produces accurate results within action classes as well as across action classes. Given a sequence of desired poses, we also produce coherent videos of actions. |
Tasks | Image Generation |
Published | 2018-04-20 |
URL | http://arxiv.org/abs/1804.07739v1 |
http://arxiv.org/pdf/1804.07739v1.pdf | |
PWC | https://paperswithcode.com/paper/synthesizing-images-of-humans-in-unseen-poses |
Repo | https://github.com/balakg/posewarp-cvpr2018 |
Framework | tf |
Soft Sampling for Robust Object Detection
Title | Soft Sampling for Robust Object Detection |
Authors | Zhe Wu, Navaneeth Bodla, Bharat Singh, Mahyar Najibi, Rama Chellappa, Larry S. Davis |
Abstract | We study the robustness of object detection under the presence of missing annotations. In this setting, the unlabeled object instances will be treated as background, which will generate an incorrect training signal for the detector. Interestingly, we observe that after dropping 30% of the annotations (and labeling them as background), the performance of CNN-based object detectors like Faster-RCNN only drops by 5% on the PASCAL VOC dataset. We provide a detailed explanation for this result. To further bridge the performance gap, we propose a simple yet effective solution, called Soft Sampling. Soft Sampling re-weights the gradients of RoIs as a function of overlap with positive instances. This ensures that the uncertain background regions are given a smaller weight compared to the hard negatives. Extensive experiments on curated PASCAL VOC datasets demonstrate the effectiveness of the proposed Soft Sampling method at different annotation drop rates. Finally, we show that on OpenImagesV3, which is a real-world dataset with missing annotations, Soft Sampling outperforms standard detection baselines by over 3%. |
Tasks | Object Detection, Robust Object Detection |
Published | 2018-06-18 |
URL | https://arxiv.org/abs/1806.06986v2 |
https://arxiv.org/pdf/1806.06986v2.pdf | |
PWC | https://paperswithcode.com/paper/soft-sampling-for-robust-object-detection |
Repo | https://github.com/starimpact/arm_SNIPER |
Framework | tf |
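The mechanism the abstract describes — re-weighting RoI gradients by overlap with positive instances — can be sketched in a few lines of numpy. The linear weighting curve and the `floor` value below are hypothetical; only the shape of the idea (background RoIs far from any annotation get less weight) follows the abstract.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def soft_sampling_weight(roi, positive_boxes, floor=0.25):
    """Gradient weight for a background RoI: full weight when it overlaps an
    annotated positive, down-weighted toward `floor` when it does not."""
    if len(positive_boxes) == 0:
        return floor
    overlap = iou(roi, np.asarray(positive_boxes)).max()
    return floor + (1.0 - floor) * overlap
```

In training, the classification loss of each background RoI would be multiplied by this weight, so regions that may contain an unannotated object contribute less incorrect background signal.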
Content Authentication for Neural Imaging Pipelines: End-to-end Optimization of Photo Provenance in Complex Distribution Channels
Title | Content Authentication for Neural Imaging Pipelines: End-to-end Optimization of Photo Provenance in Complex Distribution Channels |
Authors | Pawel Korus, Nasir Memon |
Abstract | Forensic analysis of digital photo provenance relies on intrinsic traces left in the photograph at the time of its acquisition. Such analysis becomes unreliable after heavy post-processing, such as down-sampling and re-compression applied upon distribution in the Web. This paper explores end-to-end optimization of the entire image acquisition and distribution workflow to facilitate reliable forensic analysis at the end of the distribution channel. We demonstrate that neural imaging pipelines can be trained to replace the internals of digital cameras, and jointly optimized for high-fidelity photo development and reliable provenance analysis. In our experiments, the proposed approach increased image manipulation detection accuracy from 45% to over 90%. The findings encourage further research towards building more reliable imaging pipelines with explicit provenance-guaranteeing properties. |
Tasks | Image Manipulation Detection |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01516v2 |
http://arxiv.org/pdf/1812.01516v2.pdf | |
PWC | https://paperswithcode.com/paper/content-authentication-for-neural-imaging |
Repo | https://github.com/pkorus/neural-imaging |
Framework | tf |