July 29, 2019

Paper Group AWR 192

Robust Registration of Gaussian Mixtures for Colour Transfer. Transfer Learning for Speech Recognition on a Budget. Get To The Point: Summarization with Pointer-Generator Networks. Eye In-Painting with Exemplar Generative Adversarial Networks. Progressive Learning for Systematic Design of Large Neural Networks. Learning to Find Good Correspondences …

Robust Registration of Gaussian Mixtures for Colour Transfer

Title Robust Registration of Gaussian Mixtures for Colour Transfer
Authors Mairéad Grogan, Rozenn Dahyot
Abstract We present a flexible approach to colour transfer inspired by techniques recently proposed for shape registration. Colour distributions of the palette and target images are modelled with Gaussian Mixture Models (GMMs) that are robustly registered to infer a nonlinear parametric transfer function. We show experimentally that our approach compares well to current techniques both quantitatively and qualitatively. Moreover, our technique is computationally the fastest and can take efficient advantage of parallel processing architectures for recolouring images and videos. Our transfer function is parametric and hence can be stored in memory for later usage and also combined with other computed transfer functions to create interesting visual effects. Overall this paper provides a fast, user-friendly approach to recolouring of image and video materials.
Tasks
Published 2017-05-17
URL http://arxiv.org/abs/1705.06091v1
PDF http://arxiv.org/pdf/1705.06091v1.pdf
PWC https://paperswithcode.com/paper/robust-registration-of-gaussian-mixtures-for
Repo https://github.com/V-Sense/LFToolbox_Recolouring_HPR
Framework none

Transfer Learning for Speech Recognition on a Budget

Title Transfer Learning for Speech Recognition on a Budget
Authors Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens Johannsmeier, Sebastian Stober
Abstract End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory, throughput and training data. We conduct several systematic experiments adapting a Wav2Letter convolutional neural network originally trained for English ASR to the German language. We show that this technique allows faster training on consumer-grade resources while requiring less training data in order to achieve the same accuracy, thereby lowering the cost of training ASR models in other languages. Model introspection revealed that small adaptations to the network’s weights were sufficient for good performance, especially for inner layers.
Tasks Speech Recognition, Transfer Learning
Published 2017-06-01
URL http://arxiv.org/abs/1706.00290v1
PDF http://arxiv.org/pdf/1706.00290v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-for-speech-recognition-on-a
Repo https://github.com/transfer-learning-asr/transfer-learning-asr
Framework tf

Get To The Point: Summarization with Pointer-Generator Networks

Title Get To The Point: Summarization with Pointer-Generator Networks
Authors Abigail See, Peter J. Liu, Christopher D. Manning
Abstract Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text). However, these models have two shortcomings: they are liable to reproduce factual details inaccurately, and they tend to repeat themselves. In this work we propose a novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways. First, we use a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator. Second, we use coverage to keep track of what has been summarized, which discourages repetition. We apply our model to the CNN / Daily Mail summarization task, outperforming the current abstractive state-of-the-art by at least 2 ROUGE points.
Tasks Abstractive Text Summarization, Text Summarization
Published 2017-04-14
URL http://arxiv.org/abs/1704.04368v2
PDF http://arxiv.org/pdf/1704.04368v2.pdf
PWC https://paperswithcode.com/paper/get-to-the-point-summarization-with-pointer
Repo https://github.com/sblayush/summarization
Framework tf
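
The copy-or-generate mixture described in the abstract above can be illustrated with a minimal sketch. It assumes the final word distribution is a convex combination of the generator's vocabulary distribution and the attention weights over source positions, controlled by a scalar generation probability. The function name `final_distribution` and the toy numbers are hypothetical and are not taken from the listed repository.

```python
from collections import defaultdict


def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a generation distribution with a copy distribution.

    p_gen:         probability of generating from the vocabulary (0..1).
    vocab_dist:    dict mapping word -> generator probability (sums to 1).
    attention:     attention weights over source positions (sums to 1).
    source_tokens: source words, one per attention weight.
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")
    mixed = defaultdict(float)
    # Generation path: scale every vocabulary probability by p_gen.
    for word, prob in vocab_dist.items():
        mixed[word] += p_gen * prob
    # Copy path: each source position contributes its attention weight,
    # so a word that occurs several times in the source accumulates mass.
    for weight, word in zip(attention, source_tokens):
        mixed[word] += (1.0 - p_gen) * weight
    return dict(mixed)


# Toy values (made up): "munster" is out of vocabulary but can still be copied.
vocab_dist = {"the": 0.5, "team": 0.3, "won": 0.2}
source = ["munster", "won", "the", "match"]
attention = [0.6, 0.1, 0.1, 0.2]
dist = final_distribution(0.5, vocab_dist, attention, source)
```

Because both inputs sum to one, the mixture is again a probability distribution, and out-of-vocabulary source words receive non-zero probability only through the copy path.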

Eye In-Painting with Exemplar Generative Adversarial Networks

Title Eye In-Painting with Exemplar Generative Adversarial Networks
Authors Brian Dolhansky, Cristian Canton Ferrer
Abstract This paper introduces a novel approach to in-painting where the identity of the object to remove or change is preserved and accounted for at inference time: Exemplar GANs (ExGANs). ExGANs are a type of conditional GAN that utilize exemplar information to produce high-quality, personalized in-painting results. We propose using exemplar information in the form of a reference image of the region to in-paint, or a perceptual code describing that object. Unlike previous conditional GAN formulations, this extra information can be inserted at multiple points within the adversarial network, thus increasing its descriptive power. We show that ExGANs can produce photo-realistic personalized in-painting results that are both perceptually and semantically plausible by applying them to the task of closed-to-open eye in-painting in natural pictures. A new benchmark dataset is also introduced for the task of eye in-painting for future comparisons.
Tasks
Published 2017-12-11
URL http://arxiv.org/abs/1712.03999v1
PDF http://arxiv.org/pdf/1712.03999v1.pdf
PWC https://paperswithcode.com/paper/eye-in-painting-with-exemplar-generative
Repo https://github.com/bdol/exemplar_gans
Framework none

Progressive Learning for Systematic Design of Large Neural Networks

Title Progressive Learning for Systematic Design of Large Neural Networks
Authors Saikat Chatterjee, Alireza M. Javid, Mostafa Sadeghi, Partha P. Mitra, Mikael Skoglund
Abstract We develop an algorithm for systematic design of a large artificial neural network using a progression property. We find that some non-linear functions, such as the rectified linear unit and its derivatives, hold the property. The systematic design addresses the choice of network size and regularization of parameters. The number of nodes and layers in the network increases in progression with the objective of consistently reducing an appropriate cost. One layer is optimized at a time, where appropriate parameters are learned using convex optimization. Regularization parameters for convex optimization do not need significant manual effort for tuning. We also use random instances for some weight matrices, which helps to reduce the number of parameters we learn. The developed network is expected to show good generalization power due to appropriate regularization and use of random weights in the layers. This expectation is verified by extensive experiments for classification and regression problems, using standard databases.
Tasks
Published 2017-10-23
URL http://arxiv.org/abs/1710.08177v1
PDF http://arxiv.org/pdf/1710.08177v1.pdf
PWC https://paperswithcode.com/paper/progressive-learning-for-systematic-design-of
Repo https://github.com/viebboy/POPmem
Framework none

Learning to Find Good Correspondences

Title Learning to Find Good Correspondences
Authors Kwang Moo Yi, Eduard Trulls, Yuki Ono, Vincent Lepetit, Mathieu Salzmann, Pascal Fua
Abstract We develop a deep architecture to learn to find good correspondences for wide-baseline stereo. Given a set of putative sparse matches and the camera intrinsics, we train our network in an end-to-end fashion to label the correspondences as inliers or outliers, while simultaneously using them to recover the relative pose, as encoded by the essential matrix. Our architecture is based on a multi-layer perceptron operating on pixel coordinates rather than directly on the image, and is thus simple and small. We introduce a novel normalization technique, called Context Normalization, which allows us to process each data point separately while imbuing it with global information, and also makes the network invariant to the order of the correspondences. Our experiments on multiple challenging datasets demonstrate that our method is able to drastically improve the state of the art with little training data.
Tasks
Published 2017-11-16
URL http://arxiv.org/abs/1711.05971v2
PDF http://arxiv.org/pdf/1711.05971v2.pdf
PWC https://paperswithcode.com/paper/learning-to-find-good-correspondences
Repo https://github.com/vcg-uvic/image-matching-benchmark
Framework none
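
The Context Normalization idea mentioned in the abstract above can be sketched as follows, assuming it amounts to normalizing each feature dimension by its mean and standard deviation computed across all correspondences of one image pair. The function name `context_normalize` and the small `eps` stabilizer are illustrative choices, not the authors' code.

```python
import math


def context_normalize(features, eps=1e-6):
    """Normalize each feature dimension across the set of correspondences.

    features: list of N feature vectors (lists of floats), one per
              putative correspondence in a single image pair.
    Every point is shifted and scaled by statistics of the whole set, so
    each output carries global context while points are still processed
    independently afterwards.
    """
    if not features:
        raise ValueError("features must contain at least one vector")
    n = len(features)
    dim = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n)
        for d in range(dim)
    ]
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in features
    ]


# Three toy correspondences with two feature channels each.
normalized = context_normalize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

Since the mean and standard deviation do not depend on the order of the inputs, shuffling the correspondences simply shuffles the outputs in the same way, which is the order invariance the abstract refers to.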

Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme

Title Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
Authors Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, Bo Xu
Abstract Joint extraction of entities and relations is an important task in information extraction. To tackle this problem, we first propose a novel tagging scheme that can convert the joint extraction task to a tagging problem. Then, based on our tagging scheme, we study different end-to-end models to extract entities and their relations directly, without identifying entities and relations separately. We conduct experiments on a public dataset produced by a distant supervision method, and the experimental results show that the tagging-based methods are better than most of the existing pipelined and joint learning methods. Moreover, the end-to-end model proposed in this paper achieves the best results on the public dataset.
Tasks Joint Entity and Relation Extraction, Relation Extraction
Published 2017-06-07
URL http://arxiv.org/abs/1706.05075v1
PDF http://arxiv.org/pdf/1706.05075v1.pdf
PWC https://paperswithcode.com/paper/joint-extraction-of-entities-and-relations
Repo https://github.com/kyzhouhzau/CCLNER
Framework tf

A Probabilistic Quality Representation Approach to Deep Blind Image Quality Prediction

Title A Probabilistic Quality Representation Approach to Deep Blind Image Quality Prediction
Authors Hui Zeng, Lei Zhang, Alan C. Bovik
Abstract Blind image quality assessment (BIQA) remains a very challenging problem due to the unavailability of a reference image. Deep learning based BIQA methods have been attracting increasing attention in recent years, yet it remains a difficult task to train a robust deep BIQA model because of the very limited number of training samples with human subjective scores. Most existing methods learn a regression network to minimize the prediction error of a scalar image quality score. However, such a scheme ignores the fact that an image will receive divergent subjective scores from different subjects, which cannot be adequately represented by a single scalar number. This is particularly true on complex, real-world distorted images. Moreover, images may broadly differ in their distributions of assigned subjective scores. Recognizing this, we propose a new representation of perceptual image quality, called probabilistic quality representation (PQR), to describe the image subjective score distribution, whereby a more robust loss function can be employed to train a deep BIQA model. The proposed PQR method is shown to not only speed up the convergence of deep model training, but to also greatly improve the achievable level of quality prediction accuracy relative to scalar quality score regression methods. The source code is available at https://github.com/HuiZeng/BIQA_Toolbox.
Tasks Blind Image Quality Assessment, Image Quality Assessment
Published 2017-08-28
URL http://arxiv.org/abs/1708.08190v2
PDF http://arxiv.org/pdf/1708.08190v2.pdf
PWC https://paperswithcode.com/paper/a-probabilistic-quality-representation
Repo https://github.com/HuiZeng/BIQA_Toolbox
Framework none

Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation

Title Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation
Authors Xinghao Chen, Guijin Wang, Hengkai Guo, Cairong Zhang
Abstract Hand pose estimation from a single depth image is an essential topic in computer vision and human-computer interaction. Despite recent advancements in this area promoted by convolutional neural networks, accurate hand pose estimation is still a challenging problem. In this paper we propose a Pose guided structured Region Ensemble Network (Pose-REN) to boost the performance of hand pose estimation. The proposed method extracts regions from the feature maps of a convolutional neural network under the guidance of an initially estimated pose, generating more optimal and representative features for hand pose estimation. The extracted feature regions are then integrated hierarchically according to the topology of hand joints by employing tree-structured fully connected layers. A refined estimation of hand pose is directly regressed by the proposed network and the final hand pose is obtained by utilizing an iterative cascaded method. Comprehensive experiments on public hand pose datasets demonstrate that our proposed method outperforms state-of-the-art algorithms.
Tasks Hand Pose Estimation, Pose Estimation
Published 2017-08-11
URL http://arxiv.org/abs/1708.03416v2
PDF http://arxiv.org/pdf/1708.03416v2.pdf
PWC https://paperswithcode.com/paper/pose-guided-structured-region-ensemble
Repo https://github.com/xinghaochen/Pose-REN
Framework caffe2

Sparse Representation-based Open Set Recognition

Title Sparse Representation-based Open Set Recognition
Authors He Zhang, Vishal M. Patel
Abstract We propose a generalized Sparse Representation-based Classification (SRC) algorithm for open set recognition where not all classes presented during testing are known during training. The SRC algorithm uses class reconstruction errors for classification. As most of the discriminative information for open set recognition is hidden in the tail part of the matched and sum of non-matched reconstruction error distributions, we model the tail of those two error distributions using the statistical Extreme Value Theory (EVT). Then we simplify the open set recognition problem into a set of hypothesis testing problems. The confidence scores corresponding to the tail distributions of a novel test sample are then fused to determine its identity. The effectiveness of the proposed method is demonstrated using four publicly available image and object classification datasets and it is shown that this method can perform significantly better than many competitive open set recognition algorithms. Code is publicly available: https://github.com/hezhangsprinter/SROSR
Tasks Object Classification, Open Set Learning, Sparse Representation-based Classification
Published 2017-05-06
URL http://arxiv.org/abs/1705.02431v1
PDF http://arxiv.org/pdf/1705.02431v1.pdf
PWC https://paperswithcode.com/paper/sparse-representation-based-open-set
Repo https://github.com/hezhangsprinter/SROSR
Framework none

Improved Regularization of Convolutional Neural Networks with Cutout

Title Improved Regularization of Convolutional Neural Networks with Cutout
Authors Terrance DeVries, Graham W. Taylor
Abstract Convolutional neural networks are capable of learning powerful representational spaces, which are necessary for tackling complex learning tasks. However, due to the model capacity required to capture such representations, they are often susceptible to overfitting and therefore require proper regularization in order to generalize well. In this paper, we show that the simple regularization technique of randomly masking out square regions of input during training, which we call cutout, can be used to improve the robustness and overall performance of convolutional neural networks. Not only is this method extremely easy to implement, but we also demonstrate that it can be used in conjunction with existing forms of data augmentation and other regularizers to further improve model performance. We evaluate this method by applying it to current state-of-the-art architectures on the CIFAR-10, CIFAR-100, and SVHN datasets, yielding new state-of-the-art results of 2.56%, 15.20%, and 1.30% test error respectively. Code is available at https://github.com/uoguelph-mlrg/Cutout
Tasks Data Augmentation, Image Augmentation, Image Classification, Semi-Supervised Image Classification
Published 2017-08-15
URL http://arxiv.org/abs/1708.04552v2
PDF http://arxiv.org/pdf/1708.04552v2.pdf
PWC https://paperswithcode.com/paper/improved-regularization-of-convolutional
Repo https://github.com/uoguelph-mlrg/Cutout
Framework pytorch
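
The masking step described in the abstract above can be sketched in a few lines. This is a simplified, framework-free illustration on a single-channel image stored as a list of rows. It assumes the square's centre is drawn uniformly over the image, so the square may be clipped at the border. The function name `cutout` and its parameters are illustrative and do not reproduce the API of the linked repository.

```python
import random


def cutout(image, size, rng=random):
    """Return a copy of a 2D image with one random square region zeroed.

    image: list of rows, each a list of floats.
    size:  side length of the square mask in pixels.
    rng:   object providing randrange(), e.g. random.Random(seed).
    """
    height, width = len(image), len(image[0])
    # Pick the mask centre anywhere in the image; clipping below means
    # the visible part of the mask can be smaller than size x size.
    cy = rng.randrange(height)
    cx = rng.randrange(width)
    top = max(0, cy - size // 2)
    bottom = min(height, cy + size // 2)
    left = max(0, cx - size // 2)
    right = min(width, cx + size // 2)
    masked = [row[:] for row in image]  # leave the input untouched
    for y in range(top, bottom):
        for x in range(left, right):
            masked[y][x] = 0.0
    return masked


# Apply the augmentation to a toy 8x8 image with a fixed seed.
image = [[1.0] * 8 for _ in range(8)]
augmented = cutout(image, size=4, rng=random.Random(0))
```

In training, a fresh mask would be drawn each time a sample is loaded, alongside whatever other augmentation is already in use.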

Does William Shakespeare REALLY Write Hamlet? Knowledge Representation Learning with Confidence

Title Does William Shakespeare REALLY Write Hamlet? Knowledge Representation Learning with Confidence
Authors Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin
Abstract Knowledge graphs (KGs), which can provide essential relational information between entities, have been widely utilized in various knowledge-driven applications. Since human knowledge is vast, grows explosively and changes frequently, knowledge construction and update inevitably involve automatic mechanisms with less human supervision, which usually bring plenty of noise and conflicts into KGs. However, most conventional knowledge representation learning methods assume that all triple facts in existing KGs share the same significance without any noise. To address this problem, we propose a novel confidence-aware knowledge representation learning framework (CKRL), which detects possible noise in KGs while learning knowledge representations with confidence simultaneously. Specifically, we introduce the triple confidence to conventional translation-based methods for knowledge representation learning. To make triple confidence more flexible and universal, we only utilize the internal structural information in KGs, and propose three kinds of triple confidences considering both local and global structural information. In experiments, we evaluate our models on knowledge graph noise detection, knowledge graph completion and triple classification. Experimental results demonstrate that our confidence-aware models achieve significant and consistent improvements on all tasks, which confirms the capability of CKRL in modeling confidence with structural information in both KG noise detection and knowledge representation learning.
Tasks Knowledge Graph Completion, Knowledge Graphs, Representation Learning
Published 2017-05-09
URL http://arxiv.org/abs/1705.03202v2
PDF http://arxiv.org/pdf/1705.03202v2.pdf
PWC https://paperswithcode.com/paper/does-william-shakespeare-really-write-hamlet
Repo https://github.com/thunlp/CKRL
Framework none
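
To illustrate how a triple confidence might enter a translation-based objective, the sketch below assumes a TransE-style energy (the distance between head + relation and tail) and a margin loss scaled by a per-triple confidence. How the confidence itself is estimated (the abstract mentions three kinds) is not shown here. The names `energy` and `confidence_weighted_loss`, the L1 distance, and the margin value are all assumptions made for illustration.

```python
def energy(head, relation, tail):
    """TransE-style energy: L1 distance between head + relation and tail."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def confidence_weighted_loss(positive, negative, confidence, margin=1.0):
    """Margin ranking loss for one (positive, negative) pair of triples.

    positive, negative: (head, relation, tail) tuples of embedding vectors.
    confidence:         value in [0, 1]; a low value shrinks the influence
                        of a triple that is suspected to be noisy.
    """
    pos_energy = energy(*positive)
    neg_energy = energy(*negative)
    return confidence * max(0.0, margin + pos_energy - neg_energy)


# Toy 2-D embeddings: the positive triple satisfies head + relation == tail.
positive = ([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])
negative = ([0.0, 0.0], [1.0, 0.0], [1.0, 0.5])
trusted = confidence_weighted_loss(positive, negative, confidence=1.0)
doubted = confidence_weighted_loss(positive, negative, confidence=0.2)
```

With the same embeddings, the doubted triple contributes a fifth of the loss of the trusted one, so gradient updates would be dominated by triples believed to be correct.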

Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs

Title Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs
Authors Martin Simonovsky, Nikos Komodakis
Abstract A number of problems can be formulated as prediction on graph-structured data. In this work, we generalize the convolution operator from regular grids to arbitrary graphs while avoiding the spectral domain, which allows us to handle graphs of varying size and connectivity. To move beyond a simple diffusion, filter weights are conditioned on the specific edge labels in the neighborhood of a vertex. Together with the proper choice of graph coarsening, we explore constructing deep neural networks for graph classification. In particular, we demonstrate the generality of our formulation in point cloud classification, where we set the new state of the art, and on a graph classification dataset, where we outperform other deep learning approaches. The source code is available at https://github.com/mys007/ecc
Tasks Graph Classification
Published 2017-04-10
URL http://arxiv.org/abs/1704.02901v3
PDF http://arxiv.org/pdf/1704.02901v3.pdf
PWC https://paperswithcode.com/paper/dynamic-edge-conditioned-filters-in
Repo https://github.com/mys007/ecc
Framework pytorch

Single Image Super-Resolution with Dilated Convolution based Multi-Scale Information Learning Inception Module

Title Single Image Super-Resolution with Dilated Convolution based Multi-Scale Information Learning Inception Module
Authors Wuzhen Shi, Feng Jiang, Debin Zhao
Abstract Traditional works have shown that patches in a natural image tend to redundantly recur many times inside the image, both within the same scale and across different scales. Making full use of this multi-scale information can improve image restoration performance. However, currently proposed deep learning based restoration methods do not take the multi-scale information into account. In this paper, we propose a dilated convolution based inception module to learn multi-scale information and design a deep network for single image super-resolution. Different dilated convolutions learn features at different scales, and the inception module then concatenates all these features to fuse multi-scale information. In order to increase the receptive field of our network to capture more contextual information, we cascade multiple inception modules to constitute a deep network to conduct single image super-resolution. With the novel dilated convolution based inception module, the proposed end-to-end single image super-resolution network can take advantage of multi-scale information to improve image super-resolution performance. Experimental results show that our proposed method outperforms many state-of-the-art single image super-resolution methods.
Tasks Image Restoration, Image Super-Resolution, Super-Resolution
Published 2017-07-22
URL http://arxiv.org/abs/1707.07128v1
PDF http://arxiv.org/pdf/1707.07128v1.pdf
PWC https://paperswithcode.com/paper/single-image-super-resolution-with-dilated
Repo https://github.com/wzhshi/MSSRNet
Framework none
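
The effect of dilation on the receptive field, which the abstract above relies on for multi-scale features, can be shown with a one-dimensional toy example. This is only a sketch: the paper works on 2D images with learned kernels, and a real inception module would pad the branches so they can be concatenated along the channel axis. The function names and the fixed kernel are illustrative assumptions.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with `dilation` samples between taps."""
    # Number of input samples covered by one output value.
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


def multi_scale_features(signal, kernel, dilations):
    """Run the same kernel at several dilation rates, one branch per rate."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


# A three-tap kernel sees 3, 5 and 7 input samples at dilations 1, 2 and 3.
signal = [float(v) for v in range(10)]
branches = multi_scale_features(signal, [1.0, 1.0, 1.0], dilations=[1, 2, 3])
```

Each branch uses the same number of weights, but a larger dilation spreads them over a wider window, which is how the module gathers context at several scales without enlarging the kernel.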

Label-driven weakly-supervised learning for multimodal deformable image registration

Title Label-driven weakly-supervised learning for multimodal deformable image registration
Authors Yipeng Hu, Marc Modat, Eli Gibson, Nooshin Ghavami, Ester Bonmati, Caroline M. Moore, Mark Emberton, J. Alison Noble, Dean C. Barratt, Tom Vercauteren
Abstract Spatially aligning medical images from different modalities remains a challenging task, especially for intraoperative applications that require fast and robust algorithms. We propose a weakly-supervised, label-driven formulation for learning 3D voxel correspondence from higher-level label correspondence, thereby bypassing classical intensity-based image similarity measures. During training, a convolutional neural network is optimised by outputting a dense displacement field (DDF) that warps a set of available anatomical labels from the moving image to match their corresponding counterparts in the fixed image. These label pairs, including solid organs, ducts, vessels, point landmarks and other ad hoc structures, are only required at training time and can be spatially aligned by minimising a cross-entropy function of the warped moving label and the fixed label. During inference, the trained network takes a new image pair to predict an optimal DDF, resulting in a fully-automatic, label-free, real-time and deformable registration. For interventional applications where large global transformation prevails, we also propose a neural network architecture to jointly optimise the global and local displacements. Experimental results are presented based on cross-validating registrations of 111 pairs of T2-weighted magnetic resonance images and 3D transrectal ultrasound images from prostate cancer patients with a total of over 4000 anatomical labels, yielding a median target registration error of 4.2 mm on landmark centroids and a median Dice of 0.88 on prostate glands.
Tasks Image Registration
Published 2017-11-05
URL http://arxiv.org/abs/1711.01666v2
PDF http://arxiv.org/pdf/1711.01666v2.pdf
PWC https://paperswithcode.com/paper/label-driven-weakly-supervised-learning-for
Repo https://github.com/Duoduo-Qian/Medical-image-registration-Resources
Framework pytorch
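
The Dice score reported in the abstract above measures the overlap between two binary segmentations. The sketch below uses the standard definition, twice the size of the intersection divided by the sum of the two sizes, with masks represented as sets of voxel coordinates. The function name `dice` and the toy masks are illustrative and are not part of the paper's evaluation code.

```python
def dice(mask_a, mask_b):
    """Dice overlap between two binary masks given as sets of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        # Two empty masks agree perfectly by convention.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# Toy example: a warped moving label compared with the fixed label.
warped_label = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
fixed_label = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
overlap = dice(warped_label, fixed_label)
```

A value of 1 means the two labels coincide exactly and 0 means they do not overlap at all.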