July 29, 2019

Paper Group AWR 192

Robust Registration of Gaussian Mixtures for Colour Transfer. Transfer Learning for Speech Recognition on a Budget. Get To The Point: Summarization with Pointer-Generator Networks. Eye In-Painting with Exemplar Generative Adversarial Networks. Progressive Learning for Systematic Design of Large Neural Networks. Learning to Find Good Correspondences …

Robust Registration of Gaussian Mixtures for Colour Transfer


Title	Robust Registration of Gaussian Mixtures for Colour Transfer
Authors	Mairéad Grogan, Rozenn Dahyot
Abstract	We present a flexible approach to colour transfer inspired by techniques recently proposed for shape registration. Colour distributions of the palette and target images are modelled with Gaussian Mixture Models (GMMs) that are robustly registered to infer a non-linear parametric transfer function. We show experimentally that our approach compares well to current techniques both quantitatively and qualitatively. Moreover, our technique is computationally the fastest and can take efficient advantage of parallel processing architectures for recolouring images and videos. Our transfer function is parametric and hence can be stored in memory for later usage and also combined with other computed transfer functions to create interesting visual effects. Overall this paper provides a fast, user-friendly approach to recolouring of image and video materials.
Tasks
Published	2017-05-17
URL	http://arxiv.org/abs/1705.06091v1
PDF	http://arxiv.org/pdf/1705.06091v1.pdf
PWC	https://paperswithcode.com/paper/robust-registration-of-gaussian-mixtures-for
Repo	https://github.com/V-Sense/LFToolbox_Recolouring_HPR
Framework	none

Transfer Learning for Speech Recognition on a Budget


Title	Transfer Learning for Speech Recognition on a Budget
Authors	Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens Johannsmeier, Sebastian Stober
Abstract	End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory, throughput and training data. We conduct several systematic experiments adapting a Wav2Letter convolutional neural network originally trained for English ASR to the German language. We show that this technique allows faster training on consumer-grade resources while requiring less training data in order to achieve the same accuracy, thereby lowering the cost of training ASR models in other languages. Model introspection revealed that small adaptations to the network’s weights were sufficient for good performance, especially for inner layers.
Tasks	Speech Recognition, Transfer Learning
Published	2017-06-01
URL	http://arxiv.org/abs/1706.00290v1
PDF	http://arxiv.org/pdf/1706.00290v1.pdf
PWC	https://paperswithcode.com/paper/transfer-learning-for-speech-recognition-on-a
Repo	https://github.com/transfer-learning-asr/transfer-learning-asr
Framework	tf

Get To The Point: Summarization with Pointer-Generator Networks


Title	Get To The Point: Summarization with Pointer-Generator Networks
Authors	Abigail See, Peter J. Liu, Christopher D. Manning
Abstract	Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text). However, these models have two shortcomings: they are liable to reproduce factual details inaccurately, and they tend to repeat themselves. In this work we propose a novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways. First, we use a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator. Second, we use coverage to keep track of what has been summarized, which discourages repetition. We apply our model to the CNN / Daily Mail summarization task, outperforming the current abstractive state-of-the-art by at least 2 ROUGE points.
Tasks	Abstractive Text Summarization, Text Summarization
Published	2017-04-14
URL	http://arxiv.org/abs/1704.04368v2
PDF	http://arxiv.org/pdf/1704.04368v2.pdf
PWC	https://paperswithcode.com/paper/get-to-the-point-summarization-with-pointer
Repo	https://github.com/sblayush/summarization
Framework	tf
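
The copy-or-generate mix described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the dictionary representation, and the toy numbers are assumptions. It assumes the output distribution is a mixture of a vocabulary distribution (weight p_gen) and the attention weights over source tokens (weight 1 - p_gen).

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a generator distribution with a copy distribution over source tokens.

    p_gen: probability of generating from the vocabulary (assumed scalar in [0, 1]).
    vocab_dist: dict mapping vocabulary word -> probability.
    attention: attention weights over source positions (assumed to sum to 1).
    source_tokens: the source words, aligned with `attention`.
    """
    # Generator part: scale every vocabulary probability by p_gen.
    mixed = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Pointer part: add the attention mass of each source position to its word,
    # which lets out-of-vocabulary source words receive probability.
    for token, weight in zip(source_tokens, attention):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * weight
    return mixed


dist = final_distribution(
    p_gen=0.5,
    vocab_dist={"the": 0.5, "cat": 0.5},
    attention=[0.25, 0.75],
    source_tokens=["cat", "zebra"],
)
```

Here "zebra" is not in the toy vocabulary, yet it still receives probability through the copy term.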

Eye In-Painting with Exemplar Generative Adversarial Networks


Title	Eye In-Painting with Exemplar Generative Adversarial Networks
Authors	Brian Dolhansky, Cristian Canton Ferrer
Abstract	This paper introduces a novel approach to in-painting where the identity of the object to remove or change is preserved and accounted for at inference time: Exemplar GANs (ExGANs). ExGANs are a type of conditional GAN that utilize exemplar information to produce high-quality, personalized in-painting results. We propose using exemplar information in the form of a reference image of the region to in-paint, or a perceptual code describing that object. Unlike previous conditional GAN formulations, this extra information can be inserted at multiple points within the adversarial network, thus increasing its descriptive power. We show that ExGANs can produce photo-realistic personalized in-painting results that are both perceptually and semantically plausible by applying them to the task of closed-to-open eye in-painting in natural pictures. A new benchmark dataset is also introduced for the task of eye in-painting for future comparisons.
Tasks
Published	2017-12-11
URL	http://arxiv.org/abs/1712.03999v1
PDF	http://arxiv.org/pdf/1712.03999v1.pdf
PWC	https://paperswithcode.com/paper/eye-in-painting-with-exemplar-generative
Repo	https://github.com/bdol/exemplar_gans
Framework	none

Progressive Learning for Systematic Design of Large Neural Networks


Title	Progressive Learning for Systematic Design of Large Neural Networks
Authors	Saikat Chatterjee, Alireza M. Javid, Mostafa Sadeghi, Partha P. Mitra, Mikael Skoglund
Abstract	We develop an algorithm for systematic design of a large artificial neural network using a progression property. We find that some non-linear functions, such as the rectified linear unit and its derivatives, hold the property. The systematic design addresses the choice of network size and regularization of parameters. The number of nodes and layers in the network increases in progression with the objective of consistently reducing an appropriate cost. One layer is optimized at a time, where appropriate parameters are learned using convex optimization. Regularization parameters for convex optimization do not need significant manual tuning effort. We also use random instances for some weight matrices, and that helps to reduce the number of parameters we learn. The developed network is expected to show good generalization power due to appropriate regularization and use of random weights in the layers. This expectation is verified by extensive experiments for classification and regression problems, using standard databases.
Tasks
Published	2017-10-23
URL	http://arxiv.org/abs/1710.08177v1
PDF	http://arxiv.org/pdf/1710.08177v1.pdf
PWC	https://paperswithcode.com/paper/progressive-learning-for-systematic-design-of
Repo	https://github.com/viebboy/POPmem
Framework	none

Learning to Find Good Correspondences


Title	Learning to Find Good Correspondences
Authors	Kwang Moo Yi, Eduard Trulls, Yuki Ono, Vincent Lepetit, Mathieu Salzmann, Pascal Fua
Abstract	We develop a deep architecture to learn to find good correspondences for wide-baseline stereo. Given a set of putative sparse matches and the camera intrinsics, we train our network in an end-to-end fashion to label the correspondences as inliers or outliers, while simultaneously using them to recover the relative pose, as encoded by the essential matrix. Our architecture is based on a multi-layer perceptron operating on pixel coordinates rather than directly on the image, and is thus simple and small. We introduce a novel normalization technique, called Context Normalization, which allows us to process each data point separately while imbuing it with global information, and also makes the network invariant to the order of the correspondences. Our experiments on multiple challenging datasets demonstrate that our method is able to drastically improve the state of the art with little training data.
Tasks
Published	2017-11-16
URL	http://arxiv.org/abs/1711.05971v2
PDF	http://arxiv.org/pdf/1711.05971v2.pdf
PWC	https://paperswithcode.com/paper/learning-to-find-good-correspondences
Repo	https://github.com/vcg-uvic/image-matching-benchmark
Framework	none
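
A rough sketch of the normalization idea in the abstract above, under the assumption that Context Normalization standardizes each feature channel using the mean and standard deviation computed across all correspondences of one image pair. The function name, the list-of-lists layout, and the epsilon are illustrative choices, not taken from the paper's code.

```python
import math


def context_normalize(features, eps=1e-8):
    """Standardize each channel across the set of correspondences.

    features: list of N rows, one per correspondence, each with C channel values.
    Returns a new N x C list; the input is left untouched.
    """
    n = len(features)
    channels = len(features[0])
    # Per-channel statistics over the whole set: this is the "global context".
    means = [sum(row[c] for row in features) / n for c in range(channels)]
    stds = [
        math.sqrt(sum((row[c] - means[c]) ** 2 for row in features) / n)
        for c in range(channels)
    ]
    # Each row is then transformed independently using the shared statistics.
    return [
        [(row[c] - means[c]) / (stds[c] + eps) for c in range(channels)]
        for row in features
    ]


points = [[1.0, 10.0], [3.0, 30.0]]
normalized = context_normalize(points)
reordered = context_normalize(points[::-1])
```

Because the statistics are sums over the set, they do not depend on the order of the correspondences; reordering the input only reorders the output rows.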

Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme


Title	Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme
Authors	Suncong Zheng, Feng Wang, Hongyun Bao, Yuexing Hao, Peng Zhou, Bo Xu
Abstract	Joint extraction of entities and relations is an important task in information extraction. To tackle this problem, we first propose a novel tagging scheme that can convert the joint extraction task to a tagging problem. Then, based on our tagging scheme, we study different end-to-end models to extract entities and their relations directly, without identifying entities and relations separately. We conduct experiments on a public dataset produced by a distant supervision method, and the experimental results show that the tagging-based methods are better than most of the existing pipelined and joint learning methods. Moreover, the end-to-end model proposed in this paper achieves the best results on the public dataset.
Tasks	Joint Entity and Relation Extraction, Relation Extraction
Published	2017-06-07
URL	http://arxiv.org/abs/1706.05075v1
PDF	http://arxiv.org/pdf/1706.05075v1.pdf
PWC	https://paperswithcode.com/paper/joint-extraction-of-entities-and-relations
Repo	https://github.com/kyzhouhzau/CCLNER
Framework	tf

A Probabilistic Quality Representation Approach to Deep Blind Image Quality Prediction


Title	A Probabilistic Quality Representation Approach to Deep Blind Image Quality Prediction
Authors	Hui Zeng, Lei Zhang, Alan C. Bovik
Abstract	Blind image quality assessment (BIQA) remains a very challenging problem due to the unavailability of a reference image. Deep learning based BIQA methods have been attracting increasing attention in recent years, yet it remains a difficult task to train a robust deep BIQA model because of the very limited number of training samples with human subjective scores. Most existing methods learn a regression network to minimize the prediction error of a scalar image quality score. However, such a scheme ignores the fact that an image will receive divergent subjective scores from different subjects, which cannot be adequately represented by a single scalar number. This is particularly true on complex, real-world distorted images. Moreover, images may broadly differ in their distributions of assigned subjective scores. Recognizing this, we propose a new representation of perceptual image quality, called probabilistic quality representation (PQR), to describe the image subjective score distribution, whereby a more robust loss function can be employed to train a deep BIQA model. The proposed PQR method is shown to not only speed up the convergence of deep model training, but to also greatly improve the achievable level of quality prediction accuracy relative to scalar quality score regression methods. The source code is available at https://github.com/HuiZeng/BIQA_Toolbox.
Tasks	Blind Image Quality Assessment, Image Quality Assessment
Published	2017-08-28
URL	http://arxiv.org/abs/1708.08190v2
PDF	http://arxiv.org/pdf/1708.08190v2.pdf
PWC	https://paperswithcode.com/paper/a-probabilistic-quality-representation
Repo	https://github.com/HuiZeng/BIQA_Toolbox
Framework	none

Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation


Title	Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation
Authors	Xinghao Chen, Guijin Wang, Hengkai Guo, Cairong Zhang
Abstract	Hand pose estimation from a single depth image is an essential topic in computer vision and human computer interaction. Despite recent advancements in this area driven by convolutional neural networks, accurate hand pose estimation is still a challenging problem. In this paper we propose a Pose guided structured Region Ensemble Network (Pose-REN) to boost the performance of hand pose estimation. The proposed method extracts regions from the feature maps of a convolutional neural network under the guidance of an initially estimated pose, generating more optimal and representative features for hand pose estimation. The extracted feature regions are then integrated hierarchically according to the topology of hand joints by employing tree-structured fully connected layers. A refined estimation of hand pose is directly regressed by the proposed network and the final hand pose is obtained by utilizing an iterative cascaded method. Comprehensive experiments on public hand pose datasets demonstrate that our proposed method outperforms state-of-the-art algorithms.
Tasks	Hand Pose Estimation, Pose Estimation
Published	2017-08-11
URL	http://arxiv.org/abs/1708.03416v2
PDF	http://arxiv.org/pdf/1708.03416v2.pdf
PWC	https://paperswithcode.com/paper/pose-guided-structured-region-ensemble
Repo	https://github.com/xinghaochen/Pose-REN
Framework	caffe2

Sparse Representation-based Open Set Recognition


Title	Sparse Representation-based Open Set Recognition
Authors	He Zhang, Vishal M. Patel
Abstract	We propose a generalized Sparse Representation-based Classification (SRC) algorithm for open set recognition where not all classes presented during testing are known during training. The SRC algorithm uses class reconstruction errors for classification. As most of the discriminative information for open set recognition is hidden in the tail part of the matched and sum of non-matched reconstruction error distributions, we model the tail of those two error distributions using the statistical Extreme Value Theory (EVT). Then we simplify the open set recognition problem into a set of hypothesis testing problems. The confidence scores corresponding to the tail distributions of a novel test sample are then fused to determine its identity. The effectiveness of the proposed method is demonstrated using four publicly available image and object classification datasets and it is shown that this method can perform significantly better than many competitive open set recognition algorithms. Code is publicly available: https://github.com/hezhangsprinter/SROSR
Tasks	Object Classification, Open Set Learning, Sparse Representation-based Classification
Published	2017-05-06
URL	http://arxiv.org/abs/1705.02431v1
PDF	http://arxiv.org/pdf/1705.02431v1.pdf
PWC	https://paperswithcode.com/paper/sparse-representation-based-open-set
Repo	https://github.com/hezhangsprinter/SROSR
Framework	none

Improved Regularization of Convolutional Neural Networks with Cutout


Title	Improved Regularization of Convolutional Neural Networks with Cutout
Authors	Terrance DeVries, Graham W. Taylor
Abstract	Convolutional neural networks are capable of learning powerful representational spaces, which are necessary for tackling complex learning tasks. However, due to the model capacity required to capture such representations, they are often susceptible to overfitting and therefore require proper regularization in order to generalize well. In this paper, we show that the simple regularization technique of randomly masking out square regions of input during training, which we call cutout, can be used to improve the robustness and overall performance of convolutional neural networks. Not only is this method extremely easy to implement, but we also demonstrate that it can be used in conjunction with existing forms of data augmentation and other regularizers to further improve model performance. We evaluate this method by applying it to current state-of-the-art architectures on the CIFAR-10, CIFAR-100, and SVHN datasets, yielding new state-of-the-art results of 2.56%, 15.20%, and 1.30% test error respectively. Code is available at https://github.com/uoguelph-mlrg/Cutout
Tasks	Data Augmentation, Image Augmentation, Image Classification, Semi-Supervised Image Classification
Published	2017-08-15
URL	http://arxiv.org/abs/1708.04552v2
PDF	http://arxiv.org/pdf/1708.04552v2.pdf
PWC	https://paperswithcode.com/paper/improved-regularization-of-convolutional
Repo	https://github.com/uoguelph-mlrg/Cutout
Framework	pytorch
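
The masking step described in the abstract above is simple enough to sketch in a few lines. This is an illustrative version for a single-channel image stored as nested lists, not the authors' PyTorch code; sampling a centre pixel and clipping the square at the image border is an assumption of this sketch, as are the function and parameter names.

```python
import random


def cutout(image, size, rng=random):
    """Return a copy of a 2D image with one randomly placed square zeroed out.

    image: list of rows of floats (height x width).
    size: side length of the square mask (assumed hyperparameter).
    rng: any object with randrange(), e.g. random.Random(seed).
    """
    height, width = len(image), len(image[0])
    # Sample the centre of the square anywhere in the image.
    cy = rng.randrange(height)
    cx = rng.randrange(width)
    # Clip the square to the image, so masks near the border are smaller.
    top, bottom = max(0, cy - size // 2), min(height, cy + size // 2)
    left, right = max(0, cx - size // 2), min(width, cx + size // 2)
    masked = [row[:] for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            masked[y][x] = 0.0
    return masked


original = [[1.0] * 8 for _ in range(8)]
augmented = cutout(original, size=4, rng=random.Random(0))
zeroed = sum(value == 0.0 for row in augmented for value in row)
```

In a training loop this would be applied to each input independently, alongside any other augmentation.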

Does William Shakespeare REALLY Write Hamlet? Knowledge Representation Learning with Confidence


Title	Does William Shakespeare REALLY Write Hamlet? Knowledge Representation Learning with Confidence
Authors	Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin
Abstract	Knowledge graphs (KGs), which could provide essential relational information between entities, have been widely utilized in various knowledge-driven applications. Since human knowledge is vast, still growing explosively, and changing frequently, knowledge construction and update inevitably involve automatic mechanisms with less human supervision, which usually bring plenty of noise and conflicts into KGs. However, most conventional knowledge representation learning methods assume that all triple facts in existing KGs share the same significance without any noise. To address this problem, we propose a novel confidence-aware knowledge representation learning framework (CKRL), which detects possible noise in KGs while learning knowledge representations with confidence simultaneously. Specifically, we introduce the triple confidence to conventional translation-based methods for knowledge representation learning. To make triple confidence more flexible and universal, we only utilize the internal structural information in KGs, and propose three kinds of triple confidences considering both local and global structural information. In experiments, we evaluate our models on knowledge graph noise detection, knowledge graph completion and triple classification. Experimental results demonstrate that our confidence-aware models achieve significant and consistent improvements on all tasks, which confirms the capability of CKRL modeling confidence with structural information in both KG noise detection and knowledge representation learning.
Tasks	Knowledge Graph Completion, Knowledge Graphs, Representation Learning
Published	2017-05-09
URL	http://arxiv.org/abs/1705.03202v2
PDF	http://arxiv.org/pdf/1705.03202v2.pdf
PWC	https://paperswithcode.com/paper/does-william-shakespeare-really-write-hamlet
Repo	https://github.com/thunlp/CKRL
Framework	none
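
To make "introducing triple confidence to translation-based methods" concrete, here is a hedged sketch. It assumes a TransE-style energy, the L1 norm of head + relation - tail, and a margin ranking loss scaled by a per-triple confidence in [0, 1]. The function names, the margin value, and the toy embeddings are assumptions; how the confidence itself is estimated is not shown.

```python
def translation_energy(head, relation, tail):
    """L1 distance between (head + relation) and tail; lower means more plausible."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def confidence_weighted_loss(positive, negative, confidence, margin=1.0):
    """Margin ranking loss for one positive/negative triple pair, scaled by confidence.

    positive, negative: (head, relation, tail) tuples of embedding vectors.
    confidence: assumed score in [0, 1]; low values shrink the influence of
    triples that are suspected to be noisy.
    """
    gap = margin + translation_energy(*positive) - translation_energy(*negative)
    return confidence * max(0.0, gap)


head, relation = [0.0, 0.0], [1.0, 0.0]
positive = (head, relation, [1.0, 0.0])      # energy 0.0
far_negative = (head, relation, [0.0, 2.0])  # energy 3.0
near_negative = (head, relation, [1.0, 0.5]) # energy 0.5

loss_far = confidence_weighted_loss(positive, far_negative, confidence=0.5)
loss_near = confidence_weighted_loss(positive, near_negative, confidence=0.5)
```

A triple with confidence near zero contributes almost nothing to the total loss, so a suspected noisy fact has little effect on the learned embeddings.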

Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs


Title	Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs
Authors	Martin Simonovsky, Nikos Komodakis
Abstract	A number of problems can be formulated as prediction on graph-structured data. In this work, we generalize the convolution operator from regular grids to arbitrary graphs while avoiding the spectral domain, which allows us to handle graphs of varying size and connectivity. To move beyond a simple diffusion, filter weights are conditioned on the specific edge labels in the neighborhood of a vertex. Together with the proper choice of graph coarsening, we explore constructing deep neural networks for graph classification. In particular, we demonstrate the generality of our formulation in point cloud classification, where we set the new state of the art, and on a graph classification dataset, where we outperform other deep learning approaches. The source code is available at https://github.com/mys007/ecc
Tasks	Graph Classification
Published	2017-04-10
URL	http://arxiv.org/abs/1704.02901v3
PDF	http://arxiv.org/pdf/1704.02901v3.pdf
PWC	https://paperswithcode.com/paper/dynamic-edge-conditioned-filters-in
Repo	https://github.com/mys007/ecc
Framework	pytorch

Single Image Super-Resolution with Dilated Convolution based Multi-Scale Information Learning Inception Module


Title	Single Image Super-Resolution with Dilated Convolution based Multi-Scale Information Learning Inception Module
Authors	Wuzhen Shi, Feng Jiang, Debin Zhao
Abstract	Traditional works have shown that patches in a natural image tend to redundantly recur many times inside the image, both within the same scale, as well as across different scales. Making full use of this multi-scale information can improve image restoration performance. However, the currently proposed deep-learning-based restoration methods do not take the multi-scale information into account. In this paper, we propose a dilated convolution based inception module to learn multi-scale information and design a deep network for single image super-resolution. Different dilated convolutions learn features at different scales, then the inception module concatenates all these features to fuse multi-scale information. In order to increase the receptive field of our network to capture more contextual information, we cascade multiple inception modules to constitute a deep network to conduct single image super-resolution. With the novel dilated convolution based inception module, the proposed end-to-end single image super-resolution network can take advantage of multi-scale information to improve image super-resolution performance. Experimental results show that our proposed method outperforms many state-of-the-art single image super-resolution methods.
Tasks	Image Restoration, Image Super-Resolution, Super-Resolution
Published	2017-07-22
URL	http://arxiv.org/abs/1707.07128v1
PDF	http://arxiv.org/pdf/1707.07128v1.pdf
PWC	https://paperswithcode.com/paper/single-image-super-resolution-with-dilated
Repo	https://github.com/wzhshi/MSSRNet
Framework	none
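
The link between dilation and receptive field mentioned in the abstract above follows from simple arithmetic: a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions. The sketch below works this out for stride-1 layers; the dilation rates in the example are hypothetical and not taken from the paper.

```python
def effective_kernel(kernel_size, dilation):
    """Input span covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    layers: list of (kernel_size, dilation) pairs, applied in order.
    Each stride-1 layer widens the receptive field by (effective kernel - 1).
    """
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel(kernel_size, dilation) - 1
    return field


# Parallel branches of an inception-style module see different scales.
branch_spans = [effective_kernel(3, d) for d in (1, 2, 4)]
# Cascading layers grows the receptive field further.
cascade = stacked_receptive_field([(3, 1), (3, 2), (3, 4)])
```

With 3x3 kernels and dilations 1, 2 and 4, the branches span 3, 5 and 9 pixels, and cascading the three layers gives a receptive field of 15 pixels per axis.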

Label-driven weakly-supervised learning for multimodal deformable image registration


Title	Label-driven weakly-supervised learning for multimodal deformable image registration
Authors	Yipeng Hu, Marc Modat, Eli Gibson, Nooshin Ghavami, Ester Bonmati, Caroline M. Moore, Mark Emberton, J. Alison Noble, Dean C. Barratt, Tom Vercauteren
Abstract	Spatially aligning medical images from different modalities remains a challenging task, especially for intraoperative applications that require fast and robust algorithms. We propose a weakly-supervised, label-driven formulation for learning 3D voxel correspondence from higher-level label correspondence, thereby bypassing classical intensity-based image similarity measures. During training, a convolutional neural network is optimised by outputting a dense displacement field (DDF) that warps a set of available anatomical labels from the moving image to match their corresponding counterparts in the fixed image. These label pairs, including solid organs, ducts, vessels, point landmarks and other ad hoc structures, are only required at training time and can be spatially aligned by minimising a cross-entropy function of the warped moving label and the fixed label. During inference, the trained network takes a new image pair to predict an optimal DDF, resulting in a fully-automatic, label-free, real-time and deformable registration. For interventional applications where large global transformation prevails, we also propose a neural network architecture to jointly optimise the global and local displacements. Experimental results are presented based on cross-validating registrations of 111 pairs of T2-weighted magnetic resonance images and 3D transrectal ultrasound images from prostate cancer patients with a total of over 4000 anatomical labels, yielding a median target registration error of 4.2 mm on landmark centroids and a median Dice of 0.88 on prostate glands.
Tasks	Image Registration
Published	2017-11-05
URL	http://arxiv.org/abs/1711.01666v2
PDF	http://arxiv.org/pdf/1711.01666v2.pdf
PWC	https://paperswithcode.com/paper/label-driven-weakly-supervised-learning-for
Repo	https://github.com/Duoduo-Qian/Medical-image-registration-Resources
Framework	pytorch
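
For readers unfamiliar with the Dice score reported in the abstract above, it measures the overlap of two label masks as twice the intersection divided by the sum of the two sizes. The sketch below represents each binary mask as a set of voxel coordinates; that representation, and the convention of returning 1.0 for two empty masks, are assumptions made for illustration.

```python
def dice(mask_a, mask_b):
    """Dice overlap between two binary masks given as collections of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        # Convention assumed here: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


warped_label = {(0, 0, 0), (0, 0, 1)}
fixed_label = {(0, 0, 1), (0, 0, 2)}
score = dice(warped_label, fixed_label)
```

A score of 1.0 means the warped moving label and the fixed label coincide exactly, and 0.0 means they do not overlap at all.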