January 27, 2020

Paper Group ANR 1301

Towards Making Deep Transfer Learning Never Hurt. Hypergraph Partitioning With Embeddings. Unsupervised Adversarial Image Inpainting. Kernel and Rich Regimes in Overparametrized Models. Real-Time Quality Assessment of Pediatric MRI via Semi-Supervised Deep Nonlocal Residual Neural Networks. Recurrent Instance Segmentation using Sequences of Referri …

Towards Making Deep Transfer Learning Never Hurt


Title	Towards Making Deep Transfer Learning Never Hurt
Authors	Ruosi Wan, Haoyi Xiong, Xingjian Li, Zhanxing Zhu, Jun Huan
Abstract	Transfer learning has been frequently used to improve deep neural network training through incorporating weights of pre-trained networks as the starting-point of optimization for regularization. While deep transfer learning can usually boost the performance with better accuracy and faster convergence, transferring weights from inappropriate networks hurts the training procedure and may lead to even lower accuracy. In this paper, we consider deep transfer learning as minimizing a linear combination of empirical loss and regularizer based on pre-trained weights, where the regularizer would restrict the training procedure from lowering the empirical loss, with conflicting descent directions (e.g., derivatives). Following this view, we propose a novel strategy making regularization-based Deep Transfer learning Never Hurt (DTNH) that, for each iteration of the training procedure, computes the derivatives of the two terms separately, then re-estimates a new descent direction that does not hurt the empirical loss minimization while preserving the regularization effects from the pre-trained weights. Extensive experiments have been done using common transfer learning regularizers, such as L2-SP and knowledge distillation, on top of a wide range of deep transfer learning benchmarks including Caltech, MIT indoor 67, CIFAR-10 and ImageNet. The empirical results show that the proposed descent direction estimation strategy DTNH can always improve the performance of deep transfer learning tasks based on all above regularizers, even when transferring pre-trained weights from inappropriate networks. All in all, the DTNH strategy can improve state-of-the-art regularizers in all cases with 0.1%–7% higher accuracy in all experiments.
Tasks	Transfer Learning
Published	2019-11-18
URL	https://arxiv.org/abs/1911.07489v1
PDF	https://arxiv.org/pdf/1911.07489v1.pdf
PWC	https://paperswithcode.com/paper/towards-making-deep-transfer-learning-never
Repo
Framework
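
The abstract describes computing the two gradients separately and then re-estimating a descent direction that does not hurt the empirical loss. The sketch below illustrates one simple way to do that, and it is an assumption rather than the authors' rule: when the regularizer gradient points against the loss gradient, the conflicting component is projected out before the two are summed. The names `combine_directions`, `grad_loss`, `grad_reg`, and `weight` are hypothetical.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def combine_directions(grad_loss: List[float],
                       grad_reg: List[float],
                       weight: float = 1.0) -> List[float]:
    """Combine the empirical-loss gradient with the regularizer gradient.

    Assumption (not taken from the paper): if the regularizer gradient has a
    negative inner product with the loss gradient, remove its component along
    the loss gradient so the combined step still decreases the empirical loss
    to first order.
    """
    inner = dot(grad_loss, grad_reg)
    norm_sq = dot(grad_loss, grad_loss)
    if inner < 0 and norm_sq > 0:
        scale = inner / norm_sq
        # Project grad_reg onto the orthogonal complement of grad_loss.
        grad_reg = [r - scale * l for r, l in zip(grad_reg, grad_loss)]
    return [l + weight * r for l, r in zip(grad_loss, grad_reg)]


# Conflicting case: the naive sum [-1, 1] would point against grad_loss.
direction = combine_directions([1.0, 0.0], [-2.0, 1.0])
# Non-conflicting case: the gradients are simply added.
aligned = combine_directions([1.0, 0.0], [1.0, 1.0])
```

In the conflicting case the returned direction keeps a positive inner product with the loss gradient, while the part of the regularizer gradient orthogonal to it is preserved.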

Hypergraph Partitioning With Embeddings


Title	Hypergraph Partitioning With Embeddings
Authors	Justin Sybrandt, Ruslan Shaydulin, Ilya Safro
Abstract	Problems in scientific computing, such as distributing large sparse matrix operations, have analogous formulations as hypergraph partitioning problems. A hypergraph is a generalization of a traditional graph wherein “hyperedges” may connect any number of nodes. As a result, hypergraph partitioning is an NP-Hard problem to both solve and approximate. State-of-the-art algorithms that solve this problem follow the multilevel paradigm, which begins by iteratively “coarsening” the input hypergraph to smaller problem instances that share key structural features. Once identifying an approximate problem that is small enough to be solved directly, that solution can be interpolated and refined to the original problem. While this strategy represents an excellent trade-off between quality and running time, it is sensitive to the coarsening strategy. In this work we propose using graph embeddings of the initial hypergraph in order to ensure that coarsened problem instances retain key structural features. Our approach prioritizes coarsening within self-similar regions within the input graph, and leads to significantly improved solution quality across a range of considered hypergraphs. Reproducibility: All source code, plots and experimental data are available at https://sybrandt.com/2019/partition.
Tasks	hypergraph partitioning
Published	2019-09-09
URL	https://arxiv.org/abs/1909.04016v4
PDF	https://arxiv.org/pdf/1909.04016v4.pdf
PWC	https://paperswithcode.com/paper/partition-hypergraphs-with-embeddings
Repo
Framework

Unsupervised Adversarial Image Inpainting


Title	Unsupervised Adversarial Image Inpainting
Authors	Arthur Pajot, Emmanuel de Bezenac, Patrick Gallinari
Abstract	We consider inpainting in an unsupervised setting where there is neither access to paired nor unpaired training data. The only available information is provided by the incomplete observations and the inpainting process statistics. In this context, an observation should give rise to several plausible reconstructions, which amounts to learning a distribution over the space of reconstructed images. We model the reconstruction process by using a conditional GAN with constraints on the stochastic component that introduce an explicit dependency between this component and the generated output. This allows us to sample from the latent component in order to generate a distribution of images associated with an observation. We demonstrate the capacity of our model on several image datasets: faces (CelebA), food images (Recipe-1M) and bedrooms (LSUN Bedrooms) with different types of imputation masks. The approach yields comparable performance to model variants trained with additional supervision.
Tasks	Image Inpainting, Imputation
Published	2019-12-18
URL	https://arxiv.org/abs/1912.12164v1
PDF	https://arxiv.org/pdf/1912.12164v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-adversarial-image-inpainting
Repo
Framework

Kernel and Rich Regimes in Overparametrized Models


Title	Kernel and Rich Regimes in Overparametrized Models
Authors	Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko, Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro
Abstract	A recent line of work studies overparametrized neural networks in the “kernel regime,” i.e. when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. Building on an observation by Chizat and Bach, we show how the scale of the initialization controls the transition between the “kernel” (aka lazy) and “rich” (aka active) regimes and affects generalization properties in multilayer homogeneous models. We provide a complete and detailed analysis for a simple two-layer model that already exhibits an interesting and meaningful transition between the kernel and rich regimes, and we demonstrate the transition for more complex matrix factorization models and multilayer non-linear networks.
Tasks
Published	2019-06-13
URL	https://arxiv.org/abs/1906.05827v3
PDF	https://arxiv.org/pdf/1906.05827v3.pdf
PWC	https://paperswithcode.com/paper/kernel-and-deep-regimes-in-overparametrized
Repo
Framework

Real-Time Quality Assessment of Pediatric MRI via Semi-Supervised Deep Nonlocal Residual Neural Networks


Title	Real-Time Quality Assessment of Pediatric MRI via Semi-Supervised Deep Nonlocal Residual Neural Networks
Authors	Siyuan Liu, Kim-Han Thung, Weili Lin, Pew-Thian Yap, Dinggang Shen
Abstract	In this paper, we introduce an image quality assessment (IQA) method for pediatric T1- and T2-weighted MR images. IQA is first performed slice-wise using a nonlocal residual neural network (NR-Net) and then volume-wise by agglomerating the slice QA results using random forest. Our method requires only a small amount of quality-annotated images for training and is designed to be robust to annotation noise that might occur due to rater errors and the inevitable mix of good and bad slices in an image volume. Using a small set of quality-assessed images, we pre-train NR-Net to annotate each image slice with an initial quality rating (i.e., pass, questionable, fail), which we then refine by semi-supervised learning and iterative self-training. Experimental results demonstrate that our method, trained using only samples of modest size, exhibits great generalizability and is capable of real-time (milliseconds per volume) large-scale IQA with near-perfect accuracy.
Tasks	Image Quality Assessment
Published	2019-04-07
URL	http://arxiv.org/abs/1904.03639v1
PDF	http://arxiv.org/pdf/1904.03639v1.pdf
PWC	https://paperswithcode.com/paper/real-time-quality-assessment-of-pediatric-mri
Repo
Framework

Recurrent Instance Segmentation using Sequences of Referring Expressions


Title	Recurrent Instance Segmentation using Sequences of Referring Expressions
Authors	Alba Herrera-Palacio, Carles Ventura, Carina Silberer, Ionut-Teodor Sorodoc, Gemma Boleda, Xavier Giro-i-Nieto
Abstract	The goal of this work is to segment the objects in an image that are referred to by a sequence of linguistic descriptions (referring expressions). We propose a deep neural network with recurrent layers that output a sequence of binary masks, one for each referring expression provided by the user. The recurrent layers in the architecture allow the model to condition each predicted mask on the previous ones, from a spatial perspective within the same image. Our multimodal approach uses off-the-shelf architectures to encode both the image and the referring expressions. The visual branch provides a tensor of pixel embeddings that are concatenated with the phrase embeddings produced by a language encoder. Our experiments on the RefCOCO dataset for still images indicate how the proposed architecture successfully exploits the sequences of referring expressions to solve a pixel-wise task of instance segmentation.
Tasks	Instance Segmentation, Semantic Segmentation
Published	2019-11-05
URL	https://arxiv.org/abs/1911.02103v1
PDF	https://arxiv.org/pdf/1911.02103v1.pdf
PWC	https://paperswithcode.com/paper/recurrent-instance-segmentation-using
Repo
Framework

Image Processing Using Multi-Code GAN Prior


Title	Image Processing Using Multi-Code GAN Prior
Authors	Jinjin Gu, Yujun Shen, Bolei Zhou
Abstract	Despite the success of Generative Adversarial Networks (GANs) in image synthesis, applying trained GAN models to real image processing remains challenging. Previous methods typically invert a target image back to the latent space either by back-propagation or by learning an additional encoder. However, the reconstructions from both of the methods are far from ideal. In this work, we propose a novel approach, called mGANprior, to incorporate the well-trained GANs as effective prior to a variety of image processing tasks. In particular, we employ multiple latent codes to generate multiple feature maps at some intermediate layer of the generator, then compose them with adaptive channel importance to recover the input image. Such an over-parameterization of the latent space significantly improves the image reconstruction quality, outperforming existing competitors. The resulting high-fidelity image reconstruction enables the trained GAN models as prior to many real-world applications, such as image colorization, super-resolution, image inpainting, and semantic manipulation. We further analyze the properties of the layer-wise representation learned by GAN models and shed light on what knowledge each layer is capable of representing.
Tasks	Colorization, Image Generation, Image Inpainting, Image Reconstruction, Super-Resolution
Published	2019-12-15
URL	https://arxiv.org/abs/1912.07116v2
PDF	https://arxiv.org/pdf/1912.07116v2.pdf
PWC	https://paperswithcode.com/paper/image-processing-using-multi-code-gan-prior
Repo
Framework

Geomorphological Analysis Using Unpiloted Aircraft Systems, Structure from Motion, and Deep Learning


Title	Geomorphological Analysis Using Unpiloted Aircraft Systems, Structure from Motion, and Deep Learning
Authors	Zhiang Chen, Tyler R. Scott, Sarah Bearman, Harish Anand, Devin Keating, Chelsea Scott, J Ramon Arrowsmith, Jnaneshwar Das
Abstract	We present a pipeline for geomorphological analysis that uses structure from motion (SfM) and deep learning on close-range aerial imagery to estimate spatial distributions of rock traits (diameter, size, and orientation) along a tectonic fault scarp. Unpiloted aircraft systems (UAS) have enabled acquisition of high-resolution imagery at close range, revolutionizing domains such as infrastructure inspection, precision agriculture, and disaster response. Our pipeline leverages UAS-based imagery to help scientists gain a better understanding of tectonic surface processes. We start by using SfM on aerial imagery to produce georeferenced orthomosaics and digital elevation models (DEM), then a human expert annotates rocks on a set of image tiles sampled from the orthomosaics. These annotations are used to train a deep neural network to detect and segment individual rocks in the whole site. This pipeline automatically extracts semantic information (rock boundaries) on large volumes of unlabeled, high-resolution aerial imagery, which allows subsequent structural analysis and shape descriptors to result in estimates of rock diameter, size and orientation. We present results of two experiments conducted along a fault scarp in the Volcanic Tablelands near Bishop, California. We conducted the first experiment with a hexrotor and a multispectral camera to produce a DEM and five spectral orthomosaics in red, green, blue, red edge (RE), and near infrared (NIR). We then trained deep neural networks with different input channel combinations to study the most effective learning method for inference. In the second experiment, we deployed a DJI Phantom 4 Pro equipped with an RGB camera, and focused on the spatial difference of rock-trait histograms in a larger area. Although presented in the context of geology, our pipeline can be extended to a variety of geomorphological analysis tasks in other domains.
Tasks	Morphological Analysis
Published	2019-09-27
URL	https://arxiv.org/abs/1909.12874v2
PDF	https://arxiv.org/pdf/1909.12874v2.pdf
PWC	https://paperswithcode.com/paper/geomorphological-analysis-using-unpiloted
Repo
Framework

BTEL: A Binary Tree Encoding Approach for Visual Localization


Title	BTEL: A Binary Tree Encoding Approach for Visual Localization
Authors	Huu Le, Tuan Hoang, Michael Milford
Abstract	Visual localization algorithms have achieved significant improvements in performance thanks to recent advances in camera technology and vision-based techniques. However, there remains one critical caveat: all current approaches that are based on image retrieval scale at best linearly with the size of the environment with respect to both storage and, consequently in most approaches, query time. This limitation severely curtails the capability of autonomous systems in a wide range of compute, power, storage, size, weight or cost constrained applications such as drones. In this work, we present a novel binary tree encoding approach for visual localization which can serve as an alternative for existing quantization and indexing techniques. The proposed tree structure allows us to derive a compressed training scheme that achieves sub-linearity in both required storage and inference time. The encoding memory can be easily configured to satisfy different storage constraints. Moreover, our approach is amenable to an optional sequence filtering mechanism to further improve the localization results, while maintaining the same amount of storage. Our system is entirely agnostic to the front-end descriptors, allowing it to be used on top of recent state-of-the-art image representations. Experimental results show that the proposed method significantly outperforms state-of-the-art approaches under limited storage constraints.
Tasks	Image Retrieval, Quantization, Visual Localization
Published	2019-06-27
URL	https://arxiv.org/abs/1906.11992v1
PDF	https://arxiv.org/pdf/1906.11992v1.pdf
PWC	https://paperswithcode.com/paper/btel-a-binary-tree-encoding-approach-for
Repo
Framework

Solo or Ensemble? Choosing a CNN Architecture for Melanoma Classification


Title	Solo or Ensemble? Choosing a CNN Architecture for Melanoma Classification
Authors	Fábio Perez, Sandra Avila, Eduardo Valle
Abstract	Convolutional neural networks (CNNs) deliver exceptional results for computer vision, including medical image analysis. With the growing number of available architectures, picking one over another is far from obvious. Existing art suggests that, when performing transfer learning, the performance of CNN architectures on ImageNet correlates strongly with their performance on target tasks. We evaluate that claim for melanoma classification, over 9 CNN architectures, in 5 sets of splits created on the ISIC Challenge 2017 dataset, and 3 repeated measures, resulting in 135 models. The correlations we found were, to begin with, much smaller than those reported by existing art, and disappeared altogether when we considered only the top-performing networks: uncontrolled nuisances (i.e., splits and randomness) overcome any of the analyzed factors. Whenever possible, the best approach for melanoma classification is still to create ensembles of multiple models. We compared two choices for selecting which models to ensemble: picking them at random (among a pool of high-quality ones) vs. using the validation set to determine which ones to pick first. For small ensembles, we found a slight advantage for the second approach but found that random choice was also competitive. Although our aim in this paper was not to maximize performance, we easily reached AUCs comparable to the first place on the ISIC Challenge 2017.
Tasks	Transfer Learning
Published	2019-04-29
URL	http://arxiv.org/abs/1904.12724v1
PDF	http://arxiv.org/pdf/1904.12724v1.pdf
PWC	https://paperswithcode.com/paper/solo-or-ensemble-choosing-a-cnn-architecture
Repo
Framework
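
The abstract counts 9 architectures, 5 split sets, and 3 repetitions (135 models) and compares two ways of choosing ensemble members. The sketch below illustrates both selection rules and a plain probability-averaging ensemble. Averaging is an assumption here, since the abstract does not say how predictions are combined, and all function names and the example scores are hypothetical.

```python
import random
from typing import List

# 9 architectures x 5 split sets x 3 repeated runs, as stated in the abstract.
N_MODELS = 9 * 5 * 3


def pick_by_validation(val_scores: List[float], k: int) -> List[int]:
    """Indices of the k models with the highest validation score."""
    ranked = sorted(range(len(val_scores)),
                    key=lambda i: val_scores[i], reverse=True)
    return ranked[:k]


def pick_at_random(n_models: int, k: int, seed: int = 0) -> List[int]:
    """Indices of k distinct models drawn uniformly at random."""
    return random.Random(seed).sample(range(n_models), k)


def average_ensemble(probabilities: List[List[float]]) -> List[float]:
    """Average per-sample probabilities over models (one inner list per model)."""
    n = len(probabilities)
    return [sum(column) / n for column in zip(*probabilities)]


# Made-up validation scores for three models and their test-set probabilities.
val_scores = [0.81, 0.90, 0.85]
test_probs = [[0.2, 0.8], [0.4, 0.6], [0.9, 0.1]]

chosen = pick_by_validation(val_scores, k=2)
ensemble = average_ensemble([test_probs[i] for i in chosen])
```

Swapping `pick_by_validation` for `pick_at_random` gives the random-selection variant that the abstract reports as competitive for small ensembles.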

Bounded Manifold Completion


Title	Bounded Manifold Completion
Authors	Kelum Gajamannage, Randy Paffenroth
Abstract	Nonlinear dimensionality reduction or, equivalently, the approximation of high-dimensional data using a low-dimensional nonlinear manifold is an active area of research. In this paper, we will present a thematically different approach to detect the existence of a low-dimensional manifold of a given dimension that lies within a set of bounds derived from a given point cloud. A matrix representing the appropriately defined distances on a low-dimensional manifold is low-rank, and our method is based on current techniques for recovering a partially observed matrix from a small set of fully observed entries that can be implemented as a low-rank Matrix Completion (MC) problem. MC methods are currently used to solve challenging real-world problems, such as image inpainting and recommender systems, and we leverage extant efficient optimization techniques that use a nuclear norm convex relaxation as a surrogate for non-convex and discontinuous rank minimization. Our proposed method provides several advantages over current nonlinear dimensionality reduction techniques, with the two most important being theoretical guarantees on the detection of low-dimensional embeddings and robustness to non-uniformity in the sampling of the manifold. We validate the performance of this approach using both a theoretical analysis as well as synthetic and real-world benchmark datasets.
Tasks	Dimensionality Reduction, Image Inpainting, Low-Rank Matrix Completion, Matrix Completion, Recommendation Systems
Published	2019-12-19
URL	https://arxiv.org/abs/1912.09026v1
PDF	https://arxiv.org/pdf/1912.09026v1.pdf
PWC	https://paperswithcode.com/paper/bounded-manifold-completion
Repo
Framework

Data-Pooling in Stochastic Optimization


Title	Data-Pooling in Stochastic Optimization
Authors	Vishal Gupta, Nathan Kallus
Abstract	Managing large-scale systems often involves simultaneously solving thousands of unrelated stochastic optimization problems, each with limited data. Intuition suggests one can decouple these unrelated problems and solve them separately without loss of generality. We propose a novel data-pooling algorithm called Shrunken-SAA that disproves this intuition. In particular, we prove that combining data across problems can outperform decoupling, even when there is no a priori structure linking the problems and data are drawn independently. Our approach does not require strong distributional assumptions and applies to constrained, possibly non-convex, non-smooth optimization problems such as vehicle-routing, economic lot-sizing or facility location. We compare and contrast our results to a similar phenomenon in statistics (Stein’s Phenomenon), highlighting unique features that arise in the optimization setting that are not present in estimation. We further prove that as the number of problems grows large, Shrunken-SAA learns if pooling can improve upon decoupling and the optimal amount to pool, even if the average amount of data per problem is fixed and bounded. Importantly, we highlight a simple intuition based on stability that highlights when and why data-pooling offers a benefit, elucidating this perhaps surprising phenomenon. This intuition further suggests that data-pooling offers the most benefits when there are many problems, each of which has a small amount of relevant data. Finally, we demonstrate the practical benefits of data-pooling using real data from a chain of retail drug stores in the context of inventory management.
Tasks	Stochastic Optimization
Published	2019-06-01
URL	https://arxiv.org/abs/1906.00255v1
PDF	https://arxiv.org/pdf/1906.00255v1.pdf
PWC	https://paperswithcode.com/paper/190600255
Repo
Framework

Landmarks-assisted Collaborative Deep Framework for Automatic 4D Facial Expression Recognition


Title	Landmarks-assisted Collaborative Deep Framework for Automatic 4D Facial Expression Recognition
Authors	Muzammil Behzad, Nhat Vo, Xiaobai Li, Guoying Zhao
Abstract	We propose a novel landmarks-assisted collaborative end-to-end deep framework for automatic 4D FER. Using 4D face scan data, we calculate its various geometrical images, and afterwards use rank pooling to generate their dynamic images encapsulating important facial muscle movements over time. As well, the given 3D landmarks are projected on a 2D plane as binary images and convolutional layers are used to extract sequences of feature vectors for every landmark video. During the training stage, the dynamic images are used to train an end-to-end deep network, while the feature vectors of landmark images are used to train a long short-term memory (LSTM) network. The final improved set of expression predictions is obtained when the dynamic and landmark images collaborate over multi-views using the proposed deep framework. Performance results obtained from extensive experimentation on the widely-adopted BU-4DFE database under globally used settings prove that our proposed collaborative framework outperforms the state-of-the-art 4D FER methods and reaches a promising classification accuracy of 96.7%, demonstrating its effectiveness.
Tasks	Facial Expression Recognition
Published	2019-10-11
URL	https://arxiv.org/abs/1910.05445v2
PDF	https://arxiv.org/pdf/1910.05445v2.pdf
PWC	https://paperswithcode.com/paper/landmarks-assisted-collaborative-deep
Repo
Framework

Image Retrieval and Pattern Spotting using Siamese Neural Network


Title	Image Retrieval and Pattern Spotting using Siamese Neural Network
Authors	Kelly L. Wiggers, Alceu S. Britto Jr., Laurent Heutte, Alessandro L. Koerich, Luiz S. Oliveira
Abstract	This paper presents a novel approach for image retrieval and pattern spotting in document image collections. The manual feature engineering is avoided by learning a similarity-based representation using a Siamese Neural Network trained on a previously prepared subset of image pairs from the ImageNet dataset. The learned representation is used to provide the similarity-based feature maps used to find relevant image candidates in the data collection given an image query. A robust experimental protocol based on the public Tobacco800 document image collection shows that the proposed method compares favorably against state-of-the-art document image retrieval methods, reaching 0.94 and 0.83 of mean average precision (mAP) for retrieval and pattern spotting (IoU=0.7), respectively. Besides, we have evaluated the proposed method considering feature maps of different sizes, showing the impact of reducing the number of features on the retrieval performance and time consumption.
Tasks	Feature Engineering, Image Retrieval
Published	2019-06-22
URL	https://arxiv.org/abs/1906.09513v1
PDF	https://arxiv.org/pdf/1906.09513v1.pdf
PWC	https://paperswithcode.com/paper/image-retrieval-and-pattern-spotting-using
Repo
Framework
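
The pattern-spotting score above is reported at IoU=0.7, which presumably means a detected region counts as correct only when its intersection over union with the ground-truth region reaches 0.7. The sketch below computes IoU for axis-aligned boxes. The `(x1, y1, x2, y2)` box layout and the names `iou` and `is_hit` are assumptions made for illustration, not details from the paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 <= x2, y1 <= y2


def area(box: Box) -> float:
    """Area of an axis-aligned box."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # The overlap rectangle is empty when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def is_hit(predicted: Box, truth: Box, threshold: float = 0.7) -> bool:
    """True when the predicted box overlaps the ground truth enough to count."""
    return iou(predicted, truth) >= threshold


# A prediction covering 80 of the 100 ground-truth units of area: IoU = 0.8.
hit = is_hit((0, 0, 10, 8), (0, 0, 10, 10))
```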

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges


Title	Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
Authors	Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Xu Chen, Yuan Cao, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen, Yonghui Wu
Abstract	We introduce our efforts towards building a universal neural machine translation (NMT) system capable of translating between any language pair. We set a milestone towards this goal by building a single massively multilingual NMT model handling 103 languages trained on over 25 billion examples. Our system demonstrates effective transfer learning ability, significantly improving translation quality of low-resource languages, while keeping high-resource language translation quality on-par with competitive bilingual baselines. We provide in-depth analysis of various aspects of model building that are crucial to achieving quality and practicality in universal NMT. While we prototype a high-quality universal translation system, our extensive empirical analysis exposes issues that need to be further addressed, and we suggest directions for future research.
Tasks	Machine Translation, Transfer Learning
Published	2019-07-11
URL	https://arxiv.org/abs/1907.05019v1
PDF	https://arxiv.org/pdf/1907.05019v1.pdf
PWC	https://paperswithcode.com/paper/massively-multilingual-neural-machine-2
Repo
Framework