Paper Group AWR 190
Deep-learning inversion: a next generation seismic velocity-model building method
Title | Deep-learning inversion: a next generation seismic velocity-model building method |
Authors | Fangshu Yang, Jianwei Ma |
Abstract | Seismic velocity is one of the most important parameters used in seismic exploration. Accurate velocity models are key prerequisites for reverse-time migration and other high-resolution seismic imaging techniques. Such velocity information has traditionally been derived by tomography or full-waveform inversion (FWI), which are time-consuming and computationally expensive and rely heavily on human interaction and quality control. We investigate a novel method based on a supervised deep fully convolutional neural network (FCN) for velocity-model building (VMB) directly from raw seismograms. Unlike conventional inversion methods based on physical models, supervised deep-learning methods are based on big-data training rather than prior-knowledge assumptions. During the training stage, the network establishes a nonlinear projection from the multi-shot seismic data to the corresponding velocity models. During the prediction stage, the trained network can be used to estimate velocity models from new input seismic data. One key characteristic of the deep-learning method is that it can automatically extract multi-layer useful features without the need for human-curated activities or an initial velocity setup. The data-driven method usually requires more time during the training stage, but actual predictions take only seconds. Therefore, the computational time of geophysical inversions, including real-time inversions, can be dramatically reduced once a well-generalized network is built. Using numerical experiments on synthetic models, we show the promising performance of our proposed method in comparison with conventional FWI, even when the input data reflect more realistic scenarios. Discussions of the deep-learning methods, the training dataset, the lack of low frequencies, and the advantages and disadvantages of the new method are also provided. |
Tasks | |
Published | 2019-02-17 |
URL | http://arxiv.org/abs/1902.06267v1 |
http://arxiv.org/pdf/1902.06267v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-inversion-a-next-generation |
Repo | https://github.com/YangFangShu/FCNVMB-Deep-learning-based-seismic-velocity-model-building |
Framework | pytorch |
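The core of the method is a learned mapping from multi-shot seismograms to velocity models. Below is a minimal PyTorch sketch of such a data-driven mapping, using a generic encoder–decoder CNN rather than the paper's exact FCNVMB architecture; all layer sizes, the shot count, and the 64×64 dummy shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SeismicToVelocityNet(nn.Module):
    """Illustrative encoder-decoder CNN mapping multi-shot seismograms
    (shots as input channels) to a 2D velocity model. NOT the paper's
    exact FCNVMB architecture."""
    def __init__(self, n_shots=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(n_shots, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
    def forward(self, seismograms):
        return self.decoder(self.encoder(seismograms))

# Supervised training step on (multi-shot data, velocity model) pairs;
# real seismograms are time x receiver while models are depth x lateral,
# so the matching 64x64 shapes here are purely for the toy example.
net = SeismicToVelocityNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
data = torch.randn(4, 5, 64, 64)                 # dummy batch, 5 shots
target = torch.rand(4, 1, 64, 64) * 3000 + 1500  # velocities in m/s
loss = nn.functional.mse_loss(net(data), target)
loss.backward(); opt.step()
```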
Localizing dexterous surgical tools in X-ray for image-based navigation
Title | Localizing dexterous surgical tools in X-ray for image-based navigation |
Authors | Cong Gao, Mathias Unberath, Russell Taylor, Mehran Armand |
Abstract | X-ray image-based surgical tool navigation is fast and supplies accurate images of deep-seated structures. Typically, recovering the 6-DOF rigid pose and deformation of tools with respect to the X-ray camera can be accurately achieved through intensity-based 2D/3D registration of 3D images or models to 2D X-rays. However, the capture range of image-based 2D/3D registration is inconveniently small, suggesting that automatic and robust initialization strategies are of critical importance. This manuscript describes a first step towards leveraging semantic information of the imaged object to initialize 2D/3D registration within the capture range of image-based registration by performing concurrent segmentation and localization of dexterous surgical tools in X-ray images. We present a learning-based strategy to simultaneously localize and segment dexterous surgical tools in X-ray images and demonstrate promising performance on synthetic and ex vivo data. We are currently investigating methods to use semantic information extracted by the proposed network to reliably and robustly initialize image-based 2D/3D registration. While image-based 2D/3D registration has been an obvious focus of the CAI community, robust initialization thereof (albeit critical) has largely been neglected. This manuscript discusses learning-based retrieval of semantic information on imaged objects as a stepping stone for such initialization and may therefore be of interest to the IPCAI community. Since results are still preliminary and only focus on localization, we target the Long Abstract category. |
Tasks | |
Published | 2019-01-20 |
URL | https://arxiv.org/abs/1901.06672v2 |
https://arxiv.org/pdf/1901.06672v2.pdf | |
PWC | https://paperswithcode.com/paper/localizing-dexterous-surgical-tools-in-x-ray |
Repo | https://github.com/mathiasunberath/DeepDRR |
Framework | pytorch |
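The key idea in the abstract, concurrent segmentation and localization from one backbone, can be sketched as a two-headed network. The layout below is hypothetical (the entry does not specify the architecture); it only illustrates how a shared encoder can feed both a per-pixel mask decoder and a tool-tip regressor.

```python
import torch
import torch.nn as nn

class SegmentAndLocalize(nn.Module):
    """Illustrative multi-task network: a shared encoder feeds a
    segmentation decoder and a tool-location regressor. Hypothetical
    layout, not the paper's exact design."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Sequential(   # per-pixel tool mask logits
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )
        self.loc_head = nn.Sequential(   # (x, y) tool-tip coordinates
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
        )
    def forward(self, xray):
        feat = self.encoder(xray)
        return self.seg_head(feat), self.loc_head(feat)

xray = torch.randn(2, 1, 128, 128)
mask_logits, tip_xy = SegmentAndLocalize()(xray)
# A joint loss would combine BCE on the mask with L2 on the location.
```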
Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation
Title | Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation |
Authors | Ali Fadel, Ibraheem Tuffaha, Bara’ Al-Jawarneh, Mahmoud Al-Ayyoub |
Abstract | In this work, we present several deep learning models for the automatic diacritization of Arabic text. Our models are built using two main approaches, viz. Feed-Forward Neural Network (FFNN) and Recurrent Neural Network (RNN), with several enhancements such as 100-hot encoding, embeddings, Conditional Random Field (CRF) and Block-Normalized Gradient (BNG). The models are tested on the only freely available benchmark dataset, and the results show that our models are either better than or on par with other models, which, unlike ours, require language-dependent post-processing steps. Moreover, we show that diacritics in Arabic can be used to enhance models for NLP tasks such as Machine Translation (MT) by proposing the Translation over Diacritization (ToD) approach. |
Tasks | Arabic Text Diacritization, Machine Translation |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03531v1 |
https://arxiv.org/pdf/1911.03531v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-arabic-text-diacritization-state-of-1 |
Repo | https://github.com/AliOsm/shakkelha |
Framework | none |
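A character-level recurrent tagger of the kind described, predicting one diacritic class per input character, can be sketched as follows. Vocabulary and layer sizes are hypothetical, and the paper's enhancements (100-hot encoding, CRF decoding, BNG) are omitted from this minimal version.

```python
import torch
import torch.nn as nn

class DiacritizerRNN(nn.Module):
    """Illustrative character-level BiLSTM predicting one diacritic
    class per character; all dimensions are hypothetical."""
    def __init__(self, n_chars=50, n_diacritics=15, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb)
        self.rnn = nn.LSTM(emb, hidden, batch_first=True,
                           bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_diacritics)
    def forward(self, char_ids):            # (batch, seq_len)
        h, _ = self.rnn(self.embed(char_ids))
        return self.out(h)                  # (batch, seq_len, n_classes)

chars = torch.randint(0, 50, (8, 40))       # dummy batch of sentences
logits = DiacritizerRNN()(chars)
# Training would use per-character cross-entropy against gold diacritics.
```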
An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models
Title | An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models |
Authors | Alexandra Chronopoulou, Christos Baziotis, Alexandros Potamianos |
Abstract | A growing number of state-of-the-art transfer learning methods employ language models pretrained on large generic corpora. In this paper, we present a conceptually simple and effective transfer learning approach that addresses the problem of catastrophic forgetting. Specifically, we combine the task-specific optimization function with an auxiliary language model objective, which is adjusted during the training process. This preserves language regularities captured by language models, while enabling sufficient adaptation for solving the target task. Our method does not require pretraining or finetuning separate components of the network, and we train our models end-to-end in a single step. We present results on a variety of challenging affective and text classification tasks, surpassing well-established transfer learning methods of greater complexity. |
Tasks | Language Modelling, Text Classification, Transfer Learning |
Published | 2019-02-27 |
URL | https://arxiv.org/abs/1902.10547v3 |
https://arxiv.org/pdf/1902.10547v3.pdf | |
PWC | https://paperswithcode.com/paper/an-embarrassingly-simple-approach-for |
Repo | https://github.com/alexandra-chron/siatl |
Framework | pytorch |
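The transfer recipe reduces to a single combined objective: the task loss plus an auxiliary language-modelling loss whose weight is adjusted during training. A minimal sketch, assuming a shared encoder with a classifier head and an LM head; the linear decay schedule below is a hypothetical choice, not necessarily the paper's exact annealing.

```python
import torch
import torch.nn as nn

def joint_loss(task_logits, labels, lm_logits, next_tokens,
               step, total_steps):
    """Task loss plus an annealed auxiliary LM loss. The linear decay
    of gamma is an illustrative schedule, not the paper's exact one."""
    task = nn.functional.cross_entropy(task_logits, labels)
    lm = nn.functional.cross_entropy(
        lm_logits.reshape(-1, lm_logits.size(-1)),
        next_tokens.reshape(-1))
    gamma = max(0.0, 1.0 - step / total_steps)   # decays to 0
    return task + gamma * lm

# Early in training the LM term preserves the pretrained regularities;
# as gamma decays, the task term dominates and the model adapts.
loss = joint_loss(torch.randn(8, 5), torch.randint(0, 5, (8,)),
                  torch.randn(8, 20, 1000), torch.randint(0, 1000, (8, 20)),
                  step=100, total_steps=1000)
```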
Unpaired Point Cloud Completion on Real Scans using Adversarial Training
Title | Unpaired Point Cloud Completion on Real Scans using Adversarial Training |
Authors | Xuelin Chen, Baoquan Chen, Niloy J. Mitra |
Abstract | As 3D scanning solutions become increasingly popular, several deep learning setups have been developed geared towards the task of scan completion, i.e., plausibly filling in regions that were missed in the raw scans. These methods, however, largely rely on supervision in the form of paired training data, i.e., partial scans with corresponding desired completed scans. While these methods have been successfully demonstrated on synthetic data, the approaches cannot be directly used on real scans in the absence of suitable paired training data. We develop a first approach that works directly on input point clouds, does not require paired training data, and hence can be directly applied to real scans for scan completion. We evaluate the approach qualitatively on several real-world datasets (ScanNet, Matterport, KITTI), quantitatively on the 3D-EPN shape completion benchmark, and demonstrate realistic completions under varying levels of incompleteness. |
Tasks | |
Published | 2019-03-29 |
URL | https://arxiv.org/abs/1904.00069v3 |
https://arxiv.org/pdf/1904.00069v3.pdf | |
PWC | https://paperswithcode.com/paper/unpaired-point-cloud-completion-on-real-scans |
Repo | https://github.com/ChenXuelinCXL/pcl2pcl-gan-pub |
Framework | tf |
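The unpaired setup can be sketched as a completion generator trained against a discriminator that only ever sees unpaired complete shapes, so no partial/complete pairs are needed. Note the released code (pcl2pcl) adversarially matches latent codes of pretrained point-cloud autoencoders; the raw-points version below is a deliberately pared-down illustration with hypothetical architectures.

```python
import torch
import torch.nn as nn

class PointMLP(nn.Module):
    """Toy point-cloud-to-point-cloud network; stands in for the paper's
    generator, which actually operates on autoencoder latent codes."""
    def __init__(self, n_points=1024):
        super().__init__()
        self.n_points = n_points
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(n_points * 3, 256), nn.ReLU(),
            nn.Linear(256, n_points * 3))
    def forward(self, pts):
        return self.net(pts).view(-1, self.n_points, 3)

G = PointMLP()                                    # partial -> complete
D = nn.Sequential(nn.Flatten(), nn.Linear(1024 * 3, 256),
                  nn.ReLU(), nn.Linear(256, 1))   # real/fake score

partial = torch.randn(4, 1024, 3)   # scans with missing regions
real = torch.randn(4, 1024, 3)      # UNPAIRED complete shapes
d_loss = (nn.functional.softplus(-D(real)) +
          nn.functional.softplus(D(G(partial).detach()))).mean()
g_loss = nn.functional.softplus(-D(G(partial))).mean()
```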
Content and Colour Distillation for Learning Image Translations with the Spatial Profile Loss
Title | Content and Colour Distillation for Learning Image Translations with the Spatial Profile Loss |
Authors | M. Saquib Sarfraz, Constantin Seibold, Haroon Khalid, Rainer Stiefelhagen |
Abstract | Generative adversarial networks have emerged as a de facto standard for image translation problems. To successfully drive such models, one has to rely on additional networks, e.g., discriminators and/or perceptual networks. Training these networks with pixel-based losses alone is generally not sufficient to learn the target distribution. In this paper, we propose a novel method of computing the loss directly between the source and target images that enables proper distillation of shape/content and colour/style. We show that this is useful in typical image-to-image translations, allowing us to successfully drive the generator without relying on additional networks. We demonstrate this on many difficult image translation problems, such as image-to-image domain mapping, single image super-resolution and photo-realistic makeup transfer. Our extensive evaluation shows the effectiveness of the proposed formulation and its ability to synthesize realistic images. [Code release: https://github.com/ssarfraz/SPL] |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00274v1 |
https://arxiv.org/pdf/1908.00274v1.pdf | |
PWC | https://paperswithcode.com/paper/content-and-colour-distillation-for-learning |
Repo | https://github.com/ssarfraz/SPL |
Framework | tf |
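A simplified reading of the idea, computing the loss directly between source and target images by comparing their spatial profiles, is sketched below: each row and each column is compared by cosine similarity. The paper's actual SPL additionally works in particular colour spaces and on gradient profiles, so treat this as an assumption-laden toy version.

```python
import torch
import torch.nn.functional as F

def spatial_profile_loss(x, y, eps=1e-8):
    """Toy profile-based image loss: compare every row and column of x
    and y by cosine similarity and average. A pared-down reading of the
    idea, not the paper's exact SPL formulation."""
    # x, y: (batch, channels, H, W)
    rows = F.cosine_similarity(x, y, dim=3, eps=eps).mean()  # width
    cols = F.cosine_similarity(x, y, dim=2, eps=eps).mean()  # height
    return 2.0 - rows - cols     # 0 when all profiles align perfectly

x = torch.rand(2, 3, 64, 64)
y = torch.rand(2, 3, 64, 64)
print(spatial_profile_loss(x, x))   # ~0 for identical images
print(spatial_profile_loss(x, y))   # larger for mismatched images
```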
Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images
Title | Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images |
Authors | Haozhe Xie, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Shengping Zhang |
Abstract | Recovering the 3D representation of an object from single-view or multi-view RGB images by deep neural networks has attracted increasing attention in the past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to fuse multiple feature maps extracted from input images sequentially. However, when given the same set of input images in different orders, RNN-based approaches are unable to produce consistent reconstruction results. Moreover, due to long-term memory loss, RNNs cannot fully exploit input images to refine reconstruction results. To solve these problems, we propose a novel framework for single-view and multi-view 3D reconstruction, named Pix2Vox. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. Then, a context-aware fusion module is introduced to adaptively select high-quality reconstructions for each part (e.g., table legs) from the different coarse 3D volumes to obtain a fused 3D volume. Finally, a refiner further refines the fused 3D volume to generate the final output. Experimental results on the ShapeNet and Pix3D benchmarks indicate that the proposed Pix2Vox outperforms state-of-the-art methods by a large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in terms of backward inference time. Experiments on unseen ShapeNet 3D categories show the superior generalization abilities of our method. |
Tasks | 3D Object Reconstruction, 3D Reconstruction |
Published | 2019-01-31 |
URL | https://arxiv.org/abs/1901.11153v2 |
https://arxiv.org/pdf/1901.11153v2.pdf | |
PWC | https://paperswithcode.com/paper/pix2vox-context-aware-3d-reconstruction-from |
Repo | https://github.com/Ajithbalakrishnan/3D-Model-Reconstruction |
Framework | tf |
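The context-aware fusion step can be sketched as voxel-wise scoring of each view's coarse volume, followed by a softmax-weighted blend across views. This stand-in omits the context features Pix2Vox actually feeds its scorer; the single Conv3d scorer and all shapes are illustrative.

```python
import torch
import torch.nn as nn

class ContextAwareFusion(nn.Module):
    """Simplified stand-in for Pix2Vox's fusion module: score each
    view's coarse volume voxel-wise, softmax across views, and blend."""
    def __init__(self):
        super().__init__()
        self.score = nn.Conv3d(1, 1, kernel_size=3, padding=1)
    def forward(self, volumes):              # (batch, views, D, H, W)
        b, v, d, h, w = volumes.shape
        s = self.score(volumes.reshape(b * v, 1, d, h, w))
        s = s.reshape(b, v, d, h, w).softmax(dim=1)  # weights per view
        return (s * volumes).sum(dim=1)      # fused (batch, D, H, W)

coarse = torch.rand(2, 4, 32, 32, 32)        # 4 views' coarse volumes
fused = ContextAwareFusion()(coarse)
# Per-voxel weighting lets different views "win" on different parts.
```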
Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks
Title | Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks |
Authors | Mingchen Li, Mahdi Soltanolkotabi, Samet Oymak |
Abstract | Modern neural networks are typically trained in an over-parameterized regime where the parameters of the model far exceed the size of the training data. Such neural networks in principle have the capacity to (over)fit any set of labels, including pure noise. Despite this, somewhat paradoxically, neural network models trained via first-order methods continue to predict well on yet unseen test data. This paper takes a step towards demystifying this phenomenon. Under a rich dataset model, we show that gradient descent is provably robust to noise/corruption on a constant fraction of the labels despite overparameterization. In particular, we prove that: (i) in the first few iterations, where the updates are still in the vicinity of the initialization, gradient descent only fits the correct labels, essentially ignoring the noisy labels; (ii) to start to overfit the noisy labels, the network must stray rather far from the initialization, which can only occur after many more iterations. Together, these results show that gradient descent with early stopping is provably robust to label noise and shed light on the empirical robustness of deep networks, as well as on commonly adopted heuristics to prevent overfitting. |
Tasks | |
Published | 2019-03-27 |
URL | https://arxiv.org/abs/1903.11680v3 |
https://arxiv.org/pdf/1903.11680v3.pdf | |
PWC | https://paperswithcode.com/paper/gradient-descent-with-early-stopping-is |
Repo | https://github.com/BSAraujo/machine-learning |
Framework | tf |
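The practical recipe the analysis supports, train on noisy labels but keep the checkpoint chosen by a small clean validation split, can be demonstrated end-to-end on synthetic data. Everything below (data sizes, the 30% flip rate, the checkpointing rule) is illustrative, not the paper's experimental setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 20)
y = (X[:, 0] > 0).long()                    # true labels
y_noisy = y.clone()
flip = torch.rand(512) < 0.3                # corrupt 30% of labels
y_noisy[flip] = 1 - y_noisy[flip]
Xv = torch.randn(128, 20)
yv = (Xv[:, 0] > 0).long()                  # small clean validation set

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 2))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
best_acc, best_step = 0.0, 0
for step in range(200):
    opt.zero_grad()
    nn.functional.cross_entropy(net(X), y_noisy).backward()
    opt.step()
    with torch.no_grad():                   # early-stopping criterion
        acc = (net(Xv).argmax(1) == yv).float().mean().item()
    if acc > best_acc:
        best_acc, best_step = acc, step     # keep this checkpoint
print(f"best clean accuracy {best_acc:.2f} at step {best_step}")
```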
Variance reduction for Markov chains with application to MCMC
Title | Variance reduction for Markov chains with application to MCMC |
Authors | D. Belomestny, L. Iosipoi, E. Moulines, A. Naumov, S. Samsonov |
Abstract | In this paper, we propose a novel variance reduction approach for additive functionals of Markov chains based on minimization of an estimate of the asymptotic variance of these functionals over suitable classes of control variates. A distinctive feature of the proposed approach is its ability to significantly reduce the overall finite-sample variance. This feature is demonstrated theoretically, by means of a deep non-asymptotic analysis of the variance-reduced functional, as well as by a thorough simulation study. In particular, we apply our method to various MCMC Bayesian estimation problems, where it compares favourably to existing variance reduction approaches. |
Tasks | |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03643v2 |
https://arxiv.org/pdf/1910.03643v2.pdf | |
PWC | https://paperswithcode.com/paper/variance-reduction-for-markov-chains-with |
Repo | https://github.com/svsamsonov/esvm |
Framework | none |
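The core construction, choosing control-variate coefficients by minimizing an empirical variance estimate, can be illustrated in a few lines. The sketch below uses i.i.d. Gaussian draws in place of an actual MCMC chain and two hand-picked zero-mean variates; the paper's estimator for dependent chains is considerably more involved.

```python
import numpy as np

# Reduce the variance of a Monte Carlo estimate of E[f(X)] by
# subtracting a fitted linear combination of zero-mean control
# variates; minimizing empirical variance reduces to least squares.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)           # stand-in for an MCMC chain
f = np.exp(x)                          # E[f] = exp(0.5) ~ 1.6487
G = np.column_stack([x, x**2 - 1])     # both have mean 0 under N(0,1)

beta, *_ = np.linalg.lstsq(G, f - f.mean(), rcond=None)
f_cv = f - G @ beta                    # same mean, smaller variance

print(f"plain:   mean {f.mean():.4f}  var {f.var():.3f}")
print(f"with CV: mean {f_cv.mean():.4f}  var {f_cv.var():.3f}")
```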
A Matrix-in-matrix Neural Network for Image Super Resolution
Title | A Matrix-in-matrix Neural Network for Image Super Resolution |
Authors | Hailong Ma, Xiangxiang Chu, Bo Zhang, Shaohua Wan, Bo Zhang |
Abstract | In recent years, deep learning methods have achieved impressive results, with higher peak signal-to-noise ratios, in single image super-resolution (SISR) tasks by utilizing deeper layers. However, their application is quite limited since they require high computing power. In addition, most of the existing methods rarely take full advantage of the intermediate features which are helpful for restoration. To address these issues, we propose a moderate-size SISR network named matrixed channel attention network (MCAN), constructed as a matrix ensemble of multi-connected channel attention blocks (MCAB). Several models of different sizes are released to meet various practical requirements. Our extensive benchmark experiments show that the proposed models achieve better performance with much fewer multiply-adds and parameters. Our models will be made publicly available. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2019-03-19 |
URL | http://arxiv.org/abs/1903.07949v1 |
http://arxiv.org/pdf/1903.07949v1.pdf | |
PWC | https://paperswithcode.com/paper/a-matrix-in-matrix-neural-network-for-image |
Repo | https://github.com/macn3388/MCAN |
Framework | pytorch |
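The named building block is a channel attention block. Below is a generic squeeze-and-excitation-style version of such a block; the exact MCAB wiring and the matrix-in-matrix composition are not reproduced, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class ChannelAttentionBlock(nn.Module):
    """Generic residual channel-attention block of the kind MCAN
    composes; not the paper's exact MCAB."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.attn = nn.Sequential(       # squeeze-and-excitation gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
    def forward(self, x):
        res = self.body(x)
        return x + res * self.attn(res)  # channel-reweighted residual

feat = torch.randn(1, 64, 48, 48)
out = ChannelAttentionBlock()(feat)      # same shape as input
```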
GCNv2: Efficient Correspondence Prediction for Real-Time SLAM
Title | GCNv2: Efficient Correspondence Prediction for Real-Time SLAM |
Authors | Jiexiong Tang, Ludvig Ericson, John Folkesson, Patric Jensfelt |
Abstract | In this paper, we present a deep learning-based network, GCNv2, for the generation of keypoints and descriptors. GCNv2 is built on our previous method, GCN, a network trained for 3D projective geometry. GCNv2 is designed with a binary descriptor vector, like the ORB feature, so that it can easily replace ORB in systems such as ORB-SLAM2. GCNv2 significantly improves the computational efficiency over GCN, which could only run on desktop hardware. We show how a modified version of ORB-SLAM2 using GCNv2 features runs on a Jetson TX2, an embedded low-power platform. Experimental results show that GCNv2 retains accuracy comparable to GCN and is robust enough to be used for control of a flying drone. |
Tasks | |
Published | 2019-02-28 |
URL | https://arxiv.org/abs/1902.11046v3 |
https://arxiv.org/pdf/1902.11046v3.pdf | |
PWC | https://paperswithcode.com/paper/gcnv2-efficient-correspondence-prediction-for |
Repo | https://github.com/jiexiong2016/GCNv2_SLAM |
Framework | pytorch |
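The ORB-compatibility trick is to emit a 256-bit binary descriptor. A common way to sketch this is sign thresholding with a straight-through estimator so the network stays trainable; whether GCNv2 uses exactly this estimator is an assumption here, and its metric-learning losses are omitted.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Threshold a float descriptor into 0/1 bits, passing gradients
    straight through so the descriptor network remains trainable."""
    @staticmethod
    def forward(ctx, desc):
        return (desc >= 0).float()        # ORB-style 256-bit vector
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                   # straight-through estimator

desc = torch.randn(100, 256, requires_grad=True)  # 100 keypoints
bits = BinarizeSTE.apply(desc)
# Matching then reduces to Hamming distance, exactly as with ORB:
hamming = (bits[:1] != bits).sum(dim=1)
```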
How Can We Be So Dense? The Benefits of Using Highly Sparse Representations
Title | How Can We Be So Dense? The Benefits of Using Highly Sparse Representations |
Authors | Subutai Ahmad, Luiz Scheinkman |
Abstract | Most artificial networks today rely on dense representations, whereas biological networks rely on sparse representations. In this paper, we show how sparse representations can be more robust to noise and interference, as long as the underlying dimensionality is sufficiently high. A key intuition that we develop is that the ratio of the operable volume around a sparse vector to the volume of the representational space decreases exponentially with dimensionality. We then analyze computationally efficient sparse networks containing both sparse weights and activations. Simulations on MNIST and the Google Speech Command Dataset show that such networks demonstrate significantly improved robustness and stability compared to dense networks, while maintaining competitive accuracy. We discuss the potential benefits of sparsity on accuracy, noise robustness, hyperparameter tuning, learning speed, computational efficiency, and power requirements. |
Tasks | |
Published | 2019-03-27 |
URL | http://arxiv.org/abs/1903.11257v2 |
http://arxiv.org/pdf/1903.11257v2.pdf | |
PWC | https://paperswithcode.com/paper/how-can-we-be-so-dense-the-benefits-of-using |
Repo | https://github.com/marty1885/sparsenet-pytorch |
Framework | pytorch |
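The mechanism behind the robustness claims is a k-winner-take-all activation that keeps representations highly sparse. A minimal version, ignoring the paper's duty-cycle boosting and sparse weights, looks like this:

```python
import torch
import torch.nn as nn

class KWinners(nn.Module):
    """Zero out all but the k largest activations per sample. Ties may
    admit a few extra winners, which this sketch ignores."""
    def __init__(self, k):
        super().__init__()
        self.k = k
    def forward(self, x):
        kth = x.topk(self.k, dim=1).values[:, -1:]  # k-th largest
        return x * (x >= kth).float()

layer = nn.Linear(784, 512)
h = KWinners(k=50)(layer(torch.randn(32, 784)))
print((h != 0).float().mean())   # ~50/512 units active per sample
```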
Cold Case: The Lost MNIST Digits
Title | Cold Case: The Lost MNIST Digits |
Authors | Chhavi Yadav, Léon Bottou |
Abstract | Although the popular MNIST dataset [LeCun et al., 1994] is derived from the NIST database [Grother and Hanaoka, 1995], the precise processing steps of this derivation have been lost to time. We propose a reconstruction that is accurate enough to serve as a replacement for the MNIST dataset, with insignificant changes in accuracy. We trace each MNIST digit to its NIST source and its rich metadata, such as writer identifier, partition identifier, etc. We also reconstruct the complete MNIST test set, with 60,000 samples instead of the usual 10,000. Since the remaining 50,000 samples were never distributed, they enable us to investigate the impact of twenty-five years of MNIST experiments on the reported testing performances. Our results unambiguously confirm the trends observed by Recht et al. [2018, 2019]: although the misclassification rates are slightly off, classifier ordering and model selection remain broadly reliable. We attribute this phenomenon to the pairing benefits of comparing classifiers on the same digits. |
Tasks | Image Classification, Model Selection |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10498v2 |
https://arxiv.org/pdf/1905.10498v2.pdf | |
PWC | https://paperswithcode.com/paper/cold-case-the-lost-mnist-digits |
Repo | https://github.com/facebookresearch/qmnist |
Framework | pytorch |
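The reconstruction is distributed as the QMNIST dataset; besides the linked repo, it is packaged in torchvision. A loading sketch follows; the `what` and `compat` arguments reflect my reading of the torchvision API (`what="test50k"` selecting the 50,000 reconstructed test digits, `compat=False` exposing the extended metadata targets).

```python
import torchvision

# Standard training split with MNIST-compatible integer targets.
train = torchvision.datasets.QMNIST("./data", what="train",
                                    download=True)

# The 50,000 reconstructed test digits that were never distributed,
# with extended targets carrying writer/partition metadata.
lost = torchvision.datasets.QMNIST("./data", what="test50k",
                                   compat=False, download=True)
img, target = lost[0]   # target: label plus NIST metadata fields
```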
LoGANv2: Conditional Style-Based Logo Generation with Generative Adversarial Networks
Title | LoGANv2: Conditional Style-Based Logo Generation with Generative Adversarial Networks |
Authors | Cedric Oeldorf, Gerasimos Spanakis |
Abstract | Domains such as logo synthesis, in which the data has a high degree of multi-modality, still pose a challenge for generative adversarial networks (GANs). Recent research shows that progressive training (ProGAN) and mapping network extensions (StyleGAN) enable both increased training stability for higher-dimensional problems and better feature separation within the embedded latent space. However, these architectures leave limited control over shaping the output of the network, which is an undesirable trait in the case of logo synthesis. This paper explores a conditional extension to the StyleGAN architecture with the aim of, firstly, improving on the low-resolution results of previous research and, secondly, increasing the controllability of the output through the use of synthetic class conditions. Furthermore, methods of extracting such class conditions are explored, with a focus on human interpretability; the challenge lies in the fact that, by nature, visual logo characteristics are hard to define. The introduced conditional style-based generator architecture is trained on the extracted class conditions in two experiments and studied relative to the performance of an unconditional model. Results show that, whilst the unconditional model more closely matches the training distribution, high-quality conditions enabled the embedding of finer details onto the latent space, leading to more diverse output. |
Tasks | |
Published | 2019-09-22 |
URL | https://arxiv.org/abs/1909.09974v1 |
https://arxiv.org/pdf/1909.09974v1.pdf | |
PWC | https://paperswithcode.com/paper/190909974 |
Repo | https://github.com/cedricoeldorf/ConditionalStyleGAN |
Framework | tf |
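One natural way to realize the conditional extension is to embed the class condition and concatenate it with the latent code before the mapping network, so the condition shapes the style vector w. The sketch below follows that reading; layer sizes and the embedding approach are assumptions, not the repo's exact implementation.

```python
import torch
import torch.nn as nn

class ConditionalMapping(nn.Module):
    """Sketch of a conditional StyleGAN mapping network: a class
    embedding is concatenated with z before mapping to w. Sizes are
    illustrative."""
    def __init__(self, z_dim=512, n_classes=10, emb_dim=64, w_dim=512):
        super().__init__()
        self.embed = nn.Embedding(n_classes, emb_dim)
        self.mapping = nn.Sequential(
            nn.Linear(z_dim + emb_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, w_dim))
    def forward(self, z, labels):
        zc = torch.cat([z, self.embed(labels)], dim=1)
        return self.mapping(zc)   # style vector for the synthesis net

z = torch.randn(8, 512)
labels = torch.randint(0, 10, (8,))
w = ConditionalMapping()(z, labels)
```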
Image Generation From Small Datasets via Batch Statistics Adaptation
Title | Image Generation From Small Datasets via Batch Statistics Adaptation |
Authors | Atsuhiro Noguchi, Tatsuya Harada |
Abstract | Thanks to the recent development of deep generative models, it is becoming possible to generate high-quality images with both fidelity and diversity. However, the training of such generative models requires a large dataset. To reduce the amount of data required, we propose a new method for transferring prior knowledge of a pre-trained generator, which is trained with a large dataset, to a small dataset in a different domain. Using such prior knowledge, the model can generate images leveraging common sense that cannot be acquired from a small dataset. In this work, we propose a novel method focusing on the parameters for batch statistics, scale and shift, of the hidden layers in the generator. By training only these parameters in a supervised manner, we achieve stable training of the generator, and our method can generate higher-quality images than previous methods without collapsing, even when the dataset is small (~100 images). Our results show that the diversity of the filters acquired in the pre-trained generator is important for the performance on the target domain. Our method makes it possible to add a new class or domain to a pre-trained generator without disturbing the performance on the original domain. |
Tasks | Common Sense Reasoning, Image Generation |
Published | 2019-04-03 |
URL | https://arxiv.org/abs/1904.01774v4 |
https://arxiv.org/pdf/1904.01774v4.pdf | |
PWC | https://paperswithcode.com/paper/image-generation-from-small-datasets-via |
Repo | https://github.com/nogu-atsu/small-dataset-image-generation |
Framework | pytorch |
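The adaptation recipe, update only the scale and shift of the generator's normalization statistics while freezing everything else, can be sketched as a parameter-selection helper. Note the paper introduces fresh scale-and-shift parameters per layer; reusing BatchNorm's affine weight/bias, as below, is a simplification, and `generator` is a placeholder for any pretrained model.

```python
import torch.nn as nn

def select_batch_stat_params(generator):
    """Freeze a pretrained generator, then re-enable gradients only for
    the scale (weight) and shift (bias) of its normalization layers.
    Simplified: the paper trains new per-layer scale/shift parameters."""
    for p in generator.parameters():
        p.requires_grad = False
    trainable = []
    for m in generator.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            for p in (m.weight, m.bias):
                if p is not None:
                    p.requires_grad = True
                    trainable.append(p)
    return trainable   # pass these to the optimizer

# Usage sketch on a hypothetical pretrained generator G:
# opt = torch.optim.Adam(select_batch_stat_params(G), lr=1e-4)
```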