January 29, 2020

3845 words 19 mins read

Paper Group ANR 745

Multi-Kernel Filtering for Nonstationary Noise: An Extension of Bilateral Filtering Using Image Context

Title Multi-Kernel Filtering for Nonstationary Noise: An Extension of Bilateral Filtering Using Image Context
Authors Feihong Liu, Jun Feng, Pew-Thian Yap, Dinggang Shen
Abstract Bilateral filtering (BF) is one of the most classical denoising filters; however, its manually initialized filtering kernel hampers its adaptivity across images with varying characteristics. To deal with image variation (i.e., non-stationary noise), in this paper we propose the multi-kernel filter (MKF), which adapts filtering kernels to specific image characteristics automatically. The design of MKF takes inspiration from adaptive mechanisms of human vision that make full use of information in a visual context. More specifically, to simulate the visual context and its adaptive function, we construct an image context and model its influence on the filtering kernels. We first design a hierarchical clustering algorithm that generates a hierarchy of large-to-small coherent image patches, organized as a cluster tree, to obtain a multi-scale image representation, i.e., the image context. Next, each leaf cluster is used to generate one of multiple range kernels capable of catering to image variation, and its two predecessor clusters are used to fine-tune the adopted kernel. Ultimately, the single spatially-invariant kernel in BF becomes multiple spatially-varying ones. We evaluate MKF on two public datasets, BSD300 and BrainWeb, to which integrally-varying noise and spatially-varying noise are added, respectively. Extensive experiments show that MKF outperforms state-of-the-art filters w.r.t. both mean absolute error and structural similarity.
Tasks Denoising
Published 2019-08-17
URL https://arxiv.org/abs/1908.06307v4
PDF https://arxiv.org/pdf/1908.06307v4.pdf
PWC https://paperswithcode.com/paper/multi-kernel-filtering-an-extension-of
Repo
Framework
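
A rough, illustrative sketch of the idea behind MKF (not the paper's implementation): a bilateral filter whose range kernel varies per pixel according to a precomputed cluster map. The cluster map and the per-cluster sigmas below are placeholder assumptions; the paper derives them from a hierarchical clustering of image patches, i.e., the image context.

```python
import numpy as np

def multi_kernel_bilateral(img, cluster_map, sigma_r_per_cluster, sigma_s=2.0, radius=3):
    """img: 2D float array in [0, 1]; cluster_map: int array of the same shape."""
    h, w = img.shape
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))      # fixed spatial kernel, as in BF
    pad = np.pad(img, radius, mode='reflect')
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            sigma_r = sigma_r_per_cluster[cluster_map[i, j]]   # spatially-varying range kernel
            rng = np.exp(-((patch - img[i, j]) ** 2) / (2 * sigma_r**2))
            weights = spatial * rng
            out[i, j] = (weights * patch).sum() / weights.sum()
    return out

img = np.random.rand(64, 64)
clusters = (img > 0.5).astype(int)   # toy 2-cluster map standing in for the cluster-tree leaves
print(multi_kernel_bilateral(img, clusters, {0: 0.05, 1: 0.2}).shape)
```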

Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks

Title Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks
Authors Alexander Shevchenko, Marco Mondelli
Abstract The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes wider. We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left. Our results exhibit a mild dependence on the input dimension: they are dimension-free for two-layer networks and depend linearly on the dimension for multilayer networks. We validate our theoretical findings with numerical experiments for different architectures and classification tasks.
Tasks
Published 2019-12-20
URL https://arxiv.org/abs/1912.10095v1
PDF https://arxiv.org/pdf/1912.10095v1.pdf
PWC https://paperswithcode.com/paper/landscape-connectivity-and-dropout-stability
Repo
Framework
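
To make the dropout-stability notion concrete, the toy check below drops half the hidden units of a wide two-layer network, rescales the surviving output weights by a factor of 2, and compares the losses. This only demonstrates the mechanics on an untrained network with synthetic data; the paper's result concerns solutions found by SGD as the width grows.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n_hidden, n_samples = 10, 4096, 256
x, y = torch.randn(n_samples, d), torch.randn(n_samples, 1)

net = nn.Sequential(nn.Linear(d, n_hidden), nn.ReLU(), nn.Linear(n_hidden, 1))
loss_fn = nn.MSELoss()
full_loss = loss_fn(net(x), y)

with torch.no_grad():
    keep = torch.randperm(n_hidden)[: n_hidden // 2]        # keep a random half of the neurons
    sub = nn.Sequential(nn.Linear(d, n_hidden // 2), nn.ReLU(), nn.Linear(n_hidden // 2, 1))
    sub[0].weight.copy_(net[0].weight[keep])
    sub[0].bias.copy_(net[0].bias[keep])
    sub[2].weight.copy_(2.0 * net[2].weight[:, keep])        # rescale to compensate for dropped units
    sub[2].bias.copy_(net[2].bias)
    dropped_loss = loss_fn(sub(x), y)

print(full_loss.item(), dropped_loss.item())                 # dropout-stable if these stay close
```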

Biomedical Image Segmentation by Retina-like Sequential Attention Mechanism Using Only A Few Training Images

Title Biomedical Image Segmentation by Retina-like Sequential Attention Mechanism Using Only A Few Training Images
Authors Shohei Hayashi, Bisser Raytchev, Toru Tamaki, Kazufumi Kaneda
Abstract In this paper we propose a novel deep learning-based algorithm for biomedical image segmentation which uses a sequential attention mechanism able to shift the focus of attention across the image in a selective way, allowing subareas which are more difficult to classify to be processed at increased resolution. The spatial distribution of class information in each subarea is learned using a retina-like representation where resolution decreases with distance from the center of attention. The final segmentation is achieved by averaging class predictions over overlapping subareas, utilizing the power of ensemble learning to increase segmentation accuracy. Experimental results for a semantic segmentation task for which only a few training images are available show that a CNN using the proposed method outperforms both a patch-based classification CNN and a fully convolutional method.
Tasks Semantic Segmentation
Published 2019-09-27
URL https://arxiv.org/abs/1909.12612v1
PDF https://arxiv.org/pdf/1909.12612v1.pdf
PWC https://paperswithcode.com/paper/biomedical-image-segmentation-by-retina-like
Repo
Framework
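
Below is a small sketch of one way to build a retina-like representation around a point of attention: concentric windows of growing size are all resampled to the same output resolution, so effective resolution decreases with distance from the center. Window sizes, the common output resolution, and the naive subsampling are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def retina_patches(img, cy, cx, sizes=(16, 32, 64), out=16):
    """Crop concentric windows around (cy, cx) and subsample each to roughly out x out."""
    patches = []
    for s in sizes:
        half = s // 2
        crop = img[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
        step = max(s // out, 1)
        patches.append(crop[::step, ::step][:out, :out])   # coarser sampling for larger windows
    return patches

img = np.random.rand(128, 128)
print([p.shape for p in retina_patches(img, 64, 64)])       # three low-resolution views at growing scales
```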

Deep 3D-Zoom Net: Unsupervised Learning of Photo-Realistic 3D-Zoom

Title Deep 3D-Zoom Net: Unsupervised Learning of Photo-Realistic 3D-Zoom
Authors Juan Luis Gonzalez Bello, Munchurl Kim
Abstract The 3D-zoom operation is the positive translation of the camera along the Z-axis, perpendicular to the image plane. In contrast, optical zoom changes the focal length, and digital zoom enlarges a certain region of an image to the original image size. In this paper, we are the first to formulate an unsupervised 3D-zoom learning problem, where images with an arbitrary zoom factor can be generated from a given single image. An unsupervised framework is convenient, as it is challenging to obtain a 3D-zoom dataset of natural scenes: special equipment is needed to ensure camera movement is restricted to the Z-axis, and the objects in the scene should not move while being captured, which hinders the construction of a large dataset of outdoor scenes. We present the Deep 3D-Zoom Net, a novel unsupervised framework that learns to generate arbitrarily 3D-zoomed versions of a single image without requiring 3D-zoom ground truth. The Deep 3D-Zoom Net incorporates the following features: (i) transfer learning from a pre-trained disparity estimation network via a back re-projection reconstruction loss; (ii) a fully convolutional network architecture that models depth-image-based rendering (DIBR), taking into account high-frequency details without the need for estimating the intermediate disparity; and (iii) a discriminator network that acts as a no-reference penalty for unnaturally rendered areas. Even though there is no baseline against which to fairly compare our results, our method outperforms previous novel view synthesis research in terms of realistic appearance on large camera baselines. We performed extensive experiments to verify the effectiveness of our method on the KITTI and Cityscapes datasets.
Tasks Disparity Estimation, Novel View Synthesis, Transfer Learning
Published 2019-09-20
URL https://arxiv.org/abs/1909.09349v2
PDF https://arxiv.org/pdf/1909.09349v2.pdf
PWC https://paperswithcode.com/paper/deep-3d-zoom-net-unsupervised-learning-of
Repo
Framework

Automated Fitting of Neural Network Potentials at Coupled Cluster Accuracy: Protonated Water Clusters as Testing Ground

Title Automated Fitting of Neural Network Potentials at Coupled Cluster Accuracy: Protonated Water Clusters as Testing Ground
Authors Christoph Schran, Jörg Behler, Dominik Marx
Abstract Highly accurate potential energy surfaces are of key interest for the detailed understanding and predictive modeling of chemical systems. In recent years, several new types of force fields, which are based on machine learning algorithms and fitted to ab initio reference calculations, have been introduced to meet this requirement. Here we show how high-dimensional neural network potentials (NNPs) can be employed to automatically generate the potential energy surface of finite-sized clusters at coupled cluster accuracy, namely CCSD(T*)-F12a/aug-cc-pVTZ. The developed automated procedure utilizes the established intrinsic properties of the model such that the configurations for the training set are selected in an unbiased and efficient way to minimize the computational effort of expensive reference calculations. These ideas are applied to protonated water clusters from the hydronium cation, H$_3$O$^+$, up to the tetramer, H$_9$O$_{4}^{+}$, and lead to a single potential energy surface that describes all these systems at essentially converged coupled cluster accuracy with a fitting error of 0.06 kJ/mol per atom. The fit is validated in detail for all clusters up to the tetramer and yields reliable results not only for stationary points, but also for reaction pathways, intermediate configurations, and different sampling techniques. By design, the NNPs constructed in this fashion can handle very different conditions, including the quantum nature of the nuclei and enhanced sampling techniques, covering very low as well as high temperatures. This enables fast and exhaustive exploration of the targeted protonated water clusters with essentially converged interactions. In addition, the automated process will allow one to tackle finite systems much beyond the present case.
Tasks
Published 2019-08-23
URL https://arxiv.org/abs/1908.08734v2
PDF https://arxiv.org/pdf/1908.08734v2.pdf
PWC https://paperswithcode.com/paper/automated-fitting-of-neural-network
Repo
Framework
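
One common way to automate the unbiased selection of training configurations that the abstract alludes to is query by committee: label the structures on which an ensemble of preliminary potentials disagrees most. The numbers below are synthetic stand-ins for NNP energy predictions, and it is an assumption that this mirrors the paper's exact criterion; only the selection step is sketched.

```python
import numpy as np

rng = np.random.default_rng(0)
n_candidates, n_models = 1000, 4
# predicted energies per candidate configuration from a small committee of fitted potentials (synthetic)
energies = rng.normal(size=(n_models, n_candidates))

disagreement = energies.std(axis=0)             # committee spread per configuration
to_label = np.argsort(disagreement)[-20:]       # send the 20 most uncertain structures to the reference method
print(to_label)
```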

Sem-LSD: A Learning-based Semantic Line Segment Detector

Title Sem-LSD: A Learning-based Semantic Line Segment Detector
Authors Yi Sun, Xushen Han, Kai Sun, Boren Li, Yongjiang Chen, Mingyang Li
Abstract In this paper, we introduce a new type of line-shaped image representation, named semantic line segment (Sem-LS), and focus on solving its detection problem. Sem-LS contains high-level semantics and is a compact scene representation in which only visually salient line segments with stable semantics are preserved. Combined with high-level semantics, Sem-LS is more robust in cluttered environments than existing line-shaped representations. The compactness of Sem-LS facilitates its use in large-scale applications, such as city-scale SLAM (simultaneous localization and mapping) and LCD (loop closure detection). Sem-LS detection is a challenging task due to its significantly different appearance from existing learning-based image representations such as wireframes and objects. For further investigation, we first label Sem-LS on two well-known datasets, KITTI and KAIST URBAN, as new benchmarks. Then, we propose a learning-based Sem-LS detector (Sem-LSD) and devise new modules as well as metrics to address the unique challenges in Sem-LS detection. Experimental results show both the efficacy and efficiency of Sem-LSD. Finally, the effectiveness of the proposed Sem-LS is supported by two experiments on detector repeatability and a city-scale LCD problem. Labeled datasets and code will be released shortly.
Tasks Line Segment Detection, Loop Closure Detection
Published 2019-09-14
URL https://arxiv.org/abs/1909.06591v2
PDF https://arxiv.org/pdf/1909.06591v2.pdf
PWC https://paperswithcode.com/paper/line-as-object-datasets-and-framework-for
Repo
Framework

Crowd Density Forecasting by Modeling Patch-based Dynamics

Title Crowd Density Forecasting by Modeling Patch-based Dynamics
Authors Hiroaki Minoura, Ryo Yonetani, Mai Nishimura, Yoshitaka Ushiku
Abstract Forecasting human activities observed in videos is a long-standing challenge in computer vision, which leads to various real-world applications such as mobile robots, autonomous driving, and assistive systems. In this work, we present a new visual forecasting task called crowd density forecasting. Given a video of a crowd captured by a surveillance camera, our goal is to predict how that crowd will move in future frames. To address this task, we have developed the patch-based density forecasting network (PDFN), which enables forecasting over a sequence of crowd density maps describing how crowded each location is in each video frame. PDFN represents a crowd density map based on spatially overlapping patches and learns density dynamics patch-wise in a compact latent space. This enables us to model diverse and complex crowd density dynamics efficiently, even when the input video involves a variable number of crowds that each move independently. Experimental results with several public datasets demonstrate the effectiveness of our approach compared with state-of-the-art forecasting methods.
Tasks Autonomous Driving
Published 2019-11-22
URL https://arxiv.org/abs/1911.09814v1
PDF https://arxiv.org/pdf/1911.09814v1.pdf
PWC https://paperswithcode.com/paper/crowd-density-forecasting-by-modeling-patch
Repo
Framework
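
As a toy illustration of the patch-based representation described above, the snippet below splits a single crowd-density map into spatially overlapping patches; a dynamics model would then encode and forecast each patch sequence in a latent space. Patch size and stride are arbitrary choices, not PDFN's configuration.

```python
import numpy as np

def overlapping_patches(density_map, patch=32, stride=16):
    """Split a 2D density map into overlapping patches of shape (patch, patch)."""
    h, w = density_map.shape
    out = []
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            out.append(density_map[i:i + patch, j:j + patch])
    return np.stack(out)              # (num_patches, patch, patch)

frame = np.random.rand(128, 128)      # stand-in for one crowd-density map
print(overlapping_patches(frame).shape)
```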

Stratified Labeling for Surface Consistent Parallax Correction and Occlusion Completion

Title Stratified Labeling for Surface Consistent Parallax Correction and Occlusion Completion
Authors Jie Chen, Lap-Pui Chau, Junhui Hou
Abstract The light field (LF) faithfully records the spatial and angular configurations of the scene, which facilitates a wide range of imaging possibilities. In this work, we propose an LF synthesis algorithm which renders high-quality novel LF views far outside the range of angular baselines of the given references. A stratified synthesis strategy is adopted which parses the scene content based on stratified disparity layers and across a varying range of spatial granularities. Such a stratified methodology helps preserve scene structures over large perspective shifts, and it provides informative clues for inferring the textures of occluded regions. A generative adversarial network model is further adopted for parallax correction and occlusion completion, conditioned on the stratified synthesis features. Experiments show that our proposed model provides more reliable novel view synthesis quality at large baseline extension ratios. An improvement of over 3 dB has been achieved against state-of-the-art LF view synthesis algorithms.
Tasks Novel View Synthesis
Published 2019-03-07
URL http://arxiv.org/abs/1903.02688v2
PDF http://arxiv.org/pdf/1903.02688v2.pdf
PWC https://paperswithcode.com/paper/stratified-labeling-for-surface-consistent
Repo
Framework

Sim-to-Real Learning for Casualty Detection from Ground Projected Point Cloud Data

Title Sim-to-Real Learning for Casualty Detection from Ground Projected Point Cloud Data
Authors Roni Permana Saputra, Nemanja Rakicevic, Petar Kormushev
Abstract This paper addresses the problem of human body detection—particularly a human body lying on the ground (a.k.a. casualty)—using point cloud data. This ability to detect a casualty is one of the most important features of mobile rescue robots, in order for them to be able to operate autonomously. We propose a deep-learning-based casualty detection method using a deep convolutional neural network (CNN). This network is trained to be able to detect a casualty using a point-cloud data input. In the method we propose, the point cloud input is pre-processed to generate a depth image-like ground-projected heightmap. This heightmap is generated based on the projected distance of each point onto the detected ground plane within the point cloud data. The generated heightmap – in image form – is then used as an input for the CNN to detect a human body lying on the ground. To train the neural network, we propose a novel sim-to-real approach, in which the network model is trained using synthetic data obtained in simulation and then tested on real sensor data. To make the model transferable to real data implementations, during the training we adopt specific data augmentation strategies with the synthetic training data. The experimental results show that data augmentation introduced during the training process is essential for improving the performance of the trained model on real data. More specifically, the results demonstrate that the data augmentations on raw point-cloud data have contributed to a considerable improvement of the trained model performance.
Tasks Data Augmentation
Published 2019-08-08
URL https://arxiv.org/abs/1908.03057v2
PDF https://arxiv.org/pdf/1908.03057v2.pdf
PWC https://paperswithcode.com/paper/sim-to-real-learning-for-casualty-detection
Repo
Framework
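
A minimal sketch of the ground-projected heightmap idea in the entry above: bin points into a 2D grid on the ground plane and keep the maximum height per cell, yielding a depth-image-like input for a CNN. The ground plane is assumed to be z = 0 and the grid extent and resolution are arbitrary; the paper estimates the plane from the point cloud itself.

```python
import numpy as np

def heightmap_from_points(points, x_range=(-5, 5), y_range=(-5, 5), cell=0.05):
    """points: (N, 3) array of x, y, z; returns a 2D heightmap (max height per grid cell)."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    hm = np.zeros((ny, nx), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    ok = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    # keep the maximum height above the assumed ground plane per grid cell
    np.maximum.at(hm, (iy[ok], ix[ok]), points[ok, 2])
    return hm

# synthetic cloud of points with small heights above the assumed ground plane
pts = np.random.rand(10000, 3) * [10, 10, 0.3] - [5, 5, 0]
print(heightmap_from_points(pts).shape)
```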

Impact of Low-bitwidth Quantization on the Adversarial Robustness for Embedded Neural Networks

Title Impact of Low-bitwidth Quantization on the Adversarial Robustness for Embedded Neural Networks
Authors Rémi Bernhard, Pierre-Alain Moellic, Jean-Max Dutertre
Abstract As the will to deploy neural network models on embedded systems grows, and considering the related memory footprint and energy consumption issues, finding lighter ways to store neural networks, such as weight quantization, and more efficient inference methods have become major research topics. In parallel, adversarial machine learning has recently attracted impressive and significant attention, unveiling some critical flaws of machine learning models, especially neural networks. In particular, perturbed inputs called adversarial examples have been shown to fool a model into making incorrect predictions. In this article, we investigate the adversarial robustness of quantized neural networks under different threat models for a classical supervised image classification task. We show that quantization does not offer any robust protection and results in a severe form of gradient masking, and we advance some hypotheses to explain it. However, we experimentally observe poor transferability capacities, which we explain by a quantization value-shift phenomenon and gradient misalignment, and we explore how these results can be exploited with an ensemble-based defense.
Tasks Image Classification, Quantization
Published 2019-09-27
URL https://arxiv.org/abs/1909.12741v1
PDF https://arxiv.org/pdf/1909.12741v1.pdf
PWC https://paperswithcode.com/paper/impact-of-low-bitwidth-quantization-on-the
Repo
Framework
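
The transferability experiments mentioned above boil down to crafting adversarial examples on one model and evaluating another model on them. Below is a hedged sketch using FGSM as the attack; the models, data, and epsilon are placeholders rather than the paper's setup, which covers several threat models and quantized networks.

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, eps):
    """One-step FGSM attack: perturb x in the direction of the loss gradient sign."""
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach().clamp(0, 1)

def transfer_accuracy(src_model, tgt_model, x, y, eps=0.03):
    """Accuracy of tgt_model on adversarial examples crafted against src_model."""
    x_adv = fgsm(src_model, x, y, eps)
    return (tgt_model(x_adv).argmax(1) == y).float().mean().item()

# toy usage with untrained placeholder models and random data
m1 = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
m2 = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
x, y = torch.rand(8, 1, 28, 28), torch.randint(0, 10, (8,))
print(transfer_accuracy(m1, m2, x, y))
```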

Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

Title Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles
Authors Wenjie Shi, Shiji Song, Cheng Wu, C. L. Philip Chen
Abstract This paper investigates the trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Different from existing policy gradient methods, which employ a single actor-critic pair but cannot realize satisfactory tracking control accuracy and stable learning, our proposed algorithm achieves high-level tracking control accuracy and stable learning by applying a hybrid actors-critics architecture, where multiple actors and critics are trained to learn a deterministic policy and an action-value function, respectively. Specifically, for the critics, an updating rule based on the expected absolute Bellman error is used to choose the worst critic to be updated in each time step. Subsequently, to calculate the loss function with a more accurate target value for the chosen critic, Pseudo Q-learning, which uses a sub-greedy policy to replace the greedy policy in Q-learning, is developed for continuous action spaces, and Multi Pseudo Q-learning (MPQ) is proposed to reduce the overestimation of the action-value function and to stabilize learning. As for the actors, deterministic policy gradient is applied to update the weights, and the final learned policy is defined as the average of all actors to avoid large but harmful updates. Moreover, a qualitative stability analysis of the learning is given. The effectiveness and generality of the proposed MPQ-based Deterministic Policy Gradient (MPQ-DPG) algorithm are verified by application to an AUV with two different reference trajectories. The results demonstrate high-level tracking control accuracy and stable learning of MPQ-DPG, and also show that increasing the number of actors and critics further improves performance.
Tasks Policy Gradient Methods, Q-Learning
Published 2019-09-07
URL https://arxiv.org/abs/1909.03204v1
PDF https://arxiv.org/pdf/1909.03204v1.pdf
PWC https://paperswithcode.com/paper/multi-pseudo-q-learning-based-deterministic
Repo
Framework
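
Two of the mechanisms named in the abstract are easy to sketch: selecting the "worst" critic by expected absolute Bellman error, and defining the final policy as the average of all actors. The snippet below shows only those pieces with placeholder networks and random batches; the target-value computation, sub-greedy policy, and AUV dynamics are omitted.

```python
import torch
import torch.nn as nn

n_actors, n_critics, s_dim, a_dim = 3, 3, 8, 2
actors = [nn.Sequential(nn.Linear(s_dim, 32), nn.Tanh(), nn.Linear(32, a_dim)) for _ in range(n_actors)]
critics = [nn.Sequential(nn.Linear(s_dim + a_dim, 32), nn.Tanh(), nn.Linear(32, 1)) for _ in range(n_critics)]

def worst_critic(batch_s, batch_a, batch_target):
    """Index of the critic with the largest expected absolute Bellman error on the batch."""
    with torch.no_grad():
        errs = [(c(torch.cat([batch_s, batch_a], 1)) - batch_target).abs().mean() for c in critics]
    return int(torch.stack(errs).argmax())

def averaged_policy(state):
    """Final learned policy: the average of all actors' outputs."""
    with torch.no_grad():
        return torch.stack([a(state) for a in actors]).mean(0)

s, a, target = torch.randn(16, s_dim), torch.randn(16, a_dim), torch.randn(16, 1)
print(worst_critic(s, a, target), averaged_policy(s).shape)
```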

Long Distance Relationships without Time Travel: Boosting the Performance of a Sparse Predictive Autoencoder in Sequence Modeling

Title Long Distance Relationships without Time Travel: Boosting the Performance of a Sparse Predictive Autoencoder in Sequence Modeling
Authors Jeremy Gordon, David Rawlinson, Subutai Ahmad
Abstract In sequence learning tasks such as language modelling, recurrent neural networks must learn relationships between input features separated in time. State-of-the-art models such as the LSTM and the Transformer are trained by backpropagating losses into prior hidden states and inputs held in memory. This allows gradients to flow from present to past and effectively learn with perfect hindsight, but at a significant memory cost. In this paper we show that it is possible to train high-performance recurrent networks using information that is local in time, and thereby achieve a significantly reduced memory footprint. We describe a predictive autoencoder called bRSM featuring recurrent connections, sparse activations, and a boosting rule for improved cell utilization. The architecture demonstrates near-optimal performance on a non-deterministic (stochastic) partially-observable sequence learning task consisting of high-Markov-order sequences of MNIST digits. We find that this model learns these sequences faster and more completely than an LSTM, and offer several possible explanations for why the LSTM architecture might struggle with the partially observable sequence structure in this task. We also apply our model to a next-word prediction task on the Penn Treebank (PTB) dataset. We show that a ‘flattened’ RSM network, when paired with a modern semantic word embedding and the addition of boosting, achieves 103.5 PPL (a 20-point improvement over the best N-gram models), beating ordinary RNNs trained with BPTT and approaching the scores of early LSTM implementations. This work provides encouraging evidence that strong results on challenging tasks such as language modelling may be possible using less memory-intensive, biologically plausible training regimes.
Tasks Language Modelling
Published 2019-12-02
URL https://arxiv.org/abs/1912.01116v1
PDF https://arxiv.org/pdf/1912.01116v1.pdf
PWC https://paperswithcode.com/paper/long-distance-relationships-without-time
Repo
Framework
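
The sparse activations and boosting rule mentioned above can be pictured as a top-k layer whose pre-activations are boosted for under-used units, in the spirit of HTM-style boosting. The boosting formula, sparsity level, and duty-cycle bookkeeping below are assumptions for illustration, not bRSM's specification.

```python
import torch

def boosted_topk(x, duty_cycle, k, boost_strength=1.0):
    """Keep the k units with the highest boosted activation; under-used units get a larger boost."""
    target = k / x.shape[-1]                                   # target activation frequency per unit
    boost = torch.exp(boost_strength * (target - duty_cycle))  # >1 for units active less often than target
    idx = (x * boost).topk(k, dim=-1).indices
    mask = torch.zeros_like(x).scatter_(-1, idx, 1.0)
    return x * mask

x = torch.rand(4, 64)
duty = torch.full((64,), 8 / 64)        # running fraction of steps each unit was active (toy value)
out = boosted_topk(x, duty, k=8)
print((out != 0).sum(dim=-1))           # exactly 8 active units per row
```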

AuthorGAN: Improving GAN Reproducibility using a Modular GAN Framework

Title AuthorGAN: Improving GAN Reproducibility using a Modular GAN Framework
Authors Raunak Sinha, Anush Sankaran, Mayank Vatsa, Richa Singh
Abstract Generative models are becoming increasingly popular in the literature, with Generative Adversarial Networks (GANs) being the most successful variant yet. With this increasing demand and popularity, it is becoming equally difficult and challenging to implement and consume GAN models. A qualitative user survey conducted across 47 practitioners shows that expert-level skill is required to use a GAN model for a given task, despite the presence of various open-source libraries. In this research, we propose a novel system called AuthorGAN, aiming to achieve true democratization of GAN authoring. A highly modularized, library-agnostic representation of a GAN model is defined to enable interoperability of GAN architectures across different libraries such as Keras, TensorFlow, and PyTorch. An intuitive drag-and-drop visual designer is built using the Node-RED platform to enable custom architecture design without the need to write any code. Five different GAN models are implemented as part of this framework, and the performance of the different GAN models is shown using the benchmark MNIST dataset.
Tasks
Published 2019-11-26
URL https://arxiv.org/abs/1911.13250v1
PDF https://arxiv.org/pdf/1911.13250v1.pdf
PWC https://paperswithcode.com/paper/authorgan-improving-gan-reproducibility-using
Repo
Framework
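
The "library-agnostic representation" described above can be pictured as a plain declarative spec that each backend (Keras, TensorFlow, PyTorch) translates into concrete layers. The dictionary below is an invented illustration of such a spec, not AuthorGAN's actual schema.

```python
# hypothetical declarative GAN spec; field names are invented for illustration
gan_spec = {
    "generator": {
        "input": {"noise_dim": 100},
        "layers": [
            {"type": "dense", "units": 256, "activation": "relu"},
            {"type": "dense", "units": 784, "activation": "tanh"},
        ],
    },
    "discriminator": {
        "input": {"shape": [28, 28, 1]},
        "layers": [
            {"type": "flatten"},
            {"type": "dense", "units": 256, "activation": "leaky_relu"},
            {"type": "dense", "units": 1, "activation": "sigmoid"},
        ],
    },
    "training": {"loss": "minimax", "optimizer": "adam", "lr": 2e-4, "batch_size": 64},
}
```

A backend would walk such a spec and instantiate the corresponding framework-specific layers, which is what keeps the architecture description portable across libraries.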

Learning to Confuse: Generating Training Time Adversarial Data with Auto-Encoder

Title Learning to Confuse: Generating Training Time Adversarial Data with Auto-Encoder
Authors Ji Feng, Qi-Zhi Cai, Zhi-Hua Zhou
Abstract In this work, we consider a challenging training-time attack that modifies training data with bounded perturbation, hoping to manipulate the behavior (both targeted and non-targeted) of any corresponding trained classifier at test time when facing clean samples. To achieve this, we propose to use an auto-encoder-like network to generate the perturbation of the training data, paired with one differentiable system acting as the imaginary victim classifier. The perturbation generator learns to update its weights by watching the training procedure of the imaginary classifier, in order to produce the most harmful and imperceptible noise, which in turn leads to the lowest generalization power for the victim classifier. This can be formulated as a non-linear equality-constrained optimization problem. Unlike GANs, solving such a problem is computationally challenging, so we propose a simple yet effective procedure that decouples the alternating updates for the two networks for stability. The method proposed in this paper can easily be extended to the label-specific setting, where the attacker can manipulate the predictions of the victim classifier according to some predefined rules rather than only making wrong predictions. Experiments on various datasets, including CIFAR-10 and a reduced version of ImageNet, confirm the effectiveness of the proposed method, and empirical results show that such bounded perturbations have good transferability regardless of which classifier the victim actually uses on image data.
Tasks
Published 2019-05-22
URL https://arxiv.org/abs/1905.09027v1
PDF https://arxiv.org/pdf/1905.09027v1.pdf
PWC https://paperswithcode.com/paper/learning-to-confuse-generating-training-time
Repo
Framework

Partitioned integrators for thermodynamic parameterization of neural networks

Title Partitioned integrators for thermodynamic parameterization of neural networks
Authors Benedict Leimkuhler, Charles Matthews, Tiffany Vlaar
Abstract Traditionally, neural networks are parameterized using optimization procedures such as stochastic gradient descent, RMSProp and ADAM. These procedures tend to drive the parameters of the network toward a local minimum. In this article, we employ alternative “sampling” algorithms (referred to here as “thermodynamic parameterization methods”) which rely on discretized stochastic differential equations for a defined target distribution on parameter space. We show that the thermodynamic perspective already improves neural network training. Moreover, by partitioning the parameters based on natural layer structure we obtain schemes with very rapid convergence for data sets with complicated loss landscapes. We describe easy-to-implement hybrid partitioned numerical algorithms, based on discretized stochastic differential equations, which are adapted to feed-forward neural networks, including a multi-layer Langevin algorithm, AdLaLa (combining the adaptive Langevin and Langevin algorithms) and LOL (combining Langevin and Overdamped Langevin); we examine the convergence of these methods using numerical studies and compare their performance among themselves and in relation to standard alternatives such as stochastic gradient descent and ADAM. We present evidence that thermodynamic parameterization methods can be (i) faster, (ii) more accurate, and (iii) more robust than standard algorithms used within machine learning frameworks.
Tasks
Published 2019-08-30
URL https://arxiv.org/abs/1908.11843v2
PDF https://arxiv.org/pdf/1908.11843v2.pdf
PWC https://paperswithcode.com/paper/partitioned-integrators-for-thermodynamic
Repo
Framework
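
For a flavour of the "thermodynamic parameterization" idea referred to above, the sketch below replaces a plain gradient step with an overdamped Langevin update, i.e. a gradient step plus Gaussian noise scaled by sqrt(2 h T). The toy network, step size, and temperature are arbitrary; the paper's partitioned AdLaLa and LOL schemes are more elaborate and treat different layers with different integrators.

```python
import math
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
x, y = torch.randn(128, 2), torch.randn(128, 1)
h, temperature = 1e-2, 1e-4                      # step size and temperature (toy values)

for step in range(200):
    loss = nn.functional.mse_loss(net(x), y)
    net.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in net.parameters():
            # overdamped Langevin: gradient step plus sqrt(2 h T) Gaussian noise
            p -= h * p.grad
            p += math.sqrt(2 * h * temperature) * torch.randn_like(p)

print(loss.item())
```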