Paper Group ANR 487
Fast and Optimal Laplacian Solver for Gradient-Domain Image Editing using Green Function Convolution
Title | Fast and Optimal Laplacian Solver for Gradient-Domain Image Editing using Green Function Convolution |
Authors | Dominique Beaini, Sofiane Achiche, Fabrice Nonez, Olivier Brochu Dufour, Cédric Leblond-Ménard, Mahdis Asaadi, Maxime Raison |
Abstract | In computer vision, the gradient and Laplacian of an image are used in different applications, such as edge detection, feature extraction, and seamless image cloning. Computing the gradient of an image is straightforward since numerical derivatives are available in most computer vision toolboxes. However, the reverse problem is more difficult, since computing an image from its gradient requires solving the Laplacian equation, also called the Poisson equation. Current discrete methods are either slow or require heavy parallel computing. The objective of this paper is to present a novel fast and robust method of solving the image gradient or Laplacian with minimal error, which can be used for gradient-domain editing. By using a single convolution based on a numerical Green’s function, the whole process is faster and straightforward to implement with different computer vision libraries. It can also be optimized on a GPU using fast Fourier transforms and easily generalized to n-dimensional images. The tests show that, for images of resolution 801x1200, the proposed Green function convolution (GFC) can solve 100 Laplacians in parallel in around 1.0 ms. This is orders of magnitude faster than our nearest competitor, which requires 294 ms for a single image. Furthermore, we prove mathematically and demonstrate empirically that the proposed method is the least-error solver for gradient-domain editing. The developed method is also validated with examples of Poisson blending, gradient removal, and the proposed gradient domain merging (GDM). Finally, we present how GDM can be leveraged in future work on convolutional neural networks (CNNs). |
Tasks | Edge Detection |
Published | 2019-02-01 |
URL | https://arxiv.org/abs/1902.00176v2 |
https://arxiv.org/pdf/1902.00176v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-and-optimal-laplacian-solver-for |
Repo | |
Framework | |
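The abstract’s claim that one convolution solves the Poisson equation has a compact FFT realization. The sketch below is a generic Fourier-domain Poisson solve under periodic boundary conditions, not the paper’s exact numerical Green’s function; the boundary handling and the zeroed DC term are assumptions.

```python
import numpy as np

def solve_poisson_fft(laplacian):
    """Recover an image (up to a constant) from its Laplacian by
    deconvolution in the Fourier domain -- a generic FFT Poisson solve,
    not the paper's exact numerical Green's function."""
    h, w = laplacian.shape
    fy = np.fft.fftfreq(h).reshape(-1, 1)
    fx = np.fft.fftfreq(w).reshape(1, -1)
    # Eigenvalues of the 5-point discrete Laplacian under periodic BCs.
    denom = 2 * np.cos(2 * np.pi * fx) + 2 * np.cos(2 * np.pi * fy) - 4
    denom[0, 0] = 1.0    # avoid division by zero at the DC term
    img_hat = np.fft.fft2(laplacian) / denom
    img_hat[0, 0] = 0.0  # the mean is unrecoverable; pin it to zero
    return np.real(np.fft.ifft2(img_hat))
```

Because the solve is a pair of FFTs and an element-wise division, batching many Laplacians on a GPU is straightforward, which is consistent with the timing claim in the abstract.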
Lifelong learning for text retrieval and recognition in historical handwritten document collections
Title | Lifelong learning for text retrieval and recognition in historical handwritten document collections |
Authors | Lambert Schomaker |
Abstract | This chapter provides an overview of the problems that need to be dealt with when constructing a lifelong-learning retrieval, recognition and indexing engine for large historical document collections in multiple scripts and languages, the Monk system. This application is highly variable over time, since the continuous labeling by end users changes the concept of what constitutes a ‘ground truth’. Although current advances in deep learning provide a huge potential in this application domain, the scale of the problem, i.e., more than 520 hugely diverse books, documents and manuscripts, precludes the meticulous and painstaking human effort currently required to design and develop successful deep-learning systems. The ball-park principle is introduced, which describes the evolution from the sparsely-labeled stage, which can only be addressed by traditional methods or nearest-neighbor methods on embedded vectors of pre-trained neural networks, up to the other end of the spectrum, where massive labeling allows reliable training of deep-learning methods. Contents: Introduction, Expectation management, Deep learning, The ball-park principle, Technical realization, Work flow, Quality and quantity of material, Industrialization and scalability, Human effort, Algorithms, Object of recognition, Processing pipeline, Performance, Compositionality, Conclusion. |
Tasks | |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05156v1 |
https://arxiv.org/pdf/1912.05156v1.pdf | |
PWC | https://paperswithcode.com/paper/lifelong-learning-for-text-retrieval-and |
Repo | |
Framework | |
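The sparsely-labeled end of the ball-park principle relies on nearest-neighbor matching over embedded vectors from pre-trained networks. A minimal sketch of that retrieval step, assuming embeddings have already been extracted:

```python
import numpy as np

def retrieve_nearest(query_emb, gallery_embs, gallery_labels, k=5):
    """Rank word images by cosine similarity to a query embedding.
    The embedding source (any pre-trained network) is an assumption."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity to every gallery item
    top = np.argsort(-sims)[:k]
    return [(gallery_labels[i], float(sims[i])) for i in top]
```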
Semi-supervised Stacked Label Consistent Autoencoder for Reconstruction and Analysis of Biomedical Signals
Title | Semi-supervised Stacked Label Consistent Autoencoder for Reconstruction and Analysis of Biomedical Signals |
Authors | Anupriya Gogna, Angshul Majumdar, Rabab Ward |
Abstract | In this work we propose an autoencoder-based framework for simultaneous reconstruction and classification of biomedical signals. Previously, these two tasks, reconstruction and classification, were treated as separate problems; this is the first work to propose a combined framework that addresses the issue in a holistic fashion. Reconstruction techniques for biomedical signals for tele-monitoring are largely based on compressed sensing (CS); these are designed techniques where the reconstruction formulation is based on some assumption regarding the signal. In this work, we propose a new paradigm for reconstruction: we learn to reconstruct. An autoencoder can be trained for this purpose. But since the final goal is to analyze (classify) the signal, we learn a linear classification map inside the autoencoder. The ensuing optimization problem is solved using the Split Bregman technique. Experiments have been carried out on reconstruction and classification for ECG arrhythmia and EEG seizure signals. Our proposed tool is capable of operating in a semi-supervised fashion. We show that our proposed method is better and more than an order of magnitude faster in reconstruction than CS-based methods; it is capable of real-time operation. Our method is also better than recently proposed classification methods. Significance: this is the first work offering an alternative to CS-based reconstruction. It also shows that representation learning can yield better results than hand-crafted features for signal analysis. |
Tasks | EEG, Representation Learning |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.12127v1 |
https://arxiv.org/pdf/1912.12127v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-stacked-label-consistent |
Repo | |
Framework | |
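To make the “linear classification map inside the autoencoder” concrete, here is a minimal sketch of the joint objective; layer sizes are assumptions, and plain gradient descent stands in for the Split Bregman solver the paper uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelConsistentAE(nn.Module):
    """One-layer autoencoder with a linear classifier on its code."""
    def __init__(self, in_dim=256, code_dim=64, n_classes=2):
        super().__init__()
        self.enc = nn.Linear(in_dim, code_dim)
        self.dec = nn.Linear(code_dim, in_dim)
        self.cls = nn.Linear(code_dim, n_classes)  # linear map on the code

    def forward(self, x):
        z = torch.relu(self.enc(x))
        return self.dec(z), self.cls(z)

def semi_supervised_loss(model, x, y=None, alpha=0.1):
    """Reconstruction on every sample; classification only where labels
    exist -- this is what makes the framework semi-supervised."""
    x_hat, logits = model(x)
    loss = F.mse_loss(x_hat, x)
    if y is not None:
        loss = loss + alpha * F.cross_entropy(logits, y)
    return loss
```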
IC-Network: An Inter-layer Collision Network For Image Classification
Title | IC-Network: An Inter-layer Collision Network For Image Classification |
Authors | Junyi An, Fengshan Liu, Jian Zhao, Furao Shen |
Abstract | Neural networks have been widely used, and most networks achieve excellent performance by stacking certain types of basic units. Compared to increasing the depth and width of the network, designing more effective basic units has become an important research topic. Inspired by the elastic collision model in physics, we present a universal structure that can be integrated into existing network structures to speed up the training process and increase their generalization abilities. We term this structure the “Inter-layer Collision” (IC) structure. We built two kinds of basic computational units (the IC layer and the IC block) that compose convolutional neural networks (CNNs) by combining the IC structure with the convolution operation. Compared to traditional convolutions, both of the proposed computational units have a stronger non-linear representation ability and can filter features useful for a given task. Using these computational units to build networks, we bring significant improvements in performance to existing state-of-the-art CNNs. In the ImageNet experiment, we integrate the IC block into ResNet-50 and reduce the top-1 error from 22.85% to 21.49%, which is also lower than the top-1 error of ResNet-100 (21.75%). |
Tasks | Image Classification |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08252v2 |
https://arxiv.org/pdf/1911.08252v2.pdf | |
PWC | https://paperswithcode.com/paper/inter-layer-collision-networks |
Repo | |
Framework | |
Classification of Brainwave Signals Based on Hybrid Deep Learning and an Evolutionary Algorithm
Title | Classification of Brainwave Signals Based on Hybrid Deep Learning and an Evolutionary Algorithm |
Authors | Zhyar Rzgar K. Rostam, Sozan Abdullah Mahmood |
Abstract | Brainwave signals are read through Electroencephalogram (EEG) devices. These signals are generated by an active brain based on brain activities and thoughts. The classification of brainwave signals is a challenging task due to their non-stationary nature. To address the issue, this paper proposes a Convolutional Neural Network (CNN) model to classify brainwave signals. In order to evaluate the performance of the proposed model, a dataset is developed by recording brainwave signals under two conditions, visible and invisible. In the visible mode, the human subjects focus on the color and shape presented. Meanwhile, in the invisible mode, the subjects think about specific colors or shapes with closed eyes. A comparison is provided between the original CNN and the proposed CNN architecture on the same dataset. The results show that the proposed CNN model achieves higher classification accuracy than the standard CNN. The best accuracy rate, achieved when the proposed CNN is applied to the visible color mode, is 92%. In the future, improvements to the proposed CNN should enable efficient classification of raw EEG signals. |
Tasks | EEG |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.07361v1 |
https://arxiv.org/pdf/1912.07361v1.pdf | |
PWC | https://paperswithcode.com/paper/classification-of-brainwave-signals-based-on |
Repo | |
Framework | |
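The abstract does not spell out the proposed architecture, so the sketch below is only an illustrative 1-D CNN for multi-channel brainwave classification; channel counts, kernel sizes, and class count are assumptions.

```python
import torch
import torch.nn as nn

class EEGConvNet(nn.Module):
    """Illustrative 1-D CNN for EEG classification (not the paper's model)."""
    def __init__(self, n_channels=14, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the time axis
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):              # x: (batch, channels, time)
        z = self.features(x).squeeze(-1)
        return self.classifier(z)
```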
A Model for Spatial Outlier Detection Based on Weighted Neighborhood Relationship
Title | A Model for Spatial Outlier Detection Based on Weighted Neighborhood Relationship |
Authors | Ayman Taha, Hoda M. Onsi, Mohammed Nour El din, Osman M. Hegazy |
Abstract | Spatial outliers are used to discover inconsistent objects, producing implicit, hidden, and interesting knowledge that plays an effective role in the decision-making process. In this paper, we propose a model that redefines the spatial neighborhood relationship by considering weights of the most effective parameters of neighboring objects in a given spatial data set. The spatial parameters we take into consideration are distance, cost, and the number of direct connections between neighboring objects. The model can also be applied to polygonal objects. The proposed model is applied to a GIS system supporting a literacy project in the Fayoum governorate. |
Tasks | Decision Making, Outlier Detection |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01867v1 |
https://arxiv.org/pdf/1911.01867v1.pdf | |
PWC | https://paperswithcode.com/paper/a-model-for-spatial-outlier-detection-based |
Repo | |
Framework | |
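A minimal sketch of the weighted-neighborhood idea: each object’s attribute is compared against a neighborhood average whose weights combine the three parameters the abstract names (distance, cost, direct connections). The combination rule and the weights are assumptions, not the paper’s formula.

```python
import numpy as np

def weighted_outlier_scores(values, dist, cost, links, w=(0.5, 0.3, 0.2)):
    """values: (n,) attribute per object; dist, cost, links: (n, n) matrices.
    Returns a deviation score per object; larger => more likely an outlier."""
    # Closer, cheaper, better-connected neighbors get larger weights.
    aff = w[0] / (dist + 1e-9) + w[1] / (cost + 1e-9) + w[2] * links
    np.fill_diagonal(aff, 0.0)          # an object is not its own neighbor
    aff = aff / aff.sum(axis=1, keepdims=True)
    neigh_avg = aff @ values            # weighted neighborhood average
    return np.abs(values - neigh_avg)
```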
Visualizing Image Content to Explain Novel Image Discovery
Title | Visualizing Image Content to Explain Novel Image Discovery |
Authors | Jake H. Lee, Kiri L. Wagstaff |
Abstract | The initial analysis of any large data set can be divided into two phases: (1) the identification of common trends or patterns and (2) the identification of anomalies or outliers that deviate from those trends. We focus on the goal of detecting observations with novel content, which can alert us to artifacts in the data set or, potentially, the discovery of previously unknown phenomena. To aid in interpreting and diagnosing the novel aspect of these selected observations, we recommend the use of novelty detection methods that generate explanations. In the context of large image data sets, these explanations should highlight what aspect of a given image is new (color, shape, texture, content) in a human-comprehensible form. We propose DEMUD-VIS, the first method for providing visual explanations of novel image content by employing a convolutional neural network (CNN) to extract image features, a method that uses reconstruction error to detect novel content, and an up-convolutional network to convert CNN feature representations back into image space. We demonstrate this approach on diverse images from ImageNet, freshwater streams, and the surface of Mars. |
Tasks | |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.05006v1 |
https://arxiv.org/pdf/1908.05006v1.pdf | |
PWC | https://paperswithcode.com/paper/visualizing-image-content-to-explain-novel |
Repo | |
Framework | |
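The reconstruction-error step the abstract mentions can be sketched with a low-rank model of previously seen CNN features; DEMUD maintains this basis incrementally, whereas the sketch below uses a one-shot SVD for brevity, so treat it as an approximation.

```python
import numpy as np

def novelty_scores(seen_feats, new_feats, k=10):
    """Project new CNN features onto the top-k principal directions of
    previously seen features; a large residual flags novel content."""
    mu = seen_feats.mean(axis=0)
    _, _, vt = np.linalg.svd(seen_feats - mu, full_matrices=False)
    basis = vt[:k]                                   # (k, d) basis rows
    centered = new_feats - mu
    resid = centered - (centered @ basis.T) @ basis  # reconstruction error
    return np.linalg.norm(resid, axis=1)
```

In DEMUD-VIS, the residual itself is then mapped back to image space with an up-convolutional network so a human can see what was novel.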
CG-GAN: An Interactive Evolutionary GAN-based Approach for Facial Composite Generation
Title | CG-GAN: An Interactive Evolutionary GAN-based Approach for Facial Composite Generation |
Authors | Nicola Zaltron, Luisa Zurlo, Sebastian Risi |
Abstract | Facial composites are graphical representations of an eyewitness’s memory of a face. Many digital systems are available for the creation of such composites but are either unable to reproduce features unless previously designed or do not allow holistic changes to the image. In this paper, we improve the efficiency of composite creation by removing the reliance on expert knowledge and letting the system learn to represent faces from examples. The novel approach, Composite Generating GAN (CG-GAN), applies generative and evolutionary computation to allow casual users to easily create facial composites. Specifically, CG-GAN utilizes the generator network of a pg-GAN to create high-resolution human faces. Users are provided with several functions to interactively breed and edit faces. CG-GAN offers a novel way of generating and handling static and animated photo-realistic facial composites, with the possibility of combining multiple representations of the same perpetrator, generated by different eyewitnesses. |
Tasks | |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1912.05020v1 |
https://arxiv.org/pdf/1912.05020v1.pdf | |
PWC | https://paperswithcode.com/paper/cg-gan-an-interactive-evolutionary-gan-based |
Repo | |
Framework | |
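The interactive breeding step can be sketched as crossover plus mutation in the generator’s latent space; the blend-and-mutate scheme below is an assumption, not CG-GAN’s exact operator.

```python
import numpy as np

def breed_latents(parents, n_children=8, sigma=0.3, rng=None):
    """parents: (P, d) latent vectors the user selected. Each child is a
    random blend of two parents plus Gaussian mutation; children are then
    rendered by the pre-trained generator (e.g. a pg-GAN) for the next
    round of user selection."""
    rng = rng or np.random.default_rng()
    children = []
    for _ in range(n_children):
        a = parents[rng.integers(len(parents))]
        b = parents[rng.integers(len(parents))]
        t = rng.random()
        children.append(t * a + (1 - t) * b + sigma * rng.standard_normal(a.shape))
    return np.stack(children)
```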
4D CNN for semantic segmentation of cardiac volumetric sequences
Title | 4D CNN for semantic segmentation of cardiac volumetric sequences |
Authors | Andriy Myronenko, Dong Yang, Varun Buch, Daguang Xu, Alvin Ihsani, Sean Doyle, Mark Michalski, Neil Tenenholtz, Holger Roth |
Abstract | We propose a 4D convolutional neural network (CNN) for the segmentation of retrospective ECG-gated cardiac CT, a series of single-channel volumetric data over time. While only a small subset of volumes in the temporal sequence is annotated, we define a sparse loss function on available labels to allow the network to leverage unlabeled images during training and generate a fully segmented sequence. We investigate the accuracy of the proposed 4D network to predict temporally consistent segmentations and compare with traditional 3D segmentation approaches. We demonstrate the feasibility of the 4D CNN and establish its performance on cardiac 4D CCTA. |
Tasks | Semantic Segmentation |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.07295v2 |
https://arxiv.org/pdf/1906.07295v2.pdf | |
PWC | https://paperswithcode.com/paper/4d-cnn-for-semantic-segmentation-of-cardiac |
Repo | |
Framework | |
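The “sparse loss function on available labels” can be sketched as cross-entropy restricted to annotated time points; the tensor shapes below are assumptions.

```python
import torch
import torch.nn.functional as F

def sparse_segmentation_loss(logits, labels, annotated):
    """logits: (B, C, T, D, H, W); labels: (B, T, D, H, W);
    annotated: (B, T) bool mask of time points that carry ground truth.
    Loss is computed only on annotated volumes, so the 4D network can
    train on partially labeled sequences."""
    losses = []
    for b, t in annotated.nonzero(as_tuple=False):
        losses.append(F.cross_entropy(logits[b, :, t].unsqueeze(0),
                                      labels[b, t].unsqueeze(0)))
    return torch.stack(losses).mean()
```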
Predicting Actions to Help Predict Translations
Title | Predicting Actions to Help Predict Translations |
Authors | Zixiu Wu, Julia Ive, Josiah Wang, Pranava Madhyastha, Lucia Specia |
Abstract | We address the task of text translation on the How2 dataset using a state-of-the-art transformer-based multimodal approach. The question we ask is whether visual features can support the translation process. In particular, given that this dataset is extracted from videos, we focus on the translation of actions, which we believe are poorly captured in the static image-text datasets currently used for multimodal translation. For that purpose, we extract different types of action features from the videos and carefully investigate how helpful this visual information is by testing whether it can increase translation quality when used in conjunction with (i) the original text and (ii) the original text where action-related words (or all verbs) are masked out. The latter is a simulation that helps us assess the utility of the image in cases where the text does not provide enough context about the action, or in the presence of noise in the input text. |
Tasks | |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01665v2 |
https://arxiv.org/pdf/1908.01665v2.pdf | |
PWC | https://paperswithcode.com/paper/predicting-actions-to-help-predict |
Repo | |
Framework | |
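The masking simulation is simple to reproduce; the sketch below assumes a supplied vocabulary of action words (in the paper, action-related words or all verbs are identified in the source text).

```python
def mask_actions(tokens, action_vocab, mask_token="[MASK]"):
    """Replace action-related tokens with a mask before translation, to
    probe whether video features can fill in the missing context."""
    return [mask_token if t.lower() in action_vocab else t for t in tokens]

# Hypothetical usage:
# mask_actions("she slices the onion".split(), {"slices", "chops"})
# -> ['she', '[MASK]', 'the', 'onion']
```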
ORBSLAM-Atlas: a robust and accurate multi-map system
Title | ORBSLAM-Atlas: a robust and accurate multi-map system |
Authors | Richard Elvira, Juan D. Tardós, J. M. M. Montiel |
Abstract | We propose ORBSLAM-Atlas, a system able to handle an unlimited number of disconnected sub-maps, which includes a robust map merging algorithm able to detect sub-maps with common regions and seamlessly fuse them. The outstanding robustness and accuracy of ORBSLAM are due to its ability to detect wide-baseline matches between keyframes and to exploit them by means of non-linear optimization; however, it can only handle a single map. ORBSLAM-Atlas brings the wide-baseline matching detection and exploitation to the multiple-map arena. The result is a SLAM system significantly more general and robust, able to perform multi-session mapping. If tracking is lost during exploration, instead of freezing the map, a new sub-map is launched, and it can be fused with the previous map when common parts are visited. Our criteria for declaring the camera lost contrast with previous approaches that simply count the number of tracked points: we also propose to discard camera poses that are inaccurately estimated due to bad geometrical conditioning. As a result, the map is split into more accurate sub-maps that are eventually merged into a more accurate global map, thanks to the multi-mapping capabilities. We provide extensive experimental validation on the EuRoC datasets, where ORBSLAM-Atlas obtains accurate monocular and stereo results in the difficult sequences where ORBSLAM failed. We also build global maps after multiple sessions in the same room, obtaining the best results to date, between 2 and 3 times more accurate than competing multi-map approaches. We also show the robustness and capability of our system to deal with dynamic scenes, quantitatively on the EuRoC datasets and qualitatively in a densely populated corridor where camera occlusions and tracking losses are frequent. |
Tasks | |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11585v1 |
https://arxiv.org/pdf/1908.11585v1.pdf | |
PWC | https://paperswithcode.com/paper/orbslam-atlas-a-robust-and-accurate-multi-map |
Repo | |
Framework | |
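The lost-camera criterion the abstract contrasts with point counting can be sketched as a two-part test; the conditioning measure and thresholds below are assumptions, not ORBSLAM-Atlas’s exact criterion.

```python
import numpy as np

def camera_lost(n_tracked, pose_hessian, min_points=15, max_cond=1e6):
    """Declare the camera lost if too few points are tracked OR the pose
    estimate is badly geometrically conditioned (large condition number
    of the pose Hessian), triggering the launch of a new sub-map."""
    if n_tracked < min_points:
        return True
    return np.linalg.cond(pose_hessian) > max_cond
```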
Unsupervised Adversarial Correction of Rigid MR Motion Artifacts
Title | Unsupervised Adversarial Correction of Rigid MR Motion Artifacts |
Authors | Karim Armanious, Aastha Tanwar, Sherif Abdulatif, Thomas Küstner, Sergios Gatidis, Bin Yang |
Abstract | Motion is one of the main sources for artifacts in magnetic resonance (MR) images. It can have significant consequences on the diagnostic quality of the resultant scans. Previously, supervised adversarial approaches have been suggested for the correction of MR motion artifacts. However, these approaches suffer from the limitation of required paired co-registered datasets for training which are often hard or impossible to acquire. Building upon our previous work, we introduce a new adversarial framework with a new generator architecture and loss function for the unsupervised correction of severe rigid motion artifacts in the brain region. Quantitative and qualitative comparisons with other supervised and unsupervised translation approaches showcase the enhanced performance of the introduced framework. |
Tasks | |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.05597v1 |
https://arxiv.org/pdf/1910.05597v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-adversarial-correction-of-rigid |
Repo | |
Framework | |
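The abstract specifies only that the framework is adversarial and unpaired; as context, here is a minimal CycleGAN-style unpaired objective of the family it is compared against. This is explicitly not the paper’s generator architecture or loss.

```python
import torch
import torch.nn.functional as F

def unpaired_correction_loss(G, F_inv, D_clean, artifact_img, lam=10.0):
    """G: artifact -> clean generator; F_inv: clean -> artifact generator;
    D_clean: discriminator on the clean domain. Adversarial realism plus
    cycle consistency removes the need for paired co-registered scans."""
    fake_clean = G(artifact_img)
    logits = D_clean(fake_clean)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    cycle = F.l1_loss(F_inv(fake_clean), artifact_img)
    return adv + lam * cycle
```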
A Comparison and Strategy of Semantic Segmentation on Remote Sensing Images
Title | A Comparison and Strategy of Semantic Segmentation on Remote Sensing Images |
Authors | Junxing Hu, Ling Li, Yijun Lin, Fengge Wu, Junsuo Zhao |
Abstract | In recent years, with the development of aerospace technology, we use more and more images captured by satellites to obtain information. However, a large number of useless raw images, limited data storage resources, and poor transmission capability on satellites hinder our use of valuable images. Therefore, it is necessary to deploy an on-orbit semantic segmentation model to filter out useless images before data transmission. In this paper, we present a detailed comparison of recent deep learning models. Considering the computing environment of satellites, we compare the methods in terms of accuracy, parameter count, and resource consumption on the same public dataset, and we analyze the relations among them. Based on the experimental results, we further propose a viable on-orbit semantic segmentation strategy. It will be deployed on the TianZhi-2 satellite, which supports deep learning methods and will be launched soon. |
Tasks | Semantic Segmentation |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10231v1 |
https://arxiv.org/pdf/1905.10231v1.pdf | |
PWC | https://paperswithcode.com/paper/a-comparison-and-strategy-of-semantic |
Repo | |
Framework | |
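Two of the comparison axes the abstract names, parameter count and resource consumption, can be measured with a short profiling helper; the latency measurement below is a crude single-run proxy and the input shape is an assumption.

```python
import time
import torch

def profile_model(model, input_shape=(1, 3, 512, 512), device="cpu"):
    """Report parameter count and one-shot inference latency, as rough
    proxies for on-orbit resource consumption."""
    n_params = sum(p.numel() for p in model.parameters())
    x = torch.randn(*input_shape, device=device)
    model.eval().to(device)
    with torch.no_grad():
        start = time.perf_counter()
        model(x)
        latency = time.perf_counter() - start
    return {"params": n_params, "latency_s": latency}
```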
Detecting multiple change-points in the time-varying Ising model
Title | Detecting multiple change-points in the time-varying Ising model |
Authors | Batiste Le Bars, Pierre Humbert, Argyris Kalogeratos, Nicolas Vayatis |
Abstract | This work focuses on the estimation of change-points in a time-varying Ising graphical model (outputs $-1$ or $1$) evolving in a piecewise constant fashion. The occurring changes alter the graphical structure of the model, a structure which we also estimate in the present paper. For this purpose, we propose a new optimization program consisting of the minimization of a penalized negative conditional log-likelihood. The objective of the penalization is twofold: it imposes sparsity on the learned graphs and, thanks to a fused-type penalty, it enforces them to evolve piecewise constantly. Using few assumptions, we then give a change-point consistency theorem. To the best of our knowledge, we are the first to present such a theoretical result in the context of the time-varying Ising model. Finally, experimental results on several synthetic examples and a real-world dataset demonstrate the empirical performance of our method. |
Tasks | |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08512v1 |
https://arxiv.org/pdf/1910.08512v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-multiple-change-points-in-the-time |
Repo | |
Framework | |
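The penalized objective the abstract describes has the generic shape below, where $\ell_t$ is the conditional log-likelihood of the observations at time $t$; the exact norms and weighting are assumptions, not the paper’s formula.

```latex
\min_{\Theta_1,\dots,\Theta_T}\;
  \sum_{t=1}^{T} -\ell_t(\Theta_t)
  \;+\; \lambda_1 \sum_{t=1}^{T} \lVert \Theta_t \rVert_1              % sparsity
  \;+\; \lambda_2 \sum_{t=2}^{T} \lVert \Theta_t - \Theta_{t-1} \rVert % fused penalty
```

The $\lambda_1$ term keeps each graph sparse; the fused $\lambda_2$ term makes $\Theta_t$ piecewise constant, so its jumps are the estimated change-points.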
Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition
Title | Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition |
Authors | Pingchuan Ma, Stavros Petridis, Maja Pantic |
Abstract | Several audio-visual speech recognition models have been recently proposed which aim to improve robustness over audio-only models in the presence of noise. However, almost all of them ignore the impact of the Lombard effect, i.e., the change in speaking style in noisy environments which aims to make speech more intelligible and affects both the acoustic characteristics of speech and the lip movements. In this paper, we investigate the impact of the Lombard effect in audio-visual speech recognition. To the best of our knowledge, this is the first work which does so using end-to-end deep architectures and presents results on unseen speakers. Our results show that properly modelling Lombard speech is always beneficial. Even if a relatively small amount of Lombard speech is added to the training set, the performance in a real scenario, where noisy Lombard speech is present, can be significantly improved. We also show that the standard approach followed in the literature, where a model is trained and tested on noisy plain speech, provides a correct estimate of the video-only performance and slightly underestimates the audio-visual performance. In the case of audio-only approaches, performance is overestimated for SNRs higher than -3 dB and underestimated for lower SNRs. |
Tasks | Audio-Visual Speech Recognition, Speech Recognition, Visual Speech Recognition |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02112v4 |
https://arxiv.org/pdf/1906.02112v4.pdf | |
PWC | https://paperswithcode.com/paper/investigating-the-lombard-effect-influence-on |
Repo | |
Framework | |
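The SNR-controlled conditions in such evaluations use standard noise mixing; a minimal sketch, assuming mono signals with the noise at least as long as the speech:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio matches snr_db,
    then mix -- the standard way to build noisy test conditions
    (e.g. the -3 dB point the abstract mentions)."""
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```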