February 2, 2020

Paper Group AWR 20

EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning. From Patch to Image Segmentation using Fully Convolutional Networks – Application to Retinal Images. Learning semantic sentence representations from visually grounded language without lexical knowledge. EXTD: Extremely Tiny Face Detector via Iterative Filter Reuse. Harmonizat …

EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning

Title EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning
Authors Kamyar Nazeri, Eric Ng, Tony Joseph, Faisal Z. Qureshi, Mehran Ebrahimi
Abstract Over the last few years, deep learning techniques have yielded significant improvements in image inpainting. However, many of these techniques fail to reconstruct reasonable structures as they are commonly over-smoothed and/or blurry. This paper develops a new approach for image inpainting that does a better job of reproducing filled regions exhibiting fine details. We propose a two-stage adversarial model, EdgeConnect, that comprises an edge generator followed by an image completion network. The edge generator hallucinates edges of the missing region (both regular and irregular) of the image, and the image completion network fills in the missing regions using the hallucinated edges as a prior. We evaluate our model end-to-end over the publicly available datasets CelebA, Places2, and Paris StreetView, and show that it outperforms current state-of-the-art techniques quantitatively and qualitatively. Code and models are available at: https://github.com/knazeri/edge-connect
Published 2019-01-01
URL http://arxiv.org/abs/1901.00212v3
PDF http://arxiv.org/pdf/1901.00212v3.pdf
PWC https://paperswithcode.com/paper/edgeconnect-generative-image-inpainting-with
Repo https://github.com/icepoint666/edge-connect-ui
Framework pytorch

From Patch to Image Segmentation using Fully Convolutional Networks – Application to Retinal Images

Title From Patch to Image Segmentation using Fully Convolutional Networks – Application to Retinal Images
Authors Taibou Birgui Sekou, Moncef Hidane, Julien Olivier, Hubert Cardot
Abstract Deep learning based models generally require a large number of samples for appropriate training, a requirement that is difficult to satisfy in the medical field. This issue can usually be avoided with a proper initialization of the weights. On the task of medical image segmentation in general, two techniques are often employed to tackle the training of a deep network $f_T$. The first one consists in reusing some weights of a network $f_S$ pre-trained on a large scale database (e.g., ImageNet). This procedure, also known as transfer learning, reduces flexibility in new network design, since $f_T$ is constrained to match some parts of $f_S$. The second commonly used technique consists in working on image patches to benefit from the large number of available patches. This paper brings together these two techniques and proposes to train arbitrarily designed networks that segment an image in one forward pass, with a focus on relatively small databases. Experiments have been carried out on the tasks of retinal blood vessel segmentation and optic disc segmentation, using four publicly available databases. Furthermore, three types of network are considered, ranging from a very lightweight network to a densely connected one. The final results show the efficiency of the proposed framework along with state-of-the-art results on all the databases.
Tasks Medical Image Segmentation, Semantic Segmentation, Transfer Learning
Published 2019-04-08
URL https://arxiv.org/abs/1904.03892v2
PDF https://arxiv.org/pdf/1904.03892v2.pdf
PWC https://paperswithcode.com/paper/from-patch-to-image-segmentation-using-fully
Repo https://github.com/Taib/patch2image
Framework tf

Learning semantic sentence representations from visually grounded language without lexical knowledge

Title Learning semantic sentence representations from visually grounded language without lexical knowledge
Authors Danny Merkx, Stefan Frank
Abstract Current approaches to learning semantic representations of sentences often use prior word-level knowledge. The current study aims to leverage visual information in order to capture sentence level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep Neural Networks are trained to map the two modalities to a common embedding space such that for an image the corresponding caption can be retrieved and vice versa. We show that our model achieves results comparable to the current state-of-the-art on two popular image-caption retrieval benchmark data sets: MSCOCO and Flickr8k. We evaluate the semantic content of the resulting sentence embeddings using the data from the Semantic Textual Similarity benchmark task and show that the multimodal embeddings correlate well with human semantic similarity judgements. The system achieves state-of-the-art results on several of these benchmarks, which shows that a system trained solely on multimodal data, without assuming any word representations, is able to capture sentence level semantics. Importantly, this result shows that we do not need prior knowledge of lexical level semantics in order to model sentence level semantics. These findings demonstrate the importance of visual information in semantics.
Tasks Learning Semantic Representations, Semantic Similarity, Semantic Textual Similarity, Sentence Embeddings, Word Embeddings
Published 2019-03-27
URL http://arxiv.org/abs/1903.11393v1
PDF http://arxiv.org/pdf/1903.11393v1.pdf
PWC https://paperswithcode.com/paper/learning-semantic-sentence-representations
Repo https://github.com/DannyMerkx/caption2image
Framework pytorch

EXTD: Extremely Tiny Face Detector via Iterative Filter Reuse

Title EXTD: Extremely Tiny Face Detector via Iterative Filter Reuse
Authors YoungJoon Yoo, Dongyoon Han, Sangdoo Yun
Abstract In this paper, we propose a new multi-scale face detector having an extremely tiny number of parameters (EXTD), less than 0.1 million, while achieving performance comparable to deep heavy detectors. While existing multi-scale face detectors extract feature maps with different scales from a single backbone network, our method generates the feature maps by iteratively reusing a shared lightweight and shallow backbone network. This iterative sharing of the backbone network significantly reduces the number of parameters, and also provides the abstract image semantics captured from the higher stage of the network layers to the lower-level feature map. The proposed idea is employed by various model architectures and evaluated by extensive experiments. From the experiments on the WIDER FACE dataset, we show that the proposed face detector can handle faces with various scales and conditions, and achieves performance comparable to more massive face detectors that are a few hundred and tens of times heavier in model size and floating point operations.
Published 2019-06-15
URL https://arxiv.org/abs/1906.06579v2
PDF https://arxiv.org/pdf/1906.06579v2.pdf
PWC https://paperswithcode.com/paper/extd-extremely-tiny-face-detector-via
Repo https://github.com/SeungyounShin/EXTD
Framework pytorch
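
The iterative reuse described in the EXTD abstract can be illustrated with a toy sketch. It is not the authors' architecture: `shared_block` (2x2 average pooling, a 1x1 channel mix, and a ReLU) and all shapes are hypothetical stand-ins, chosen only to show that one set of weights can yield a feature map per scale.

```python
import numpy as np

def shared_block(x, weight):
    """One pass of a toy backbone: 2x2 average pooling, channel mixing, ReLU.

    x has shape (channels, height, width); weight has shape (channels, channels).
    """
    c, h, w = x.shape
    pooled = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))  # halve resolution
    mixed = np.einsum("oc,chw->ohw", weight, pooled)               # 1x1 "convolution"
    return np.maximum(mixed, 0.0)

def iterative_feature_maps(features, weight, num_scales):
    """Apply the same block (same weights) repeatedly, keeping one map per scale."""
    maps = []
    x = features
    for _ in range(num_scales):
        x = shared_block(x, weight)
        maps.append(x)
    return maps

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 64, 64))   # hypothetical input features
weight = rng.standard_normal((8, 8)) * 0.1    # the only learnable parameters here
pyramid = iterative_feature_maps(features, weight, num_scales=3)
shapes = [m.shape for m in pyramid]
print(shapes, "parameters:", weight.size)
```

In this sketch the parameter count stays at `weight.size` however many scales are produced, which is the property the abstract attributes to backbone sharing.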

Harmonization of diffusion MRI datasets with adaptive dictionary learning

Title Harmonization of diffusion MRI datasets with adaptive dictionary learning
Authors Samuel St-Jean, Max A. Viergever, Alexander Leemans
Abstract Diffusion magnetic resonance imaging is a noninvasive imaging technique that can indirectly infer the microstructure of tissues and provide metrics which are subject to normal variability across subjects. Potentially abnormal values or features may yield essential information to support analysis of control and patient cohorts, but subtle confounds affecting diffusion MRI, such as those due to differences in scanning protocols or hardware, can lead to systematic errors which could be mistaken for purely biologically driven variations amongst subjects. In this work, we propose a new harmonization algorithm based on adaptive dictionary learning to mitigate the unwanted variability caused by different scanner hardware while preserving the natural biological variability present in the data. Overcomplete dictionaries, which are learned automatically from the data and do not require paired samples, are then used to reconstruct the data from a different scanner, removing variability present in the source scanner in the process. We use the publicly available database from an international challenge to evaluate the method, which was acquired on three different scanners and with two different protocols, and propose a new mapping towards a scanner-agnostic space. Results show that the effect size of the four studied diffusion metrics is preserved while removing variability attributable to the scanner. Experiments with alterations using a free water compartment, which is not simulated in the training data, show that the effect size induced by the alterations is also preserved after harmonization. The algorithm is freely available and could help multicenter studies in pooling their data, while removing scanner specific confounds, and increase statistical power in the process.
Published 2019-10-01
URL https://arxiv.org/abs/1910.00272v4
PDF https://arxiv.org/pdf/1910.00272v4.pdf
PWC https://paperswithcode.com/paper/harmonization-of-diffusion-mri-datasets-with
Repo https://github.com/samuelstjean/harmonization
Framework none

Location-aware Upsampling for Semantic Segmentation

Title Location-aware Upsampling for Semantic Segmentation
Authors Xiangyu He, Zitao Mo, Qiang Chen, Anda Cheng, Peisong Wang, Jian Cheng
Abstract Many successful learning targets such as minimizing dice loss and cross-entropy loss have enabled unprecedented breakthroughs in segmentation tasks. Beyond these semantic metrics, this paper aims to introduce location supervision into semantic segmentation. Based on this idea, we present a Location-aware Upsampling (LaU) that adaptively refines the interpolating coordinates with trainable offsets. Then, location-aware losses are established by encouraging pixels to move towards well-classified locations. An LaU is offset prediction coupled with interpolation, which is trained end-to-end to generate confidence score at each position from coarse to fine. Guided by location-aware losses, the new module can replace its plain counterpart (e.g., bilinear upsampling) in a plug-and-play manner to further boost the leading encoder-decoder approaches. Extensive experiments validate the consistent improvement over the state-of-the-art methods on benchmark datasets. Our code is available at https://github.com/HolmesShuan/Location-aware-Upsampling-for-Semantic-Segmentation
Published 2019-11-13
URL https://arxiv.org/abs/1911.05250v2
PDF https://arxiv.org/pdf/1911.05250v2.pdf
PWC https://paperswithcode.com/paper/location-aware-upsampling-for-semantic
Repo https://github.com/HolmesShuan/Location-aware-Upsampling-for-Semantic-Segmentation
Framework pytorch
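
The idea of refining interpolation coordinates with offsets, as the LaU abstract describes it, can be sketched as follows. This is an illustration under assumptions, not the authors' module: the offsets here are a fixed array rather than the output of a trained predictor, and the helper names and the align-corners sampling grid are choices made for this example.

```python
import numpy as np

def bilinear_sample(feature, ys, xs):
    """Sample a 2D map at fractional coordinates (ys, xs) with bilinear interpolation."""
    h, w = feature.shape
    ys = np.clip(ys, 0.0, h - 1.0)
    xs = np.clip(xs, 0.0, w - 1.0)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    top = (1 - wx) * feature[y0, x0] + wx * feature[y0, x1]
    bottom = (1 - wx) * feature[y1, x0] + wx * feature[y1, x1]
    return (1 - wy) * top + wy * bottom

def upsample_with_offsets(feature, scale, offsets=None):
    """Bilinear upsampling whose sampling grid is shifted by per-pixel offsets.

    offsets, if given, has shape (2, out_h, out_w): (dy, dx) in input-pixel units.
    With offsets=None this reduces to plain bilinear upsampling.
    """
    h, w = feature.shape
    ys, xs = np.meshgrid(
        np.linspace(0.0, h - 1.0, h * scale),
        np.linspace(0.0, w - 1.0, w * scale),
        indexing="ij",
    )
    if offsets is not None:
        ys = ys + offsets[0]
        xs = xs + offsets[1]
    return bilinear_sample(feature, ys, xs)

feature = np.arange(16, dtype=float).reshape(4, 4)  # value = 4 * y + x
plain = upsample_with_offsets(feature, scale=2)
offsets = np.zeros((2, 8, 8))
offsets[1] = -0.25                                   # hypothetical "learned" dx
shifted = upsample_with_offsets(feature, scale=2, offsets=offsets)
```

In a trained model the offsets would come from a small prediction head and be learned jointly with the segmentation loss; here they are fixed so the effect on the sampling grid is easy to inspect.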

A Multi-Pass GAN for Fluid Flow Super-Resolution

Title A Multi-Pass GAN for Fluid Flow Super-Resolution
Authors Maximilian Werhahn, You Xie, Mengyu Chu, Nils Thuerey
Abstract We propose a novel method to up-sample volumetric functions with generative neural networks using several orthogonal passes. Our method decomposes generative problems on Cartesian field functions into multiple smaller sub-problems that can be learned more efficiently. Specifically, we utilize two separate generative adversarial networks: the first one up-scales slices which are parallel to the XY-plane, whereas the second one refines the whole volume along the Z-axis working on slices in the YZ-plane. In this way, we obtain full coverage for the 3D target function and can leverage spatio-temporal supervision with a set of discriminators. Additionally, we demonstrate that our method can be combined with curriculum learning and progressive growing approaches. We arrive at a first method that can up-sample volumes by a factor of eight along each dimension, i.e., increasing the number of degrees of freedom by a factor of 512. Large volumetric up-scaling factors such as this one have previously not been attainable as the required number of weights in the neural networks renders adversarial training runs prohibitively difficult. We demonstrate the generality of our trained networks with a series of comparisons to previous work, a variety of complex 3D results, and an analysis of the resulting performance.
Published 2019-06-04
URL https://arxiv.org/abs/1906.01689v1
PDF https://arxiv.org/pdf/1906.01689v1.pdf
PWC https://paperswithcode.com/paper/a-multi-pass-gan-for-fluid-flow-super
Repo https://github.com/maxwerhahn/Multi-pass-GAN
Framework tf
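
The two-pass decomposition and the factor of 512 from the abstract can be checked with a small shape-level sketch. Nearest-neighbour repetition stands in for both generators here, which is purely an assumption for illustration; the function names and the toy volume are likewise hypothetical.

```python
import numpy as np

def upscale_xy_slices(volume, factor):
    """Pass 1: upscale every slice parallel to the XY-plane (volume indexed as z, y, x)."""
    return np.stack(
        [np.repeat(np.repeat(s, factor, axis=0), factor, axis=1) for s in volume]
    )

def refine_along_z(volume, factor):
    """Pass 2: walk over YZ slices (fixed x) and upscale each one along z only."""
    slices = [
        np.repeat(volume[:, :, i], factor, axis=0) for i in range(volume.shape[2])
    ]
    return np.stack(slices, axis=2)

low = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
after_xy = upscale_xy_slices(low, 8)   # shape (2, 24, 32)
high = refine_along_z(after_xy, 8)     # shape (16, 24, 32)
print(high.shape, high.size // low.size)
```

Each pass only ever sees 2D slices, yet the two passes together touch all three axes, so the element count grows by 8 x 8 x 8 = 512.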

Deep Residual Auto-Encoders for Expectation Maximization-inspired Dictionary Learning

Title Deep Residual Auto-Encoders for Expectation Maximization-inspired Dictionary Learning
Authors Bahareh Tolooshams, Sourav Dey, Demba Ba
Abstract We introduce a neural-network architecture, termed the constrained recurrent sparse auto-encoder (CRsAE), that solves convolutional dictionary learning problems, thus establishing a link between dictionary learning and neural networks. Specifically, we leverage the interpretation of the alternating-minimization algorithm for dictionary learning as an approximate Expectation-Maximization algorithm to develop auto-encoders that enable the simultaneous training of the dictionary and regularization parameter (ReLU bias). The forward pass of the encoder approximates the sufficient statistics of the E-step as the solution to a sparse coding problem, using an iterative proximal gradient algorithm called FISTA. The encoder can be interpreted either as a recurrent neural network or as a deep residual network, with two-sided ReLU non-linearities in both cases. The M-step is implemented via a two-stage back-propagation. The first stage relies on a linear decoder applied to the encoder and a norm-squared loss. It parallels the dictionary update step in dictionary learning. The second stage updates the regularization parameter by applying a loss function to the encoder that includes a prior on the parameter motivated by Bayesian statistics. We demonstrate in an image-denoising task that CRsAE learns Gabor-like filters, and that the EM-inspired approach for learning biases is superior to the conventional approach. In an application to recordings of electrical activity from the brain, we demonstrate that CRsAE learns realistic spike templates and speeds up the process of identifying spike times by 900x compared to algorithms based on convex optimization.
Tasks Denoising, Dictionary Learning, Image Denoising
Published 2019-04-18
URL https://arxiv.org/abs/1904.08827v2
PDF https://arxiv.org/pdf/1904.08827v2.pdf
PWC https://paperswithcode.com/paper/deep-residual-auto-encoders-for-expectation
Repo https://github.com/ds2p/crsae
Framework none
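
The FISTA iteration that the CRsAE encoder unrolls can be sketched in NumPy. This is a generic, non-convolutional version with a dense dictionary, written under that simplifying assumption; it is not the authors' implementation. The soft-thresholding step is written as a difference of two ReLUs with bias `threshold`, which is one way to read the abstract's "two-sided ReLU".

```python
import numpy as np

def two_sided_relu(v, threshold):
    """Soft-thresholding expressed as ReLU(v - b) - ReLU(-v - b)."""
    return np.maximum(v - threshold, 0.0) - np.maximum(-v - threshold, 0.0)

def fista(dictionary, signal, lam, num_iters=200):
    """Approximately minimize 0.5 * ||signal - dictionary @ x||^2 + lam * ||x||_1."""
    step = 1.0 / np.linalg.norm(dictionary, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(dictionary.shape[1])
    z = x.copy()
    t = 1.0
    for _ in range(num_iters):
        grad = dictionary.T @ (dictionary @ z - signal)
        x_next = two_sided_relu(z - step * grad, step * lam)   # proximal gradient step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_next + ((t - 1.0) / t_next) * (x_next - x)        # momentum extrapolation
        x, t = x_next, t_next
    return x

# With an identity dictionary the minimizer is soft-thresholding of the signal.
code = fista(np.eye(3), np.array([3.0, -0.5, -2.0]), lam=1.0)
print(code)
```

Each loop iteration corresponds to one layer of the unrolled encoder, with the dictionary shared across layers and `lam` playing the role of the bias that the abstract says is trained.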

A sparsity augmented probabilistic collaborative representation based classification method

Title A sparsity augmented probabilistic collaborative representation based classification method
Authors Xiao-Yun Cai, He-Feng Yin
Abstract In order to enhance the performance of image recognition, a sparsity augmented probabilistic collaborative representation based classification (SA-ProCRC) method is presented. The proposed method obtains the dense coefficient through ProCRC, then augments the dense coefficient with a sparse one, and the sparse coefficient is attained by the orthogonal matching pursuit (OMP) algorithm. In contrast to conventional methods which require explicit computation of the reconstruction residuals for each class, the proposed method employs the augmented coefficient and the label matrix of the training samples to classify the test sample. Experimental results indicate that the proposed method can achieve promising results for face and scene images. The source code of our proposed SA-ProCRC is accessible at https://github.com/yinhefeng/SAProCRC.
Published 2019-12-27
URL https://arxiv.org/abs/1912.12044v1
PDF https://arxiv.org/pdf/1912.12044v1.pdf
PWC https://paperswithcode.com/paper/a-sparsity-augmented-probabilistic
Repo https://github.com/yinhefeng/SAProCRC
Framework none
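
The classification pipeline in the SA-ProCRC abstract (dense coefficient, sparse coefficient from OMP, then a label-matrix product instead of per-class residuals) can be sketched as below. Two assumptions are made loudly: plain ridge regression stands in for ProCRC, and the dense and sparse codes are combined by simple addition. The toy data are hypothetical as well.

```python
import numpy as np

def ridge_coefficients(train, test, reg):
    """Dense code via regularized least squares (a stand-in for ProCRC)."""
    gram = train.T @ train + reg * np.eye(train.shape[1])
    return np.linalg.solve(gram, train.T @ test)

def omp_coefficients(train, test, sparsity):
    """Sparse code via orthogonal matching pursuit using `sparsity` atoms."""
    residual = test.copy()
    support = []
    solution = np.zeros(0)
    for _ in range(sparsity):
        correlations = np.abs(train.T @ residual)
        if support:
            correlations[support] = 0.0          # never pick the same atom twice
        support.append(int(np.argmax(correlations)))
        sub = train[:, support]
        solution, *_ = np.linalg.lstsq(sub, test, rcond=None)
        residual = test - sub @ solution
    coef = np.zeros(train.shape[1])
    coef[support] = solution
    return coef

def classify(train, labels, test, num_classes, reg=0.01, sparsity=2):
    """Score each class by summing the augmented coefficients of its training samples."""
    augmented = ridge_coefficients(train, test, reg) + omp_coefficients(train, test, sparsity)
    label_matrix = np.eye(num_classes)[labels].T   # shape (num_classes, num_train)
    return int(np.argmax(label_matrix @ augmented))

train = np.eye(4)                    # columns are training samples
labels = np.array([0, 0, 1, 1])
pred_a = classify(train, labels, np.array([0.9, 0.3, 0.1, 0.0]), num_classes=2)
pred_b = classify(train, labels, np.array([0.0, 0.1, 0.2, 0.8]), num_classes=2)
print(pred_a, pred_b)
```

The point of the label-matrix product is that no per-class reconstruction is needed at test time: one matrix-vector multiplication yields all class scores.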

Vehicle Re-identification: exploring feature fusion using multi-stream convolutional networks

Title Vehicle Re-identification: exploring feature fusion using multi-stream convolutional networks
Authors Icaro O. de Oliveira, Rayson Laroca, David Menotti, Keiko V. O. Fonseca, Rodrigo Minetto
Abstract This work addresses the problem of vehicle re-identification through a network of non-overlapping cameras. As our main contribution, we propose a novel two-stream convolutional neural network (CNN) that simultaneously uses two of the most distinctive and persistent features available: the vehicle appearance and its license plate. This is an attempt to tackle a major problem, false alarms caused by vehicles with similar design or by very close license plate identifiers. In the first network stream, shape similarities are identified by a Siamese CNN that uses a pair of low-resolution vehicle patches recorded by two different cameras. In the second stream, we use a CNN for optical character recognition (OCR) to extract textual information, confidence scores, and string similarities from a pair of high-resolution license plate patches. Then, features from both streams are merged by a sequence of fully connected layers for decision. As part of this work, we created an important dataset for vehicle re-identification with more than three hours of videos spanning almost 3,000 vehicles. In our experiments, we achieved precision, recall and F-score values of 99.6%, 99.2% and 99.4%, respectively. As another contribution, we discuss and compare three alternative architectures that explore the same features but using additional streams and temporal information. The proposed architectures, trained models, and dataset are publicly available at https://github.com/icarofua/vehicle-ReId.
Tasks Optical Character Recognition, Vehicle Re-Identification
Published 2019-11-13
URL https://arxiv.org/abs/1911.05541v1
PDF https://arxiv.org/pdf/1911.05541v1.pdf
PWC https://paperswithcode.com/paper/vehicle-re-identification-exploring-feature
Repo https://github.com/icarofua/vehicle-ReId
Framework tf
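
The three figures reported in the abstract are mutually consistent if the F-score is taken to be the harmonic mean of precision and recall (the F1 score). That reading is an assumption, since the abstract does not define the metric. A quick check:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2.0 * precision * recall / (precision + recall)

value = f_score(99.6, 99.2)
print(round(value, 1))  # 99.4
```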

A Dual-Path Model With Adaptive Attention For Vehicle Re-Identification

Title A Dual-Path Model With Adaptive Attention For Vehicle Re-Identification
Authors Pirazh Khorramshahi, Amit Kumar, Neehar Peri, Sai Saketh Rambhatla, Jun-Cheng Chen, Rama Chellappa
Abstract In recent years, attention models have been extensively used for person and vehicle re-identification. Most re-identification methods are designed to focus attention on key-point locations. However, depending on the orientation, the contribution of each key-point varies. In this paper, we present a novel dual-path adaptive attention model for vehicle re-identification (AAVER). The global appearance path captures macroscopic vehicle features while the orientation conditioned part appearance path learns to capture localized discriminative features by focusing attention on the most informative key-points. Through extensive experimentation, we show that the proposed AAVER method is able to accurately re-identify vehicles in unconstrained scenarios, yielding state of the art results on the challenging dataset VeRi-776. As a byproduct, the proposed system is also able to accurately predict vehicle key-points and shows an improvement of more than 7% over state of the art. The code for key-point estimation model is available at https://github.com/Pirazh/Vehicle_Key_Point_Orientation_Estimation.
Tasks Vehicle Key-Point and Orientation Estimation, Vehicle Re-Identification
Published 2019-05-09
URL https://arxiv.org/abs/1905.03397v3
PDF https://arxiv.org/pdf/1905.03397v3.pdf
PWC https://paperswithcode.com/paper/190503397
Repo https://github.com/Pirazh/Vehicle_Key_Point_Orientation_Estimation
Framework pytorch

Screening Rules for Lasso with Non-Convex Sparse Regularizers

Title Screening Rules for Lasso with Non-Convex Sparse Regularizers
Authors Alain Rakotomamonjy, Gilles Gasso, Joseph Salmon
Abstract Leveraging the convexity of the Lasso problem, screening rules help accelerate solvers by discarding irrelevant variables during the optimization process. However, because they provide better theoretical guarantees in identifying relevant variables, several non-convex regularizers for the Lasso have been proposed in the literature. This work is the first that introduces a screening rule strategy into a non-convex Lasso solver. The approach we propose is based on an iterative majorization-minimization (MM) strategy that includes a screening rule in the inner solver and a condition for propagating screened variables between iterations of MM. In addition to improving the efficiency of solvers, we also provide guarantees that the inner solver is able to identify the zero components of its critical point in finite time. Our experimental analysis illustrates the significant computational gain brought by the new screening rule compared to classical coordinate-descent or proximal gradient descent methods.
Published 2019-02-16
URL http://arxiv.org/abs/1902.06125v2
PDF http://arxiv.org/pdf/1902.06125v2.pdf
PWC https://paperswithcode.com/paper/screening-rules-for-lasso-with-non-convex
Repo https://github.com/arakotom/screening_ncvx_penalty
Framework none

Improving Vehicle Re-Identification using CNN Latent Spaces: Metrics Comparison and Track-to-track Extension

Title Improving Vehicle Re-Identification using CNN Latent Spaces: Metrics Comparison and Track-to-track Extension
Authors Geoffrey Roman-Jimenez, Patrice Guyot, Thierry Malon, Sylvie Chambon, Vincent Charvillat, Alain Crouzil, André Péninou, Julien Pinquier, Florence Sedes, Christine Sénac
Abstract This paper addresses the problem of vehicle re-identification using distance comparison of images in CNN latent spaces. First, we study the impact of the distance metrics, comparing performances obtained with different metrics: the minimal Euclidean distance (MED), the minimal cosine distance (MCD), and the residue of the sparse coding reconstruction (RSCR). These metrics are applied using features extracted through five different CNN architectures, namely ResNet18, AlexNet, VGG16, InceptionV3 and DenseNet201. We use the specific vehicle re-identification dataset VeRI to fine-tune these CNNs and evaluate results. Overall, independently of the CNN used, MCD outperforms MED, which is commonly used in the literature. Secondly, the state-of-the-art image-to-track process (I2TP) is extended to a track-to-track process (T2TP) without using complementary metadata. Metrics are extended to measure distance between tracks, enabling the evaluation of T2TP and comparison with I2TP using the same CNN models. Results show that T2TP outperforms I2TP for MCD and RSCR. T2TP combining DenseNet201 and MCD-based metrics exhibits the best performances, outperforming the state-of-the-art I2TP models that use complementary metadata. Finally, our experiments highlight two main results: i) the importance of the metric choice for vehicle re-identification, and ii) T2TP improves the performances compared to I2TP, especially when coupled with MCD-based metrics.
Published 2019-10-21
URL https://arxiv.org/abs/1910.09458v1
PDF https://arxiv.org/pdf/1910.09458v1.pdf
PWC https://paperswithcode.com/paper/improving-vehicle-re-identification-using-cnn
Repo https://github.com/GeoTrouvetout/Vehicle_ReID
Framework pytorch
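
A minimal sketch of the two simpler metrics named in the abstract, MED and MCD, for both the image-to-track and track-to-track settings. The track-to-track extension shown here (the minimum over all cross-track image pairs) is an assumption for illustration; the abstract does not spell out how the authors extend the metrics. Feature vectors are toy values rather than CNN outputs.

```python
import numpy as np

def pairwise_cosine_distance(a, b):
    """Cosine distance between every row of a and every row of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a_unit @ b_unit.T

def pairwise_euclidean_distance(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diff, axis=2)

def image_to_track(query, track, pairwise):
    """Minimal distance between one query feature and any image of a track."""
    return float(pairwise(query[None, :], track).min())

def track_to_track(track_a, track_b, pairwise):
    """Assumed extension: minimal distance over all cross-track image pairs."""
    return float(pairwise(track_a, track_b).min())

track_a = np.array([[1.0, 0.0], [0.0, 1.0]])    # two images of one vehicle track
track_b = np.array([[2.0, 0.0], [0.0, -1.0]])   # two images of another track
mcd = track_to_track(track_a, track_b, pairwise_cosine_distance)
med = track_to_track(track_a, track_b, pairwise_euclidean_distance)
print(mcd, med)
```

The toy tracks show why the metric choice matters: the first images of the two tracks point in the same direction, so MCD is zero, while MED still reports a distance of one because the feature norms differ.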

LFFD: A Light and Fast Face Detector for Edge Devices

Title LFFD: A Light and Fast Face Detector for Edge Devices
Authors Yonghao He, Dezhong Xu, Lifang Wu, Meng Jian, Shiming Xiang, Chunhong Pan
Abstract Face detection, as a fundamental technology for various applications, is always deployed on edge devices which have limited memory storage and low computing power. This paper introduces a Light and Fast Face Detector (LFFD) for edge devices. The proposed method is anchor-free and belongs to the one-stage category. Specifically, we rethink the importance of receptive field (RF) and effective receptive field (ERF) in the context of face detection. Essentially, the RFs of neurons in a certain layer are distributed regularly in the input image and these RFs are natural “anchors”. Combining RF “anchors” and appropriate RF strides, the proposed method can detect a large range of continuous face scales with 100% coverage in theory. The insightful understanding of relations between ERF and face scales motivates an efficient backbone for one-stage detection. The backbone is characterized by eight detection branches and common layers, resulting in efficient computation. Comprehensive and extensive experiments on popular benchmarks: WIDER FACE and FDDB are conducted. A new evaluation schema is proposed for application-oriented scenarios. Under the new schema, the proposed method can achieve superior accuracy (WIDER FACE Val/Test – Easy: 0.910/0.896, Medium: 0.881/0.865, Hard: 0.780/0.770; FDDB – discontinuous: 0.973, continuous: 0.724). Multiple hardware platforms are introduced to evaluate the running efficiency. The proposed method can obtain fast inference speed (NVIDIA TITAN Xp: 131.45 FPS at 640x480; NVIDIA TX2: 136.99 FPS at 160x120; Raspberry Pi 3 Model B+: 8.44 FPS at 160x120) with model size of 9 MB.
Published 2019-04-24
URL https://arxiv.org/abs/1904.10633v3
PDF https://arxiv.org/pdf/1904.10633v3.pdf
PWC https://paperswithcode.com/paper/lffd-a-light-and-fast-face-detector-for-edge
Repo https://github.com/YonghaoHe/A-Light-and-Fast-Face-Detector-for-Edge-Devices
Framework mxnet
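
The receptive-field size and stride that the abstract treats as natural "anchors" follow from standard convolution arithmetic. The sketch below applies that arithmetic to a hypothetical three-layer stack; the layer configuration is invented for illustration and is not the LFFD backbone.

```python
def receptive_fields(layers):
    """Theoretical receptive field after each layer.

    layers: list of (kernel_size, stride) pairs, applied in order.
    Returns a list of (rf_size, rf_stride) in input pixels.
    """
    size, jump = 1, 1
    result = []
    for kernel, stride in layers:
        size = size + (kernel - 1) * jump   # each extra tap widens the RF by the current jump
        jump = jump * stride                # spacing between neighbouring RF centres
        result.append((size, jump))
    return result

# Hypothetical stack: two stride-2 3x3 convolutions followed by a stride-1 3x3 convolution.
rfs = receptive_fields([(3, 2), (3, 2), (3, 1)])
print(rfs)  # [(3, 2), (7, 4), (15, 4)]
```

Neurons in the last layer of this toy stack each see a 15-pixel window, with windows spaced 4 pixels apart, which is the regular grid the abstract describes as anchor-like. The effective receptive field is smaller than this theoretical size, which is the distinction the abstract draws between RF and ERF.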

Vehicle Re-identification in Aerial Imagery: Dataset and Approach

Title Vehicle Re-identification in Aerial Imagery: Dataset and Approach
Authors Peng Wang, Bingliang Jiao, Lu Yang, Yifei Yang, Shizhou Zhang, Wei Wei, Yanning Zhang
Abstract In this work, we construct a large-scale dataset for vehicle re-identification (ReID), which contains 137k images of 13k vehicle instances captured by UAV-mounted cameras. To our knowledge, it is the largest UAV-based vehicle ReID dataset. To increase intra-class variation, each vehicle is captured by at least two UAVs at different locations, with diverse view-angles and flight-altitudes. We manually label a variety of vehicle attributes, including vehicle type, color, skylight, bumper, spare tire and luggage rack. Furthermore, for each vehicle image, the annotator is also required to mark the discriminative parts that help them to distinguish this particular vehicle from others. Besides the dataset, we also design a specific vehicle ReID algorithm to make full use of the rich annotation information. It is capable of explicitly detecting discriminative parts for each specific vehicle and significantly outperforms the evaluated baselines and state-of-the-art vehicle ReID approaches.