April 2, 2020

3461 words 17 mins read

Paper Group ANR 190

Pipelined Backpropagation at Scale: Training Large Models without Batches. Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance. Domain Balancing: Face Recognition on Long-Tailed Domains. Dynamic Region-Aware Convolution. Adversarial Light Projection Attacks on Face Recognition Systems: A Feasibility Study. MODMA …

Pipelined Backpropagation at Scale: Training Large Models without Batches

Title Pipelined Backpropagation at Scale: Training Large Models without Batches
Authors Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Köster
Abstract Parallelism is crucial for accelerating the training of deep neural networks. Pipeline parallelism can provide an efficient alternative to traditional data parallelism by allowing workers to specialize. Performing mini-batch SGD using pipeline parallelism has the overhead of filling and draining the pipeline. Pipelined Backpropagation updates the model parameters without draining the pipeline. This removes the overhead but introduces stale gradients and inconsistency between the weights used on the forward and backward passes, reducing final accuracy and the stability of training. We introduce Spike Compensation and Linear Weight Prediction to mitigate these effects. Analysis on a convex quadratic shows that both methods effectively counteract staleness. We train multiple convolutional networks at a batch size of one, completely replacing batch parallelism with fine-grained pipeline parallelism. With our methods, Pipelined Backpropagation achieves full accuracy on CIFAR-10 and ImageNet without hyperparameter tuning.
Tasks
Published 2020-03-25
URL https://arxiv.org/abs/2003.11666v1
PDF https://arxiv.org/pdf/2003.11666v1.pdf
PWC https://paperswithcode.com/paper/pipelined-backpropagation-at-scale-training
Repo
Framework
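
The weight-prediction idea lends itself to a compact sketch. Below is a hedged illustration of linear weight prediction under SGD with momentum: a pipeline stage whose stored weights are several updates stale extrapolates them along the momentum direction before the forward pass. The function name, and the assumption that the momentum buffer changes slowly over the staleness horizon, are ours, not the paper's exact formulation.

```python
import torch

def lwp_forward_weights(weight, velocity, lr, staleness):
    """Linear weight prediction: extrapolate `weight` forward by `staleness`
    SGD-with-momentum steps (v <- m*v + g, w <- w - lr*v), assuming the
    momentum buffer `velocity` stays roughly constant over that horizon."""
    return weight - lr * staleness * velocity

# Toy usage: a stage with pipeline delay 4 runs its forward pass on the
# predicted weights instead of its stale stored weights.
w = torch.randn(128, 128)
v = torch.randn(128, 128)            # optimizer's momentum buffer
w_fwd = lwp_forward_weights(w, v, lr=0.1, staleness=4)
```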

Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance

Title Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance
Authors Jaime Spencer, Richard Bowden, Simon Hadfield
Abstract “Like night and day” is a commonly used expression to imply that two things are completely different. Unfortunately, this tends to be the case for current visual feature representations of the same scene across varying seasons or times of day. The aim of this paper is to provide a dense feature representation that can be used to perform localization, sparse matching or image retrieval, regardless of the current seasonal or temporal appearance. Recently, several methodologies have been proposed for learning dense feature representations with deep networks. These methods make use of ground truth pixel-wise correspondences between pairs of images and focus on the spatial properties of the features. As such, they do not address temporal or seasonal variation. Furthermore, obtaining the required pixel-wise correspondence data to train in cross-seasonal environments is highly complex in most scenarios. We propose Deja-Vu, a weakly supervised approach to learning season-invariant features that does not require pixel-wise ground truth data. The proposed system only requires coarse labels indicating whether two images correspond to the same location or not. From these labels, the network is trained to produce “similar” dense feature maps for corresponding locations despite environmental changes. Code will be made available at: https://github.com/jspenmar/DejaVu_Features
Tasks Image Retrieval
Published 2020-03-30
URL https://arxiv.org/abs/2003.13431v1
PDF https://arxiv.org/pdf/2003.13431v1.pdf
PWC https://paperswithcode.com/paper/same-features-different-day-weakly-supervised
Repo
Framework
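
To make the weak supervision concrete, here is a minimal sketch (our assumption, not the released DejaVu code) of a contrastive loss driven only by coarse same-location labels: spatially pooled dense feature maps of corresponding locations are pulled together, non-corresponding ones pushed apart.

```python
import torch
import torch.nn.functional as F

def coarse_contrastive_loss(feat_a, feat_b, same_location, margin=1.0):
    """feat_*: (B, C, H, W) dense feature maps from two seasons/times;
    same_location: (B,) floats in {0, 1} from the coarse labels."""
    # distance between L2-normalized, spatially averaged descriptors
    da = F.normalize(feat_a.mean(dim=(2, 3)), dim=1)
    db = F.normalize(feat_b.mean(dim=(2, 3)), dim=1)
    dist = (da - db).pow(2).sum(dim=1)
    pos = same_location * dist                               # same place: minimize
    neg = (1 - same_location) * F.relu(margin - dist.sqrt()).pow(2)
    return (pos + neg).mean()
```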

Domain Balancing: Face Recognition on Long-Tailed Domains

Title Domain Balancing: Face Recognition on Long-Tailed Domains
Authors Dong Cao, Xiangyu Zhu, Xingyu Huang, Jianzhu Guo, Zhen Lei
Abstract The long-tailed problem has been an important topic in face recognition. However, existing methods only concentrate on the long-tailed distribution of classes. In contrast, we address the long-tailed domain distribution problem, which refers to the fact that a small number of domains appear frequently while other domains are far rarer. The key challenge of the problem is that domain labels are complicated (related to race, age, pose, illumination, etc.) and inaccessible in real applications. In this paper, we propose a novel Domain Balancing (DB) mechanism to handle this problem. Specifically, we first propose a Domain Frequency Indicator (DFI) to judge whether a sample is from a head domain or a tail domain. Secondly, we formulate a lightweight Residual Balancing Mapping (RBM) block to balance the domain distribution by adjusting the network according to the DFI. Finally, we propose a Domain Balancing Margin (DBM) in the loss function to further optimize the feature space of the tail domains and improve generalization. Extensive analysis and experiments on several face recognition benchmarks demonstrate that the proposed method effectively enhances generalization capacity and achieves superior performance.
Tasks Face Recognition
Published 2020-03-30
URL https://arxiv.org/abs/2003.13791v1
PDF https://arxiv.org/pdf/2003.13791v1.pdf
PWC https://paperswithcode.com/paper/domain-balancing-face-recognition-on-long
Repo
Framework
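
A hedged sketch of the Domain Balancing Margin idea: a CosFace-style loss in which the additive margin is scaled per sample by a Domain Frequency Indicator, so tail-domain samples receive a larger margin. How the DFI itself is computed (the paper derives it from feature statistics) is abstracted into an input here, and the scaling form is an assumption.

```python
import torch
import torch.nn.functional as F

def dbm_loss(embeddings, weight, labels, dfi, s=64.0, base_margin=0.35):
    """embeddings: (B, D); weight: (num_classes, D); labels: (B,) longs;
    dfi: (B,) in [0, 1], larger for tail domains (assumed normalization)."""
    cos = F.linear(F.normalize(embeddings), F.normalize(weight))  # (B, C) cosines
    margin = base_margin * dfi                  # tail domains get a bigger margin
    idx = torch.arange(cos.size(0))
    logits = cos.clone()
    logits[idx, labels] = cos[idx, labels] - margin
    return F.cross_entropy(s * logits, labels)
```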

Dynamic Region-Aware Convolution

Title Dynamic Region-Aware Convolution
Authors Jin Chen, Xijun Wang, Zichao Guo, Xiangyu Zhang, Jian Sun
Abstract We propose a new convolution called Dynamic Region-Aware Convolution (DRConv), which automatically assigns multiple filters to spatial regions whose features have similar representations. In this way, DRConv outperforms standard convolution in modeling semantic variations. Standard convolution can increase the number of channels to extract more visual elements, but this results in high computational cost. More gracefully, our DRConv transfers these additional channel-wise filters to the spatial dimension with a learnable instructor, which significantly improves the representation ability of convolution while maintaining translation invariance like standard convolution. DRConv is an effective and elegant method for handling complex and variable spatial information distributions. Thanks to its plug-and-play property, it can substitute for standard convolution in any existing network. We evaluate DRConv on a wide range of models (the MobileNet series, ShuffleNetV2, etc.) and tasks (classification, face recognition, detection and segmentation). On ImageNet classification, DRConv-based ShuffleNetV2-0.5x achieves state-of-the-art performance of 67.1% at the 46M multiply-adds level, a 6.3% relative improvement.
Tasks Face Recognition
Published 2020-03-27
URL https://arxiv.org/abs/2003.12243v1
PDF https://arxiv.org/pdf/2003.12243v1.pdf
PWC https://paperswithcode.com/paper/dynamic-region-aware-convolution
Repo
Framework
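
A simplified, hypothetical rendering of DRConv's core mechanism: a small "instructor" branch predicts a per-pixel region assignment over m candidate filters, and each pixel's output comes from the filter of its region. A hard argmax is shown for clarity; the paper trains the instructor with a differentiable softmax-based relaxation, which this sketch omits.

```python
import torch
import torch.nn as nn

class SimpleDRConv(nn.Module):
    def __init__(self, c_in, c_out, m=4):
        super().__init__()
        self.guide = nn.Conv2d(c_in, m, kernel_size=3, padding=1)   # instructor
        self.filters = nn.ModuleList(
            [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1) for _ in range(m)]
        )

    def forward(self, x):
        mask = self.guide(x).argmax(dim=1, keepdim=True)            # (B,1,H,W) region ids
        outs = torch.stack([f(x) for f in self.filters], dim=1)     # (B,m,C,H,W)
        sel = mask.unsqueeze(2).expand(-1, 1, outs.size(2), -1, -1) # (B,1,C,H,W)
        return outs.gather(1, sel).squeeze(1)                       # pick region's filter
```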

Adversarial Light Projection Attacks on Face Recognition Systems: A Feasibility Study

Title Adversarial Light Projection Attacks on Face Recognition Systems: A Feasibility Study
Authors Luan Nguyen, Sunpreet S. Arora, Yuhang Wu, Hao Yang
Abstract Deep learning-based systems have been shown to be vulnerable to adversarial attacks in both digital and physical domains. While feasible, digital attacks have limited applicability in attacking deployed systems, including face recognition systems, where an adversary typically has access to the input and not the transmission channel. In such a setting, physical attacks that directly provide a malicious input through the input channel pose a bigger threat. We investigate the feasibility of conducting real-time physical attacks on face recognition systems using adversarial light projections. A setup comprising a commercially available web camera and a projector is used to conduct the attack. The adversary uses a transformation-invariant adversarial pattern generation method to generate a digital adversarial pattern using one or more images of the target available to the adversary. The digital adversarial pattern is then projected onto the adversary’s face in the physical domain to either impersonate a target (impersonation) or evade recognition (obfuscation). We conduct preliminary experiments using two open-source and one commercial face recognition system on a pool of 50 subjects. Our experimental results demonstrate the vulnerability of face recognition systems to light projection attacks in both white-box and black-box attack settings.
Tasks Face Recognition
Published 2020-03-24
URL https://arxiv.org/abs/2003.11145v1
PDF https://arxiv.org/pdf/2003.11145v1.pdf
PWC https://paperswithcode.com/paper/adversarial-light-projection-attacks-on-face
Repo
Framework
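
Transformation-invariant pattern generation is typically done in the expectation-over-transformation style; below is a rough, assumption-heavy sketch of that principle (not the authors' method): the pattern is optimized to drive the embedding toward a target under random brightness and alignment jitter standing in for projector/camera variation. The model, the transform set, and all parameter values are placeholders.

```python
import torch

def eot_pattern(model, face, target_emb, steps=200, lr=0.01, n_transforms=8):
    """face: (3, H, W) in [0, 1]; target_emb: (1, D) embedding to impersonate."""
    pattern = torch.zeros_like(face, requires_grad=True)
    opt = torch.optim.Adam([pattern], lr=lr)
    for _ in range(steps):
        loss = 0.0
        for _ in range(n_transforms):
            # crude stand-ins for projector/camera variation
            brightness = 1.0 + 0.2 * (torch.rand(1) - 0.5)
            shift = torch.roll(pattern, shifts=int(torch.randint(-2, 3, (1,))), dims=-1)
            adv = (face + brightness * shift).clamp(0, 1)
            emb = model(adv.unsqueeze(0))
            loss = loss + (1 - torch.cosine_similarity(emb, target_emb)).mean()
        opt.zero_grad()
        (loss / n_transforms).backward()
        opt.step()
        pattern.data.clamp_(-0.3, 0.3)   # keep the pattern projectable
    return pattern.detach()
```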

MODMA dataset: a Multi-modal Open Dataset for Mental-disorder Analysis

Title MODMA dataset: a Multi-modal Open Dataset for Mental-disorder Analysis
Authors Hanshu Cai, Yiwen Gao, Shuting Sun, Na Li, Fuze Tian, Han Xiao, Jianxiu Li, Zhengwu Yang, Xiaowei Li, Qinglin Zhao, Zhenyu Liu, Zhijun Yao, Minqiang Yang, Hong Peng, Jing Zhu, Xiaowei Zhang, Guoping Gao, Fang Zheng, Rui Li, Zhihua Guo, Rong Ma, Jing Yang, Lan Zhang, Xiping Hu, Yumin Li, Bin Hu
Abstract According to the World Health Organization, the number of mental disorder patients, especially depression patients, has grown rapidly and become a leading contributor to the global burden of disease. However, the present common practice of depression diagnosis is based on interviews and clinical scales carried out by doctors, which is both labor- and time-consuming. One important reason is the lack of physiological indicators for mental disorders. With the rise of tools such as data mining and artificial intelligence, using physiological data to explore new physiological indicators of mental disorders and to create new applications for mental disorder diagnosis has become a hot research topic. However, good-quality physiological data for mental disorder patients are hard to acquire. We present a multi-modal open dataset for mental-disorder analysis. The dataset includes EEG and audio data from clinically depressed patients and matched normal controls. All our patients were carefully diagnosed and selected by professional psychiatrists in hospitals. The EEG dataset includes not only data collected using a traditional 128-electrode elastic cap, but also data from a novel wearable 3-electrode EEG collector for pervasive applications. The 128-electrode EEG signals of 53 subjects were recorded both in the resting state and under stimulation; the 3-electrode EEG signals of 55 subjects were recorded in the resting state; the audio data of 52 subjects were recorded during interviewing, reading, and picture description. We encourage other researchers in the field to use it for testing their methods of mental-disorder analysis.
Tasks EEG
Published 2020-02-20
URL https://arxiv.org/abs/2002.09283v3
PDF https://arxiv.org/pdf/2002.09283v3.pdf
PWC https://paperswithcode.com/paper/modma-dataset-a-multi-model-open-dataset-for
Repo
Framework

Continuous Silent Speech Recognition using EEG

Title Continuous Silent Speech Recognition using EEG
Authors Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik
Abstract In this paper we explore continuous silent speech recognition using electroencephalography (EEG) signals. We implemented a connectionist temporal classification (CTC) automatic speech recognition (ASR) model to translate to text the EEG signals recorded while subjects read English sentences in their mind without producing any voice. Our results demonstrate the feasibility of using EEG signals for performing continuous silent speech recognition. We demonstrate our results for a limited English vocabulary consisting of 30 unique sentences.
Tasks EEG, Speech Recognition
Published 2020-02-06
URL https://arxiv.org/abs/2002.03851v6
PDF https://arxiv.org/pdf/2002.03851v6.pdf
PWC https://paperswithcode.com/paper/towards-mind-reading
Repo
Framework
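
An illustrative sketch of the CTC setup described above: a recurrent encoder maps a sequence of EEG feature frames to per-frame character distributions, trained with CTC so no frame-level alignment to the text is needed. The architecture, feature dimension, and vocabulary are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class EEGToText(nn.Module):
    def __init__(self, n_features=31, vocab_size=28, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)    # 27 characters + CTC blank

    def forward(self, x):                            # x: (B, T, n_features)
        h, _ = self.encoder(x)
        return self.head(h).log_softmax(dim=-1)      # (B, T, vocab)

model = EEGToText()
ctc = nn.CTCLoss(blank=0)
eeg = torch.randn(4, 200, 31)                        # dummy EEG feature sequences
log_probs = model(eeg).transpose(0, 1)               # CTCLoss expects (T, B, vocab)
targets = torch.randint(1, 28, (4, 20))              # dummy character indices (0 = blank)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((4,), 200, dtype=torch.long),
           target_lengths=torch.full((4,), 20, dtype=torch.long))
```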

Neural Oscillations for Encoding and Decoding Declarative Memory using EEG Signals

Title Neural Oscillations for Encoding and Decoding Declarative Memory using EEG Signals
Authors Jenifer Kalafatovich, Minji Lee
Abstract Declarative memory has been studied for its relationship with remembering daily life experiences. Previous studies have reported changes in power spectra during the encoding phase related to behavioral performance; however, the decoding phase remains to be explored. This study investigates neural oscillation changes related to the memory process. Participants were asked to perform a memory task covering both encoding and decoding phases while EEG signals were recorded. Results showed that, in the encoding phase, there was a significant decrease of power in the low and high beta bands over the fronto-central area, and a decrease in the low beta, high beta and gamma bands over the left temporal area, related to successful subsequent memory effects. In the decoding phase, only significant decreases of alpha power were observed over the fronto-central area. These findings show the relevance of the beta and alpha bands for the encoding and decoding phases of a memory task, respectively.
Tasks EEG
Published 2020-02-04
URL https://arxiv.org/abs/2002.01126v1
PDF https://arxiv.org/pdf/2002.01126v1.pdf
PWC https://paperswithcode.com/paper/neural-oscillations-for-encoding-and-decoding
Repo
Framework
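
A short sketch of the kind of analysis reported above: estimating per-band spectral power of one EEG channel with Welch's method, so encoding and decoding conditions can be compared. The band edges are the usual conventions and the sampling rate is a placeholder, not values taken from the paper.

```python
import numpy as np
from scipy.signal import welch

BANDS = {"alpha": (8, 13), "low_beta": (13, 20), "high_beta": (20, 30), "gamma": (30, 45)}

def band_powers(eeg, fs=250):
    """eeg: 1-D array for one channel; returns mean PSD per frequency band."""
    freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs)
    return {name: psd[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}

powers = band_powers(np.random.randn(10 * 250))   # 10 s of synthetic data
```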

SER-FIQ: Unsupervised Estimation of Face Image Quality Based on Stochastic Embedding Robustness

Title SER-FIQ: Unsupervised Estimation of Face Image Quality Based on Stochastic Embedding Robustness
Authors Philipp Terhörst, Jan Niklas Kolf, Naser Damer, Florian Kirchbuchner, Arjan Kuijper
Abstract Face image quality is an important factor for enabling high-performance face recognition systems. Face quality assessment aims at estimating the suitability of a face image for recognition. Previous work proposed supervised solutions that require artificially or human-labelled quality values. However, both labelling mechanisms are error-prone, as they do not rely on a clear definition of quality and may not know the best characteristics for the utilized face recognition system. To avoid the use of inaccurate quality labels, we propose a novel concept for measuring face quality based on an arbitrary face recognition model. By determining the embedding variations generated from random subnetworks of a face model, the robustness of a sample's representation, and thus its quality, is estimated. The experiments are conducted in a cross-database evaluation setting on three publicly available databases. We compare our proposed solution on two face embeddings against six state-of-the-art approaches from academia and industry. The results show that our unsupervised solution outperforms all other approaches in the majority of the investigated scenarios. In contrast to previous works, the proposed solution shows a stable performance over all scenarios. Utilizing the deployed face recognition model for our face quality assessment methodology avoids the training phase completely and further outperforms all baseline approaches by a large margin. Our solution can be easily integrated into current face recognition systems and can be modified for tasks beyond face recognition.
Tasks Face Recognition
Published 2020-03-20
URL https://arxiv.org/abs/2003.09373v1
PDF https://arxiv.org/pdf/2003.09373v1.pdf
PWC https://paperswithcode.com/paper/ser-fiq-unsupervised-estimation-of-face-image
Repo
Framework
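
A minimal sketch of the SER-FIQ idea as described above: run the face model's dropout-equipped embedding path several times in train mode, and score quality by how tightly the stochastic embeddings agree. The sigmoid mapping follows the spirit of the paper's formulation but the exact constants, and the model itself, are assumptions.

```python
import torch
import torch.nn.functional as F

def serfiq_quality(embed_model, image, n_passes=32):
    """image: (3, H, W) preprocessed face; embed_model returns (1, D) embeddings."""
    embed_model.train()              # keep dropout active at inference time
    with torch.no_grad():
        embs = torch.stack([F.normalize(embed_model(image.unsqueeze(0))[0], dim=0)
                            for _ in range(n_passes)])
    dists = torch.pdist(embs)        # pairwise Euclidean distances between passes
    return float(2 * torch.sigmoid(-dists.mean()))   # high agreement -> high quality
```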

Learning to Correct Overexposed and Underexposed Photos

Title Learning to Correct Overexposed and Underexposed Photos
Authors Mahmoud Afifi, Konstantinos G. Derpanis, Björn Ommer, Michael S. Brown
Abstract Capturing photographs with wrong exposures remains a major source of errors in camera-based imaging. Exposure problems are categorized as either: (i) overexposed, where the camera exposure was too long, resulting in bright and washed-out image regions, or (ii) underexposed, where the exposure was too short, resulting in dark regions. Both under- and overexposure greatly reduce the contrast and visual appeal of an image. Prior work mainly focuses on underexposed images or general image enhancement. In contrast, our proposed method targets both over- and underexposure errors in photographs. We formulate the exposure correction problem as two main sub-problems: (i) color enhancement and (ii) detail enhancement. Accordingly, we propose a coarse-to-fine deep neural network (DNN) model, trainable in an end-to-end manner, that addresses each sub-problem separately. A key aspect of our solution is a new dataset of over 24,000 images exhibiting a range of exposure values, each with a corresponding properly exposed image. Our method achieves results on par with existing state-of-the-art methods on underexposed images and yields significant improvements for images suffering from overexposure errors.
Tasks Image Enhancement
Published 2020-03-25
URL https://arxiv.org/abs/2003.11596v1
PDF https://arxiv.org/pdf/2003.11596v1.pdf
PWC https://paperswithcode.com/paper/learning-to-correct-overexposed-and
Repo
Framework
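
A schematic, assumption-laden sketch of the coarse-to-fine decomposition described above: a downsampled copy of the photo is corrected first (global color/exposure), then a second stage refines details at full resolution. Both sub-networks here are stand-in conv stacks, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out=3):
    return nn.Sequential(nn.Conv2d(c_in, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, c_out, 3, padding=1))

class CoarseToFineExposure(nn.Module):
    def __init__(self):
        super().__init__()
        self.coarse = conv_block(3)              # global color/exposure correction
        self.fine = conv_block(6)                # detail stage sees image + coarse result

    def forward(self, x):                        # x: (B, 3, H, W), H and W divisible by 4
        low = F.interpolate(x, scale_factor=0.25, mode="bilinear", align_corners=False)
        coarse = self.coarse(low)
        up = F.interpolate(coarse, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return up + self.fine(torch.cat([x, up], dim=1))   # residual detail enhancement
```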

Burst Denoising of Dark Images

Title Burst Denoising of Dark Images
Authors Ahmet Serdar Karadeniz, Erkut Erdem, Aykut Erdem
Abstract Capturing images under extremely low-light conditions poses significant challenges for the standard camera pipeline. Images become too dark and too noisy, which makes traditional image enhancement techniques almost impossible to apply. Very recently, researchers have shown promising results using learning-based approaches. Motivated by these ideas, in this paper, we propose a deep learning framework for obtaining clean and colorful RGB images from extremely dark raw images. The backbone of our framework is a novel coarse-to-fine network architecture that generates high-quality outputs in a progressive manner. The coarse network predicts a low-resolution, denoised raw image, which is then fed to the fine network to recover fine-scale details and realistic textures. To further reduce noise and improve color accuracy, we extend this network to a permutation-invariant structure so that it takes a burst of low-light images as input and merges information from multiple images at the feature level. Our experiments demonstrate that the proposed approach leads to perceptually more pleasing results than state-of-the-art methods by producing much sharper and higher quality images.
Tasks Denoising, Image Enhancement
Published 2020-03-17
URL https://arxiv.org/abs/2003.07823v1
PDF https://arxiv.org/pdf/2003.07823v1.pdf
PWC https://paperswithcode.com/paper/burst-denoising-of-dark-images
Repo
Framework
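
A hedged sketch of the permutation-invariant burst fusion described above: each raw frame is encoded by the same network, features are merged with a symmetric (order-independent) max-pool over the burst, and a decoder predicts one clean image. The layer sizes and channel counts are placeholders.

```python
import torch
import torch.nn as nn

class BurstMerge(nn.Module):
    def __init__(self, c_in=4, feat=32):                 # e.g. packed Bayer raw
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(c_in, feat, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        self.decode = nn.Conv2d(feat, 3, 3, padding=1)   # to RGB

    def forward(self, burst):                            # burst: (B, N, c_in, H, W)
        b, n, c, h, w = burst.shape
        f = self.encode(burst.reshape(b * n, c, h, w)).view(b, n, -1, h, w)
        fused, _ = f.max(dim=1)                          # permutation-invariant merge
        return self.decode(fused)
```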

Medical Image Enhancement Using Histogram Processing and Feature Extraction for Cancer Classification

Title Medical Image Enhancement Using Histogram Processing and Feature Extraction for Cancer Classification
Authors Sakshi Patel, Bharath K P, Rajesh Kumar Muthu
Abstract MRI (Magnetic Resonance Imaging) is a technique used to analyze and diagnose problems visible in images, such as cancer or a tumor in the brain. Physicians require good-contrast images for treatment, as they contain the most information about the disease. MRI images are low-contrast, which makes diagnosis difficult; hence better localization of image pixels is required. Histogram equalization techniques enhance the image so that it offers improved visual quality and a well-defined problem region. The contrast and brightness are enhanced in such a way that the image does not lose its original information and the brightness is preserved. We compare the different equalization techniques in this paper; the techniques are critically studied and elaborated. They are also tabulated to compare various parameters present in the image. In addition, we segment and extract the tumor region from the brain using the K-means algorithm. For feature extraction and classification, we use a Support Vector Machine (SVM). The main goal of this research is to support the medical field through image processing.
Tasks Image Enhancement
Published 2020-03-14
URL https://arxiv.org/abs/2003.06615v1
PDF https://arxiv.org/pdf/2003.06615v1.pdf
PWC https://paperswithcode.com/paper/medical-image-enhancement-using-histogram
Repo
Framework
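
For reference, here is a compact sketch of plain histogram equalization, the baseline that the compared variants build on: intensities are remapped through the normalized cumulative histogram so the output uses the full dynamic range.

```python
import numpy as np

def histogram_equalize(img):
    """img: 2-D uint8 array (e.g., one MRI slice); returns the equalized uint8 image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # normalize to [0, 1]
    return (cdf[img] * 255).astype(np.uint8)
```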

SIP-SegNet: A Deep Convolutional Encoder-Decoder Network for Joint Semantic Segmentation and Extraction of Sclera, Iris and Pupil based on Periocular Region Suppression

Title SIP-SegNet: A Deep Convolutional Encoder-Decoder Network for Joint Semantic Segmentation and Extraction of Sclera, Iris and Pupil based on Periocular Region Suppression
Authors Bilal Hassan, Ramsha Ahmed, Taimur Hassan, Naoufel Werghi
Abstract The current developments in the field of machine vision have opened new vistas towards deploying multimodal biometric recognition systems in various real-world applications. These systems are able to deal with the limitations of unimodal biometric systems, which are vulnerable to spoofing, noise, non-universality and intra-class variations. Among the various biometric traits, ocular traits are preferred in these recognition systems: they possess high distinctiveness, permanence, and performance, while technologies based on other biometric traits (fingerprints, voice, etc.) can be easily compromised. This work presents a novel deep learning framework called SIP-SegNet, which performs joint semantic segmentation of ocular traits (sclera, iris and pupil) in unconstrained scenarios with greater accuracy. Images acquired under these scenarios exhibit Purkinje reflexes, specular reflections, eye gaze, off-angle shots, low resolution, and various occlusions, particularly by eyelids and eyelashes. To address these issues, SIP-SegNet begins by denoising the pristine image using a denoising convolutional neural network (DnCNN), followed by reflection removal and image enhancement based on contrast limited adaptive histogram equalization (CLAHE). Our proposed framework then extracts periocular information using adaptive thresholding and employs fuzzy filtering to suppress this information. Finally, the semantic segmentation of sclera, iris and pupil is achieved using a densely connected fully convolutional encoder-decoder network. We used five CASIA datasets to evaluate the performance of SIP-SegNet on various evaluation metrics. The simulation results validate the segmentation quality of the proposed SIP-SegNet, with mean F1 scores of 93.35, 95.11 and 96.69 for the sclera, iris and pupil classes, respectively.
Tasks Denoising, Image Enhancement, Semantic Segmentation
Published 2020-02-15
URL https://arxiv.org/abs/2003.00825v1
PDF https://arxiv.org/pdf/2003.00825v1.pdf
PWC https://paperswithcode.com/paper/sip-segnet-a-deep-convolutional-encoder
Repo
Framework
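
Two of the preprocessing stages named above have direct OpenCV equivalents; this sketch shows CLAHE for contrast enhancement and adaptive thresholding for coarse periocular localization. The DnCNN denoiser, fuzzy filtering, and the encoder-decoder segmenter are omitted, and all parameter values are assumptions.

```python
import cv2

def preprocess_eye(gray):
    """gray: uint8 grayscale eye image; returns (enhanced image, coarse mask)."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)
    periocular_mask = cv2.adaptiveThreshold(
        enhanced, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, blockSize=31, C=5)
    return enhanced, periocular_mask

# usage: enhanced, mask = preprocess_eye(cv2.imread("eye.png", cv2.IMREAD_GRAYSCALE))
```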

Efficient Exploration in Constrained Environments with Goal-Oriented Reference Path

Title Efficient Exploration in Constrained Environments with Goal-Oriented Reference Path
Authors Kei Ota, Yoko Sasaki, Devesh K. Jha, Yusuke Yoshiyasu, Asako Kanezaki
Abstract In this paper, we consider the problem of building learning agents that can efficiently learn to navigate in constrained environments. The main goal is to design agents that can efficiently learn to understand and generalize to different environments using high-dimensional inputs (a 2D map), while following feasible paths that avoid obstacles in obstacle-cluttered environments. To achieve this, we make use of traditional path planning algorithms, supervised learning, and reinforcement learning algorithms in a synergistic way. The key idea is to decouple the navigation problem into planning and control, the former of which is achieved by supervised learning whereas the latter is done by reinforcement learning. Specifically, we train a deep convolutional network that can predict collision-free paths based on a map of the environment; this is then used by a reinforcement learning algorithm to learn to closely follow the path. This allows the trained agent to achieve good generalization while learning faster. We test our proposed method in the recently proposed Safety Gym suite, which allows testing of safety constraints during training of learning agents. We compare our proposed method with existing work and show that our method consistently improves sample efficiency and generalization capability to novel environments.
Tasks Efficient Exploration
Published 2020-03-03
URL https://arxiv.org/abs/2003.01641v1
PDF https://arxiv.org/pdf/2003.01641v1.pdf
PWC https://paperswithcode.com/paper/efficient-exploration-in-constrained
Repo
Framework
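
A hedged sketch of the planning/control decoupling described above: a pretrained, supervised planner produces a collision-free reference path from the map, and the RL agent is rewarded for progress along it. Only the reward shaping is illustrated; the planner, environment interfaces, and coefficients are placeholders.

```python
import numpy as np

def path_following_reward(pos, path, prev_idx):
    """pos: (2,) agent position; path: (K, 2) planner waypoints;
    prev_idx: index of the closest waypoint at the previous step."""
    dists = np.linalg.norm(path - pos, axis=1)
    idx = int(dists.argmin())
    progress = idx - prev_idx              # waypoints advanced this step
    deviation = dists[idx]                 # distance from the reference path
    return progress - 0.5 * deviation, idx

# usage inside the RL loop (environment and planner assumed):
#   reward, prev_idx = path_following_reward(agent_pos, planned_path, prev_idx)
```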

Learning Global and Local Consistent Representations for Unsupervised Image Retrieval via Deep Graph Diffusion Networks

Title Learning Global and Local Consistent Representations for Unsupervised Image Retrieval via Deep Graph Diffusion Networks
Authors Zhiyong Dou, Haotian Cui, Bo Wang
Abstract Diffusion has shown great success in improving the accuracy of unsupervised image retrieval systems by utilizing high-order structures of the image manifold. However, existing diffusion methods suffer from three major limitations: 1) they usually rely on local structures without considering global manifold information; 2) they focus on transductively improving pair-wise similarities among existing input images, lacking the flexibility to learn representations for novel, unseen instances inductively; 3) they fail to scale to large datasets due to prohibitive memory consumption and the computational burden of intrinsic high-order operations on the whole graph. In this paper, to address these limitations, we propose a novel method, Graph Diffusion Networks (GRAD-Net), that adopts graph neural networks (GNNs), a novel variant of deep learning algorithms on irregular graphs. GRAD-Net learns semantic representations by exploiting both local and global structures of the image manifold in an unsupervised fashion. By utilizing sparse coding techniques, GRAD-Net not only preserves global information on the image manifold, but also enables scalable training and efficient querying. Experiments on several large benchmark datasets demonstrate the effectiveness of our method over state-of-the-art diffusion algorithms for unsupervised image retrieval.
Tasks Image Retrieval
Published 2020-01-05
URL https://arxiv.org/abs/2001.01284v1
PDF https://arxiv.org/pdf/2001.01284v1.pdf
PWC https://paperswithcode.com/paper/learning-global-and-local-consistent
Repo
Framework
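
A rough sketch of the diffusion principle GRAD-Net builds on: propagate image descriptors over a sparse kNN affinity graph so similarities reflect the manifold rather than raw pairwise distances. This is classical sparse diffusion (personalized-PageRank style), not the learned GNN parameterization the paper proposes.

```python
import numpy as np
import scipy.sparse as sp

def diffuse(features, knn_graph, alpha=0.85, iters=20):
    """features: (N, D) image descriptors; knn_graph: sparse (N, N) nonnegative
    affinity matrix (e.g., from mutual k-nearest neighbors)."""
    deg = np.asarray(knn_graph.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    s = d_inv_sqrt @ knn_graph @ d_inv_sqrt          # symmetric normalization
    x = features.copy()
    for _ in range(iters):
        x = alpha * (s @ x) + (1 - alpha) * features # diffuse with a restart term
    return x
```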