Paper Group ANR 816
Low-light Image Enhancement Algorithm Based on Retinex and Generative Adversarial Network
Title | Low-light Image Enhancement Algorithm Based on Retinex and Generative Adversarial Network |
Authors | Yangming Shi, Xiaopo Wu, Ming Zhu |
Abstract | Low-light image enhancement is generally regarded as a challenging task in image processing, especially for complex visual tasks at night or under weak illumination. A large number of papers have applied different technologies to reduce the blur and noise in low-light images; regretfully, most of them serve little purpose in coping with extremely poorly illuminated parts of images, or fail in practical tests. In this work, the authors propose a novel approach for processing low-light images based on Retinex theory and a generative adversarial network (GAN), composed of a decomposition part that splits the image into an illumination image and a reflectance image, and an enhancement part that generates a high-quality image. The discriminative network is expected to make the generated image clearer. Experiments have been conducted under different lighting strengths on the Converted See-In-the-Dark (CSID) datasets, and satisfactory results exceeding expectations have been achieved. In short, the proposed GAN-based network and the Retinex theory employed in this work prove effective in dealing with low-light image enhancement problems, which will undoubtedly benefit image processing. |
Tasks | Image Enhancement, Low-Light Image Enhancement |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06027v1 |
https://arxiv.org/pdf/1906.06027v1.pdf | |
PWC | https://paperswithcode.com/paper/low-light-image-enhancement-algorithm-based |
Repo | |
Framework | |
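The core of the method above is the Retinex decomposition I = R ⊙ L, where an image is the pixel-wise product of a reflectance image and an illumination image. As a rough, self-contained illustration of that relation — a classical Gaussian-smoothing split, not the paper's learned decomposition and GAN enhancement networks — here is a minimal sketch:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def retinex_decompose(image, sigma=30.0, eps=1e-6):
    """Classical single-scale Retinex split: I = R * L.

    A Gaussian-smoothed copy of the image stands in for the illumination
    map L; the reflectance R is recovered by pixel-wise division.
    """
    illumination = gaussian_filter(image.astype(np.float64), sigma=sigma)
    reflectance = image / (illumination + eps)
    return reflectance, illumination

def enhance(image, gamma=0.45):
    """Brighten by gamma-correcting the estimated illumination only,
    leaving the reflectance (scene content) untouched."""
    reflectance, illumination = retinex_decompose(image)
    enhanced = reflectance * np.power(illumination / 255.0, gamma) * 255.0
    return np.clip(enhanced, 0, 255).astype(np.uint8)

# Usage: enhanced = enhance(low_light_gray_image)  # 2-D uint8 array
```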
Context-Aware Automatic Occlusion Removal
Title | Context-Aware Automatic Occlusion Removal |
Authors | Kumara Kahatapitiya, Dumindu Tissera, Ranga Rodrigo |
Abstract | Occlusion removal is an interesting application of image enhancement, for which existing work offers only manually-annotated or domain-specific solutions; no prior work addresses automatic occlusion detection and removal as a context-aware, generic problem. In this paper, we present a novel methodology that identifies objects unrelated to the image context as occlusions and removes them, coherently reconstructing the space they occupied. The proposed system detects occlusions by considering the relation between foreground and background object classes, represented as vector embeddings, and removes them through inpainting. We test our system on the COCO-Stuff dataset and conduct a user study to establish a baseline in context-aware automatic occlusion removal. |
Tasks | Image Enhancement |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02710v1 |
https://arxiv.org/pdf/1905.02710v1.pdf | |
PWC | https://paperswithcode.com/paper/context-aware-automatic-occlusion-removal |
Repo | |
Framework | |
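A minimal sketch of the detection idea, assuming pre-trained class embeddings (e.g. word vectors) and a hypothetical similarity threshold; the paper's actual scoring and the inpainting stage are not reproduced here:

```python
import numpy as np

def context_score(obj_vec, background_vecs):
    """Mean cosine similarity between an object's class embedding and
    the embeddings of the background (context) classes."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return np.mean([cos(obj_vec, b) for b in background_vecs])

def find_occlusions(foreground, background, embed, threshold=0.25):
    """Flag foreground classes whose context score falls below a threshold.

    `embed` maps a class name to its vector (e.g. pre-trained word
    embeddings); the threshold is an illustrative value, not the paper's.
    """
    bg_vecs = [embed[c] for c in background]
    return [c for c in foreground
            if context_score(embed[c], bg_vecs) < threshold]

# Flagged classes would then be masked and filled by an inpainting model.
```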
FAN: Feature Adaptation Network for Surveillance Face Recognition and Normalization
Title | FAN: Feature Adaptation Network for Surveillance Face Recognition and Normalization |
Authors | Xi Yin, Ying Tai, Yuge Huang, Xiaoming Liu |
Abstract | This paper studies face recognition (FR) and normalization in surveillance imagery. Surveillance FR is a challenging problem of great value to law enforcement. Despite recent progress in conventional FR, less effort has been devoted to surveillance FR. To bridge this gap, we propose a Feature Adaptation Network (FAN) to jointly perform surveillance FR and normalization. Our face normalization mainly acts on image resolution and is closely related to face super-resolution. However, previous face super-resolution methods require paired training data with pixel-to-pixel correspondence, which is typically unavailable between real low- and high-resolution faces. Our FAN can leverage both paired and unpaired data: we disentangle the features into identity and non-identity components and adapt the distribution of the identity features, which breaks the limits of current face super-resolution methods. We further propose a random scale augmentation scheme to learn resolution-robust identity features, with advantages over the previous fixed-scale augmentation. Extensive experiments on the LFW, WIDER FACE, QMUL-SurvFace and SCface datasets demonstrate the superiority of our proposed method over the state of the art in surveillance face recognition and normalization. |
Tasks | Face Recognition, Super-Resolution |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11680v1 |
https://arxiv.org/pdf/1911.11680v1.pdf | |
PWC | https://paperswithcode.com/paper/fan-feature-adaptation-network-for |
Repo | |
Framework | |
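The random scale augmentation is straightforward to illustrate: down-sample a face crop to a randomly chosen resolution and back up, simulating surveillance-quality inputs. A sketch, with illustrative candidate scales rather than the paper's settings:

```python
import random
import torch
import torch.nn.functional as F

def random_scale_augment(face, scales=(16, 32, 64, 112)):
    """Down-sample a face crop to a random resolution and back up.

    Sampling the scale per example, instead of fixing one, is what makes
    the learned identity features resolution-robust.
    """
    h, w = face.shape[-2:]
    s = random.choice(scales)
    low = F.interpolate(face, size=(s, s), mode='bilinear', align_corners=False)
    return F.interpolate(low, size=(h, w), mode='bilinear', align_corners=False)

# Usage: aug = random_scale_augment(torch.rand(1, 3, 112, 112))
```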
When redundancy is useful: A Bayesian approach to ‘overinformative’ referring expressions
Title | When redundancy is useful: A Bayesian approach to ‘overinformative’ referring expressions |
Authors | Judith Degen, Robert D. Hawkins, Caroline Graf, Elisa Kreiss, Noah D. Goodman |
Abstract | Referring is one of the most basic and prevalent uses of language. How do speakers choose from the wealth of referring expressions at their disposal? Rational theories of language use have come under attack for decades for not being able to account for the seemingly irrational overinformativeness ubiquitous in referring expressions. Here we present a novel production model of referring expressions within the Rational Speech Act framework that treats speakers as agents that rationally trade off cost and informativeness of utterances. Crucially, we relax the assumption that informativeness is computed with respect to a deterministic Boolean semantics, in favor of a non-deterministic continuous semantics. This innovation allows us to capture a large number of seemingly disparate phenomena within one unified framework: the basic asymmetry in speakers’ propensity to overmodify with color rather than size; the increase in overmodification in complex scenes; the increase in overmodification with atypical features; and the increase in specificity in nominal reference as a function of typicality. These findings cast a new light on the production of referring expressions: rather than being wastefully overinformative, reference is usefully redundant. |
Tasks | |
Published | 2019-03-19 |
URL | https://arxiv.org/abs/1903.08237v3 |
https://arxiv.org/pdf/1903.08237v3.pdf | |
PWC | https://paperswithcode.com/paper/when-redundancy-is-rational-a-bayesian |
Repo | |
Framework | |
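The Rational Speech Act machinery itself is compact. Below is a toy sketch with a two-object scene and made-up continuous semantic values, showing how a redundant utterance ("big blue") can win once the Boolean semantics is relaxed — the paper's key move:

```python
import numpy as np

# Toy scene: two objects; utterances with *continuous* semantic fit in
# [0, 1] rather than Boolean truth values.
objects = ['big_blue_pin', 'small_blue_pin']
utterances = ['blue', 'big', 'big blue']
semantics = np.array([      # rows: utterances, cols: objects
    [0.95, 0.95],           # 'blue' fits both objects well
    [0.80, 0.10],           # 'big' fits only the first
    [0.90, 0.05],           # 'big blue' is redundant but robust
])
cost = np.array([1.0, 1.0, 2.0])   # longer utterance costs more (illustrative)
alpha = 3.0                        # speaker rationality

def L0(sem):
    """Literal listener: normalize semantic fit over objects."""
    return sem / sem.sum(axis=1, keepdims=True)

def S1(sem, cost, alpha):
    """Pragmatic speaker: softmax of informativeness minus cost."""
    util = alpha * (np.log(L0(sem)) - cost[:, None])
    p = np.exp(util)
    return p / p.sum(axis=0, keepdims=True)   # normalize over utterances

print(S1(semantics, cost, alpha)[:, 0])  # utterance probs for the big blue pin
```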
Mis-classified Vector Guided Softmax Loss for Face Recognition
Title | Mis-classified Vector Guided Softmax Loss for Face Recognition |
Authors | Xiaobo Wang, Shifeng Zhang, Shuo Wang, Tianyu Fu, Hailin Shi, Tao Mei |
Abstract | Face recognition has witnessed significant progress due to advances in deep convolutional neural networks (CNNs), whose central task is improving feature discrimination. To this end, several margin-based (\textit{e.g.}, angular, additive and additive angular margin) softmax loss functions have been proposed to increase the feature margin between different classes. However, despite the great achievements that have been made, they mainly suffer from three issues: 1) they ignore the importance of mining informative features for discriminative learning; 2) they encourage the feature margin only from the ground-truth class, without exploiting the discriminability of the other, non-ground-truth classes; 3) the feature margin between different classes is set to be the same and fixed, which may not adapt well to all situations. To cope with these issues, this paper develops a novel loss function that adaptively emphasizes the mis-classified feature vectors to guide discriminative feature learning. We thus address all of the above issues and achieve more discriminative face features. To the best of our knowledge, this is the first attempt to combine the advantages of feature margin and feature mining in a unified loss function. Experimental results on several benchmarks demonstrate the effectiveness of our method over state-of-the-art alternatives. |
Tasks | Face Recognition |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1912.00833v1 |
https://arxiv.org/pdf/1912.00833v1.pdf | |
PWC | https://paperswithcode.com/paper/mis-classified-vector-guided-softmax-loss-for |
Repo | |
Framework | |
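A simplified sketch of such a loss, assuming normalized features and weights so that logits are cosine similarities; the margin, emphasis factor `t` and scale below are illustrative, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def mv_softmax_loss(cos_theta, labels, margin=0.35, t=0.2, scale=32.0):
    """Sketch of a mis-classified-vector guided softmax loss.

    Non-ground-truth logits that exceed the margined target logit
    (the "mis-classified" directions) are amplified to (1 + t) * cos + t,
    so hard negatives contribute a larger gradient.
    """
    target = cos_theta.gather(1, labels.view(-1, 1))          # cos(theta_y)
    logits = cos_theta.clone()
    hard = cos_theta > (target - margin)                      # violating classes
    logits[hard] = cos_theta[hard] * (1.0 + t) + t            # emphasize them
    logits.scatter_(1, labels.view(-1, 1), target - margin)   # margined target
    return F.cross_entropy(scale * logits, labels)

# cos_theta: (batch, num_classes) cosines from normalized features and
# weights; labels: (batch,) ground-truth class indices.
```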
CodedReduce: A Fast and Robust Framework for Gradient Aggregation in Distributed Learning
Title | CodedReduce: A Fast and Robust Framework for Gradient Aggregation in Distributed Learning |
Authors | Amirhossein Reisizadeh, Saurav Prakash, Ramtin Pedarsani, Amir Salman Avestimehr |
Abstract | We focus on the commonly used synchronous Gradient Descent paradigm for large-scale distributed learning, for which there has been growing interest in developing efficient and robust gradient aggregation strategies that overcome two key system bottlenecks: communication bandwidth and stragglers’ delays. In particular, the Ring-AllReduce (RAR) design has been proposed to avoid the bandwidth bottleneck at any particular node by allowing each worker to communicate only with its neighbors, arranged in a logical ring. On the other hand, Gradient Coding (GC) has recently been proposed to mitigate stragglers in a master-worker topology by allowing a carefully designed redundant allocation of the data set to the workers. We propose a joint communication-topology design and data-set allocation strategy, named CodedReduce (CR), that combines the best of both RAR and GC. That is, it parallelizes the communications over a tree topology, leading to efficient bandwidth utilization, and carefully designs a redundant data set allocation and coding strategy at the nodes to make the proposed gradient aggregation scheme robust to stragglers. In particular, we quantify the communication parallelization gain and resiliency of the proposed CR scheme, and prove its optimality when the communication topology is a regular tree. Furthermore, we empirically evaluate the performance of our proposed CR design over Amazon EC2 and demonstrate that it achieves speedups of up to 27.2x and 7.0x over the benchmarks GC and RAR, respectively. |
Tasks | |
Published | 2019-02-06 |
URL | https://arxiv.org/abs/1902.01981v2 |
https://arxiv.org/pdf/1902.01981v2.pdf | |
PWC | https://paperswithcode.com/paper/codedreduce-a-fast-and-robust-framework-for |
Repo | |
Framework | |
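The communication-parallelization half of the idea reduces to summing gradients up a logical tree, so each parent only talks to its own children. A minimal sketch (the coded, straggler-robust data allocation that completes CodedReduce is omitted):

```python
import numpy as np

def tree_aggregate(node, children, grads):
    """Recursively sum gradients up a logical tree.

    `children` maps a node id to its child ids; `grads` maps a node id
    to its local gradient vector.
    """
    total = grads[node].copy()
    for child in children.get(node, []):
        total += tree_aggregate(child, children, grads)
    return total

# Toy 2-level tree: root 0 with children 1 and 2, each with two leaves.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
grads = {i: np.ones(4) * i for i in range(7)}
print(tree_aggregate(0, children, grads))   # elementwise sum over all nodes
```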
Using Interlinear Glosses as Pivot in Low-Resource Multilingual Machine Translation
Title | Using Interlinear Glosses as Pivot in Low-Resource Multilingual Machine Translation |
Authors | Zhong Zhou, Lori Levin, David R. Mortensen, Alex Waibel |
Abstract | We demonstrate a new approach to Neural Machine Translation (NMT) for low-resource languages using a ubiquitous linguistic resource, Interlinear Glossed Text (IGT). IGT represents a non-English sentence as a sequence of English lemmas and morpheme labels. As such, it can serve as a pivot or interlingua for NMT. Our contribution is four-fold. Firstly, we pool IGT for 1,497 languages in ODIN (54,545 glosses) and 70,918 glosses in Arapaho and train a gloss-to-target NMT system from IGT to English, with a BLEU score of 25.94. Secondly, we introduce a multilingual NMT model that tags all glossed text with gloss-source language tags and train a universal system with shared attention across the 1,497 languages. Thirdly, we use the IGT gloss-to-target translation as a key step in an English-Turkish MT system trained on only 865 lines from ODIN. Fourthly, we present five metrics for evaluating extremely low-resource translation when BLEU is no longer sufficient, and evaluate the Turkish low-resource system using BLEU as well as accuracy of matching nouns, verbs, agreement, tense, and spurious repetition, showing large improvements. |
Tasks | Machine Translation |
Published | 2019-11-07 |
URL | https://arxiv.org/abs/1911.02709v3 |
https://arxiv.org/pdf/1911.02709v3.pdf | |
PWC | https://paperswithcode.com/paper/low-resource-machine-translation-using |
Repo | |
Framework | |
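The universal-model ingredient — tagging each gloss sequence with its source language so one model can share parameters across all 1,497 languages — is easy to picture. A sketch with an assumed tag format, not necessarily the paper's exact one:

```python
def tag_gloss(gloss_tokens, src_lang):
    """Prepend a gloss-source language tag to an IGT gloss sequence so a
    single universal gloss-to-English model can be trained."""
    return [f'__{src_lang}__'] + gloss_tokens

print(tag_gloss(['1SG', 'see-PST', 'dog'], 'arp'))
# ['__arp__', '1SG', 'see-PST', 'dog']
```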
Deep Learning of Compressed Sensing Operators with Structural Similarity Loss
Title | Deep Learning of Compressed Sensing Operators with Structural Similarity Loss |
Authors | Yochai Zur, Amir Adler |
Abstract | Compressed sensing (CS) is a signal processing framework for efficiently reconstructing a signal from a small number of measurements, obtained by linear projections of the signal. In this paper we present an end-to-end deep learning approach for CS, in which a fully-connected network performs both the linear sensing and the non-linear reconstruction stages. During the training phase, the sensing matrix and the non-linear reconstruction operator are jointly optimized using the Structural Similarity Index (SSIM) as the loss, rather than the standard Mean Squared Error (MSE) loss. We compare the proposed approach with the state of the art in terms of reconstruction quality under both measures, i.e. the SSIM and MSE scores. |
Tasks | |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10411v1 |
https://arxiv.org/pdf/1906.10411v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-of-compressed-sensing-operators |
Repo | |
Framework | |
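A minimal sketch of the end-to-end setup with illustrative dimensions (n=1024 signal, m=256 measurements), using a simplified global (non-windowed) SSIM as the loss; the paper's exact architecture and SSIM variant may differ:

```python
import torch
import torch.nn as nn

class CSNet(nn.Module):
    """Linear (bias-free) sensing layer followed by a non-linear
    fully-connected decoder; both are learned jointly."""
    def __init__(self, n=1024, m=256):
        super().__init__()
        self.sense = nn.Linear(n, m, bias=False)       # learned sensing matrix
        self.decode = nn.Sequential(
            nn.Linear(m, 2 * n), nn.ReLU(), nn.Linear(2 * n, n))

    def forward(self, x):
        return self.decode(self.sense(x))

def ssim_loss(x, y, c1=1e-4, c2=9e-4):
    """Simplified global SSIM (no sliding window) turned into a loss."""
    mx, my = x.mean(dim=1), y.mean(dim=1)
    vx, vy = x.var(dim=1), y.var(dim=1)
    cov = ((x - mx[:, None]) * (y - my[:, None])).mean(dim=1)
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return 1.0 - ssim.mean()

model = CSNet()
x = torch.rand(8, 1024)
loss = ssim_loss(model(x), x)   # jointly optimizes sensing and reconstruction
loss.backward()
```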
CVIT-MT Systems for WAT-2018
Title | CVIT-MT Systems for WAT-2018 |
Authors | Jerin Philip, Vinay P. Namboodiri, C. V. Jawahar |
Abstract | This document describes the machine translation system used in the submissions of IIIT-Hyderabad CVIT-MT for the WAT-2018 English-Hindi translation task. Performance is evaluated on the associated corpus provided by the organizers. We experimented with convolutional sequence to sequence architectures. We also train with additional data obtained through backtranslation. |
Tasks | Machine Translation |
Published | 2019-03-19 |
URL | http://arxiv.org/abs/1903.07917v1 |
http://arxiv.org/pdf/1903.07917v1.pdf | |
PWC | https://paperswithcode.com/paper/cvit-mt-systems-for-wat-2018 |
Repo | |
Framework | |
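Back-translation, the data-augmentation step mentioned above, in a sketch: a reverse (Hindi-to-English) model turns monolingual Hindi text into synthetic English sources, yielding extra (noisy-source, clean-target) pairs. `reverse_model.translate` is a hypothetical interface, not a specific toolkit's API:

```python
def backtranslate(mono_hindi, reverse_model):
    """Produce synthetic (English, Hindi) training pairs from monolingual
    Hindi text via a reverse translation model."""
    return [(reverse_model.translate(hi), hi) for hi in mono_hindi]

# train_pairs = real_pairs + backtranslate(mono_hindi_corpus, hi2en_model)
```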
FaRM: Fair Reward Mechanism for Information Aggregation in Spontaneous Localized Settings (Extended Version)
Title | FaRM: Fair Reward Mechanism for Information Aggregation in Spontaneous Localized Settings (Extended Version) |
Authors | Moin Hussain Moti, Dimitris Chatzopoulos, Pan Hui, Sujit Gujar |
Abstract | Although peer prediction markets are widely used in crowdsourcing to aggregate information from agents, they often fail to reward the participating agents equitably. Honest agents can be wrongly penalized if randomly paired with dishonest ones. In this work, we introduce \emph{selective} and \emph{cumulative} fairness. We characterize a mechanism as fair if it satisfies both notions and present FaRM, a representative mechanism we designed. FaRM is a Nash incentive mechanism that focuses on information aggregation for spontaneous local activities which are accessible to a limited number of agents, without assuming any prior knowledge of the event. All the agents in the vicinity observe the same information. FaRM uses \textit{(i)} a \emph{report strength score} to remove the risk of random pairing with dishonest reporters, \textit{(ii)} a \emph{consistency score} to measure an agent’s history of accurate reports and distinguish valuable reports, \textit{(iii)} a \emph{reliability score} to estimate the probability that an agent colludes with nearby agents, preventing agents from being swayed, and \textit{(iv)} a \emph{location robustness score} to filter out agents who try to participate without being present in the considered setting. Together, report strength, consistency, and reliability yield a fair reward for agents based on their reports. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03963v1 |
https://arxiv.org/pdf/1906.03963v1.pdf | |
PWC | https://paperswithcode.com/paper/farm-fair-reward-mechanism-for-information |
Repo | |
Framework | |
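How the scores might combine into a reward, as a heavily simplified sketch; the multiplicative rule and the presence threshold below are assumptions for illustration, not the paper's mechanism:

```python
def farm_reward(strength, consistency, reliability, robustness):
    """Combine per-agent scores into a reward, after a location-robustness
    check filters out agents who were not actually present."""
    if robustness < 0.5:        # hypothetical presence threshold
        return 0.0
    return strength * consistency * reliability

print(farm_reward(strength=0.9, consistency=0.8, reliability=0.95,
                  robustness=0.7))
```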
Global Adaptive Generative Adjustment
Title | Global Adaptive Generative Adjustment |
Authors | Bin Wang, Xiaofei Wang, Jianhua Guo |
Abstract | Many traditional signal-recovery approaches can perform well based on penalized likelihood. However, they face difficulty in selecting the hyperparameters, or tuning parameters, in the penalties. In this article, we propose a global adaptive generative adjustment (GAGA) algorithm for signal recovery, in which multiple hyperparameters are automatically learned and alternately updated with the signal. We further prove that the output of our algorithm directly guarantees the consistency of model selection and the asymptotic normality of the signal estimate. Moreover, we propose a variant of the GAGA algorithm to improve computational efficiency in high-dimensional data analysis. Finally, in simulated experiments, we examine the consistency of the outputs of our algorithms and compare them to other penalized likelihood methods: the adaptive LASSO, SCAD and MCP. The simulation results support the efficiency of our algorithms for signal recovery and demonstrate that they outperform the other algorithms. |
Tasks | Model Selection |
Published | 2019-11-02 |
URL | https://arxiv.org/abs/1911.00658v2 |
https://arxiv.org/pdf/1911.00658v2.pdf | |
PWC | https://paperswithcode.com/paper/global-adaptive-generative-adjustment |
Repo | |
Framework | |
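In the same spirit, here is a generic alternating scheme — not the paper's exact updates — that learns one penalty weight per coefficient from the current signal estimate, then re-solves for the signal:

```python
import numpy as np

def adaptive_penalized_ls(X, y, iters=50, eps=1e-6):
    """Alternate (1) a weighted-ridge least-squares update of the signal
    with (2) a per-coefficient penalty-weight update; small coefficients
    attract heavy penalties and shrink toward zero.  Illustrative only.
    """
    n, p = X.shape
    lam = np.ones(p)
    beta = np.zeros(p)
    for _ in range(iters):
        beta = np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)
        lam = 1.0 / (beta ** 2 + eps)   # adaptive penalty weights
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.normal(size=100)
print(np.round(adaptive_penalized_ls(X, y), 2))   # recovers the sparse signal
```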
“I’m sorry Dave, I’m afraid I can’t do that” Deep Q-learning from forbidden action
Title | “I’m sorry Dave, I’m afraid I can’t do that” Deep Q-learning from forbidden action |
Authors | Mathieu Seurin, Philippe Preux, Olivier Pietquin |
Abstract | The use of Reinforcement Learning (RL) is still restricted to simulation or to enhancing human-operated systems through recommendations. Real-world environments (e.g. industrial robots or power grids) are generally designed with safety constraints in mind, implemented in the shape of valid action masks or contingency controllers. For example, the range of motion and the angles of the motors of a robot can be limited to physical boundaries. Violating constraints thus results in rejected actions or entering a safe mode driven by an external controller, making RL agents incapable of learning from their mistakes. In this paper, we propose a simple modification of a state-of-the-art deep RL algorithm (DQN), enabling learning from forbidden actions. To do so, the standard Q-learning update is enhanced with an extra safety loss inspired by structured classification. We empirically show that it reduces the number of constraint violations during the learning phase and accelerates convergence to near-optimal policies compared to standard DQN. Experiments are done on a Visual Grid World Environment and the TextWorld domain. |
Tasks | Q-Learning |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.02078v2 |
https://arxiv.org/pdf/1910.02078v2.pdf | |
PWC | https://paperswithcode.com/paper/im-sorry-dave-im-afraid-i-cant-do-that-deep-q |
Repo | |
Framework | |
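A sketch of the modified update: the usual TD loss plus a margin (hinge) term, active only on transitions where the action was rejected, that pushes the forbidden action's Q-value below the best alternative. The loss shape and hyperparameters here are illustrative, not the paper's:

```python
import torch
import torch.nn.functional as F

def dqn_with_safety_loss(q_net, batch, gamma=0.99, margin=1.0, beta=1.0):
    """Standard DQN loss plus a structured-classification-style hinge
    on forbidden actions."""
    s, a, r, s_next, done, forbidden = batch      # forbidden: float 0/1 flags
    q_all = q_net(s)
    q_taken = q_all.gather(1, a.view(-1, 1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * q_net(s_next).max(dim=1).values
    td_loss = F.smooth_l1_loss(q_taken, target)

    # Best Q-value among the *other* actions in each state.
    q_best_other = q_all.scatter(1, a.view(-1, 1), float('-inf')).max(dim=1).values
    # Hinge is active only where the environment rejected the action.
    safety_loss = (forbidden * F.relu(q_taken - q_best_other + margin)).mean()
    return td_loss + beta * safety_loss
```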
Deep learning architectures for automated image segmentation
Title | Deep learning architectures for automated image segmentation |
Authors | Debleena Sengupta |
Abstract | Image segmentation is widely used in a variety of computer vision tasks, such as object localization and recognition, boundary detection, and medical imaging. This thesis proposes deep learning architectures to improve automatic object localization and boundary delineation for salient object segmentation in natural images and for 2D medical image segmentation. First, we propose and evaluate a novel dilated dense encoder-decoder architecture with a custom dilated spatial pyramid pooling block to accurately localize and delineate boundaries for salient object segmentation. The dilation offers better spatial understanding and the dense connectivity preserves features learned at shallower levels of the network for better localization. Tested on three publicly available datasets, our architecture outperforms the state-of-the-art for one and is very competitive on the other two. Second, we propose and evaluate a custom 2D dilated dense UNet architecture for accurate lesion localization and segmentation in medical images. This architecture can be utilized as a stand-alone segmentation framework or used as a rich feature extracting backbone to aid other models in medical image segmentation. Our architecture outperforms all baseline models for accurate lesion localization and segmentation on a new dataset. We furthermore explore the main considerations that should be taken into account for 3D medical image segmentation, among them preprocessing techniques and specialized loss functions. |
Tasks | |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.10333v1 |
https://arxiv.org/pdf/1909.10333v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-architectures-for-automated |
Repo | |
Framework | |
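A sketch of a dilated spatial pyramid pooling block in the spirit described — parallel 3x3 convolutions at several dilation rates, concatenated and fused by a 1x1 convolution — with illustrative rates and channel sizes:

```python
import torch
import torch.nn as nn

class DilatedSPP(nn.Module):
    """Parallel dilated convolutions widen the receptive field at multiple
    scales without losing resolution, then a 1x1 conv fuses the branches."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

# Usage: DilatedSPP(256, 64)(torch.rand(1, 256, 32, 32)).shape -> (1, 64, 32, 32)
```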
Deep Representations for Cross-spectral Ocular Biometrics
Title | Deep Representations for Cross-spectral Ocular Biometrics |
Authors | Luiz A. Zanlorensi, Diego R. Lucio, Alceu S. Britto Jr., Hugo Proença, David Menotti |
Abstract | One of the major challenges in ocular biometrics is the cross-spectral scenario, i.e., how to match images acquired in different wavelengths (typically visible (VIS) against near-infrared (NIR)). This article designs and extensively evaluates cross-spectral ocular verification methods, for both the closed- and open-world settings, using well-known deep learning representations based on the iris and periocular regions. Using as inputs the bounding boxes of non-normalized iris/periocular regions, we fine-tune Convolutional Neural Network (CNN) models (based either on the VGG16 or ResNet-50 architectures), originally trained for face recognition. Based on experiments carried out on two publicly available cross-spectral ocular databases, we report results for intra-spectral and cross-spectral scenarios, with the best performance observed when fusing ResNet-50 deep representations from both the periocular and iris regions. Compared to the state of the art, the proposed solution consistently reduces the Equal Error Rate (EER) values by 90% / 93% / 96% and 61% / 77% / 83% in the cross-spectral scenario, on the PolyU Bi-spectral and Cross-eye-cross-spectral datasets respectively. Lastly, we evaluate the effect that the “deepness” factor of feature representations has on recognition effectiveness, and, based on a subjective analysis of the most problematic pairwise comparisons, point out further directions for this field of research. |
Tasks | Face Recognition |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09509v1 |
https://arxiv.org/pdf/1911.09509v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-representations-for-cross-spectral |
Repo | |
Framework | |
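The winning fusion — concatenating deep features from the periocular and iris regions — in a sketch; ImageNet weights stand in here for the face-recognition initialization the paper fine-tunes from:

```python
import torch
import torch.nn as nn
from torchvision import models

def feature_extractor():
    """ResNet-50 backbone with the classification head removed,
    producing a 2048-d descriptor per image."""
    net = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    return nn.Sequential(*list(net.children())[:-1])   # -> (B, 2048, 1, 1)

periocular_net, iris_net = feature_extractor(), feature_extractor()

def fused_representation(periocular_crop, iris_crop):
    """Concatenate deep features from both ocular regions."""
    f1 = periocular_net(periocular_crop).flatten(1)
    f2 = iris_net(iris_crop).flatten(1)
    return torch.cat([f1, f2], dim=1)                  # (B, 4096)

# Verification: compare fused vectors with e.g. cosine distance.
```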
Robust, fast and accurate: a 3-step method for automatic histological image registration
Title | Robust, fast and accurate: a 3-step method for automatic histological image registration |
Authors | Johannes Lotz, Nick Weiss, Stefan Heldmann |
Abstract | We present a 3-step registration pipeline for differently stained histological serial sections that consists of 1) a robust pre-alignment, 2) a parametric registration computed on coarse resolution images, and 3) an accurate nonlinear registration. In all three steps the NGF distance measure is minimized with respect to an increasingly flexible transformation. We apply the method in the ANHIR image registration challenge and evaluate its performance on the training data. The presented method is robust (error reduction in 99.6% of the cases), fast (runtime 4 seconds) and accurate (median relative target registration error 0.19%). |
Tasks | Image Registration |
Published | 2019-03-28 |
URL | http://arxiv.org/abs/1903.12063v2 |
http://arxiv.org/pdf/1903.12063v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-fast-and-accurate-a-3-step-method-for |
Repo | |
Framework | |
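The NGF (Normalized Gradient Fields) distance minimized in all three steps rewards aligned gradient directions while ignoring absolute intensities, which suits differently stained sections. A minimal sketch of the measure:

```python
import numpy as np

def ngf_distance(ref, tmpl, eps=1e-3):
    """Normalized Gradient Fields distance between two 2-D images: small
    where edge directions align, regardless of stain intensity.  `eps` is
    the edge parameter that suppresses noise gradients."""
    gr = np.stack(np.gradient(ref.astype(np.float64)))     # (2, H, W)
    gt = np.stack(np.gradient(tmpl.astype(np.float64)))
    nr = np.sqrt((gr ** 2).sum(axis=0) + eps ** 2)
    nt = np.sqrt((gt ** 2).sum(axis=0) + eps ** 2)
    dot = (gr * gt).sum(axis=0) / (nr * nt)
    return float((1.0 - dot ** 2).mean())   # 0 when gradients are parallel
```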