October 17, 2019

3427 words 17 mins read

Paper Group ANR 792


Coarse-to-fine Semantic Segmentation from Image-level Labels

Title Coarse-to-fine Semantic Segmentation from Image-level Labels
Authors Longlong Jing, Yucheng Chen, Yingli Tian
Abstract Deep neural network-based semantic segmentation generally requires large-scale, costly annotations for training to obtain good performance. To avoid the pixel-wise segmentation annotations that most methods require, some researchers have recently attempted to use object-level labels (e.g. bounding boxes) or image-level labels (e.g. image categories). In this paper, we propose a novel recursive coarse-to-fine semantic segmentation framework based only on image-level category labels. For each image, an initial coarse mask is first generated by a convolutional neural network-based unsupervised foreground segmentation model and is then enhanced by a graph model. The enhanced coarse mask is fed to a fully convolutional neural network to be recursively refined. Unlike existing image-level label-based semantic segmentation methods, which require labeling all categories for images containing multiple types of objects, our framework needs only one label per image and can handle images containing multi-category objects. Trained only on ImageNet, our framework achieves performance on the PASCAL VOC dataset comparable to other state-of-the-art image-level label-based semantic segmentation methods. Furthermore, our framework can be easily extended to the foreground object segmentation task, where it achieves performance comparable to state-of-the-art supervised methods on the Internet Object dataset. (A sketch of the recursive refinement loop follows this entry.)
Tasks Semantic Segmentation
Published 2018-12-28
URL http://arxiv.org/abs/1812.10885v1
PDF http://arxiv.org/pdf/1812.10885v1.pdf
PWC https://paperswithcode.com/paper/coarse-to-fine-semantic-segmentation-from
Repo
Framework
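
The refinement procedure described in the abstract can be pictured as a simple loop: an unsupervised model proposes a coarse mask, a graph model cleans it up, and an FCN is repeatedly trained on its own refined predictions. A minimal sketch under those assumptions; `foreground_model`, `refine_with_graph`, and `fcn` are hypothetical stand-ins, not the authors' implementation:

```python
def coarse_to_fine(images, foreground_model, refine_with_graph, fcn, n_rounds=3):
    # Step 1: an unsupervised CNN proposes an initial coarse foreground mask.
    masks = [foreground_model(img) for img in images]
    # Step 2: a graph model sharpens the coarse mask (e.g. along object edges).
    masks = [refine_with_graph(img, m) for img, m in zip(images, masks)]
    # Step 3: recursively refine -- train the FCN on the current masks,
    # predict new masks, and re-apply the graph model.
    for _ in range(n_rounds):
        fcn.fit(images, masks)  # current masks serve as pseudo ground truth
        masks = [refine_with_graph(img, fcn.predict(img)) for img in images]
    return masks
```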

The Power of Complementary Regularizers: Image Recovery via Transform Learning and Low-Rank Modeling

Title The Power of Complementary Regularizers: Image Recovery via Transform Learning and Low-Rank Modeling
Authors Bihan Wen, Yanjun Li, Yoram Bresler
Abstract Recent works on adaptive sparse and low-rank signal modeling have demonstrated their usefulness in various image and video processing applications. Patch-based methods exploit local patch sparsity, whereas other works apply low-rankness of grouped patches to exploit non-local image structures. However, using either approach alone usually limits performance in image reconstruction or recovery applications. In this work, we propose a simultaneous sparsity and low-rank model, dubbed STROLLR, to better represent natural images. To fully utilize both the local and non-local image properties, we develop an image restoration framework using a transform learning scheme with joint low-rank regularization. The approach owes some of its computational efficiency and good performance to the use of transform learning for adaptive sparse representation, rather than the popular synthesis dictionary learning algorithms, which involve approximation of NP-hard sparse coding and expensive learning steps. We demonstrate the proposed framework in applications to image denoising, inpainting, and compressed sensing-based magnetic resonance imaging. Results show promising performance compared to state-of-the-art competing methods. (A toy sketch of the two complementary regularization steps follows this entry.)
Tasks Denoising, Dictionary Learning, Image Denoising, Image Reconstruction, Image Restoration
Published 2018-08-03
URL http://arxiv.org/abs/1808.01316v1
PDF http://arxiv.org/pdf/1808.01316v1.pdf
PWC https://paperswithcode.com/paper/the-power-of-complementary-regularizers-image
Repo
Framework
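
Loosely, STROLLR pairs two regularizers: a learned sparsifying transform on local patches and a low-rank penalty on groups of similar non-local patches. A toy sketch of the two update steps, using hard thresholding for sparsity and singular-value thresholding for low-rankness (a simplification of the paper's alternating scheme, with illustrative function names):

```python
import numpy as np

def sparse_step(patches, W, tau):
    """Sparsify local patches (one per column) under a learned transform W."""
    coeffs = W @ patches
    coeffs[np.abs(coeffs) < tau] = 0.0   # hard thresholding enforces sparsity
    return np.linalg.pinv(W) @ coeffs    # back-project to patch space

def low_rank_step(patch_group, theta):
    """Singular-value thresholding of a matrix of similar (non-local) patches."""
    U, s, Vt = np.linalg.svd(patch_group, full_matrices=False)
    s = np.where(s > theta, s, 0.0)      # keep only the large singular values
    return (U * s) @ Vt
```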

Boosting the Robustness Verification of DNN by Identifying the Achilles’s Heel

Title Boosting the Robustness Verification of DNN by Identifying the Achilles’s Heel
Authors Chengdong Feng, Zhenbang Chen, Weijiang Hong, Hengbiao Yu, Wei Dong, Ji Wang
Abstract The Deep Neural Network (DNN) is a widely used deep learning technique, and ensuring the safety of DNN-based systems is a critical problem for DNN research and applications. Robustness is an important safety property of DNNs. However, existing work on verifying DNN robustness is time-consuming and hard to scale to large-scale DNNs. In this paper, we propose a boosting method for DNN robustness verification, aiming to find counter-examples earlier. Our observation is that different inputs of a DNN have different likelihoods of counter-examples existing around them, and an input with a small difference between the largest and the second-largest output values tends to be the Achilles’s heel of the DNN. We have implemented our method and applied it to Reluplex, a state-of-the-art DNN verification tool, and four DNN attacking methods. The results of extensive experiments on two benchmarks indicate the effectiveness of our boosting method. (A minimal sketch of this margin heuristic follows this entry.)
Tasks
Published 2018-11-17
URL http://arxiv.org/abs/1811.07108v1
PDF http://arxiv.org/pdf/1811.07108v1.pdf
PWC https://paperswithcode.com/paper/boosting-the-robustness-verification-of-dnn
Repo
Framework
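
The heuristic in the abstract is concrete: inputs whose top two output values are close together are the most promising places to look for counter-examples first. A minimal sketch of ranking inputs by this margin (the model is assumed to return a vector of raw output values; names are illustrative):

```python
import numpy as np

def rank_by_margin(model, inputs):
    """Order inputs by the gap between the largest and second-largest outputs.

    Inputs with a small gap are the DNN's 'Achilles's heel', so verification
    or attack effort is spent on them first.
    """
    margins = []
    for x in inputs:
        scores = np.sort(model(x))[::-1]   # output values, descending
        margins.append(scores[0] - scores[1])
    return [inputs[i] for i in np.argsort(margins)]  # smallest margin first
```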

Supervised COSMOS Autoencoder: Learning Beyond the Euclidean Loss!

Title Supervised COSMOS Autoencoder: Learning Beyond the Euclidean Loss!
Authors Maneet Singh, Shruti Nagpal, Mayank Vatsa, Richa Singh, Afzel Noore
Abstract Autoencoders are unsupervised deep learning models used for learning representations. In the literature, autoencoders have been shown to perform well on a variety of tasks spread across multiple domains, thereby establishing widespread applicability. Typically, an autoencoder is trained to generate a model that minimizes the reconstruction error between the input and the reconstructed output, computed in terms of the Euclidean distance. While this can be useful for applications related to unsupervised reconstruction, it may not be optimal for classification. In this paper, we propose a novel Supervised COSMOS Autoencoder which utilizes a multi-objective loss function to learn representations that simultaneously encode the (i) “similarity” between the input and reconstructed vectors in terms of their direction, (ii) “distribution” of pixel values of the reconstruction with respect to the input sample, while also incorporating (iii) “discriminability” in the feature learning pipeline. The proposed autoencoder model incorporates a Cosine similarity and Mahalanobis distance based loss function, along with supervision via a Mutual Information based loss. Detailed analysis of each component of the proposed model motivates its applicability to feature learning in different classification tasks. The efficacy of the Supervised COSMOS autoencoder is demonstrated via extensive experimental evaluations on different image datasets. The proposed model outperforms existing algorithms on the MNIST, CIFAR-10, and SVHN databases. It also yields state-of-the-art results on the CelebA, LFWA, Adience, and IJB-A databases for attribute prediction and face recognition, respectively. (A toy composition of the three loss terms follows this entry.)
Tasks Face Recognition
Published 2018-10-15
URL http://arxiv.org/abs/1810.06221v1
PDF http://arxiv.org/pdf/1810.06221v1.pdf
PWC https://paperswithcode.com/paper/supervised-cosmos-autoencoder-learning-beyond
Repo
Framework
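
To make the three-part objective concrete, here is a toy composition of the terms named in the abstract. It is not the authors' formulation: in particular, a plain cross-entropy term stands in for the paper's mutual-information-based supervision, and `cov_inv` is an assumed precomputed inverse covariance for the Mahalanobis term:

```python
import torch
import torch.nn.functional as F

def cosmos_style_loss(x, x_hat, logits, labels, cov_inv, alpha=1.0, beta=1.0):
    """Toy multi-objective loss in the spirit of COSMOS (not the paper's exact form).

    cosine term      -> (i)  'similarity' of direction between input and output
    mahalanobis term -> (ii) 'distribution' of the reconstruction w.r.t. the input
    supervision term -> (iii) 'discriminability' (cross-entropy stands in for
                        the paper's mutual-information-based loss)
    """
    cos = 1.0 - F.cosine_similarity(x.flatten(1), x_hat.flatten(1)).mean()
    d = (x - x_hat).flatten(1)
    maha = torch.einsum('bi,ij,bj->b', d, cov_inv, d).mean()
    sup = F.cross_entropy(logits, labels)
    return cos + alpha * maha + beta * sup
```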

Mask-aware Photorealistic Face Attribute Manipulation

Title Mask-aware Photorealistic Face Attribute Manipulation
Authors Ruoqi Sun, Chen Huang, Jianping Shi, Lizhuang Ma
Abstract The task of face attribute manipulation has found increasing applications, but it remains challenging because it requires editing the attributes of a face image while preserving its unique details. In this paper, we choose to combine the Variational AutoEncoder (VAE) and Generative Adversarial Network (GAN) for photorealistic image generation. We propose an effective method to modify a modest number of pixels in the feature maps of an encoder, changing the attribute strength continuously without hindering global information. Our training objectives for the VAE and GAN are reinforced by the supervision of a face recognition loss and a cycle consistency loss for faithful preservation of face details. Moreover, we generate facial masks to enforce background consistency, which allows our training to focus on manipulating the foreground face rather than the background. Experimental results demonstrate that our method, called Mask-Adversarial AutoEncoder (M-AAE), can generate high-quality images with changing attributes and outperforms prior methods in detail preservation. (An illustrative weighting of the named loss terms follows this entry.)
Tasks Face Recognition
Published 2018-04-24
URL http://arxiv.org/abs/1804.08882v1
PDF http://arxiv.org/pdf/1804.08882v1.pdf
PWC https://paperswithcode.com/paper/mask-aware-photorealistic-face-attribute
Repo
Framework
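
The abstract names five ingredients: the VAE objective, an adversarial term, a face recognition loss, a cycle consistency loss, and mask-based background consistency. A purely illustrative weighting of these terms, with assumed inputs and weights (not the authors' formulation):

```python
import torch
import torch.nn.functional as F

def maae_style_loss(x, x_edit, x_cycle, feat, d_score, mask, mu, logvar,
                    w_id=1.0, w_cyc=1.0, w_bg=1.0):
    """Illustrative composition of the losses named in the abstract.

    x: source image; x_edit: attribute-edited output; x_cycle: edit mapped
    back; feat: face-recognition embedding network; d_score: discriminator
    probability on x_edit; mask: 1 on the foreground face, 0 on background.
    """
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())    # VAE prior
    adv = F.binary_cross_entropy(d_score, torch.ones_like(d_score))  # fool D
    ident = F.mse_loss(feat(x_edit), feat(x))            # face recognition loss
    cyc = F.l1_loss(x_cycle, x)                          # cycle consistency
    bg = F.l1_loss(x_edit * (1 - mask), x * (1 - mask))  # background consistency
    return kl + adv + w_id * ident + w_cyc * cyc + w_bg * bg
```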

Rover Descent: Learning to optimize by learning to navigate on prototypical loss surfaces

Title Rover Descent: Learning to optimize by learning to navigate on prototypical loss surfaces
Authors Louis Faury, Flavian Vasile
Abstract Learning to optimize - the idea that we can learn from data algorithms that optimize a numerical criterion - has recently been at the heart of a growing number of research efforts. One of the most challenging issues within this approach is to learn a policy that is able to optimize over classes of functions that are fairly different from the ones it was trained on. We propose a novel way of framing learning to optimize as the problem of learning a good navigation policy on a partially observable loss surface. To this end, we develop Rover Descent, a solution that allows us to learn a fairly broad optimization policy by training on a small set of prototypical two-dimensional surfaces that encompasses classically hard cases such as valleys, plateaus, cliffs, and saddles, using strictly zero-order information. We show that, without access to gradient or curvature information, we achieve state-of-the-art convergence speed on optimization problems not presented at training time, such as the Rosenbrock function and other hard cases in two dimensions. We extend our framework to optimize over high-dimensional landscapes, while still handling only two-dimensional local landscape information, and show good preliminary results.
Tasks
Published 2018-01-22
URL http://arxiv.org/abs/1801.07222v3
PDF http://arxiv.org/pdf/1801.07222v3.pdf
PWC https://paperswithcode.com/paper/rover-descent-learning-to-optimize-by
Repo
Framework

Retinal Optic Disc Segmentation using Conditional Generative Adversarial Network

Title Retinal Optic Disc Segmentation using Conditional Generative Adversarial Network
Authors Vivek Kumar Singh, Hatem Rashwan, Farhan Akram, Nidhi Pandey, Md. Mostaf Kamal Sarker, Adel Saleh, Saddam Abdulwahab, Najlaa Maaroof, Santiago Romani, Domenec Puig
Abstract This paper proposes a retinal image segmentation method based on a conditional Generative Adversarial Network (cGAN) to segment the optic disc. The proposed model consists of two successive networks: a generator and a discriminator. The generator learns to map information from the observed input (i.e., a retinal fundus color image) to the output (i.e., a binary mask). Then, the discriminator learns, as a loss function, to train this mapping by comparing the ground truth and the predicted output while observing the input image as a condition. Experiments were performed on two publicly available datasets: DRISHTI GS1 and RIM-ONE. The proposed model outperformed state-of-the-art methods, achieving Jaccard and Dice coefficients of around 0.96 and 0.98, respectively. Moreover, segmentation of an image takes less than a second on a recent GPU. (A sketch of the conditional adversarial losses follows this entry.)
Tasks Semantic Segmentation
Published 2018-06-11
URL http://arxiv.org/abs/1806.03905v1
PDF http://arxiv.org/pdf/1806.03905v1.pdf
PWC https://paperswithcode.com/paper/retinal-optic-disc-segmentation-using
Repo
Framework
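
The cGAN setup described above follows the familiar conditional pattern: the discriminator judges (image, mask) pairs, and the generator is trained both to fool the discriminator and to match the ground truth. A pix2pix-style sketch under those assumptions (D is assumed to output probabilities; the extra L1 term is a common choice, not confirmed by the abstract):

```python
import torch
import torch.nn.functional as F

def cgan_seg_losses(G, D, image, true_mask, lam=10.0):
    """Sketch of conditional-GAN losses for optic disc mask prediction."""
    fake_mask = G(image)
    # Discriminator: real (image, true mask) vs. fake (image, predicted mask).
    d_real = D(image, true_mask)
    d_fake = D(image, fake_mask.detach())
    d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    # Generator: fool the discriminator while staying close to the ground truth.
    g_adv = D(image, fake_mask)
    g_loss = (F.binary_cross_entropy(g_adv, torch.ones_like(g_adv)) +
              lam * F.l1_loss(fake_mask, true_mask))
    return d_loss, g_loss
```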

Learning Implicit Generative Models with the Method of Learned Moments

Title Learning Implicit Generative Models with the Method of Learned Moments
Authors Suman Ravuri, Shakir Mohamed, Mihaela Rosca, Oriol Vinyals
Abstract We propose a method of moments (MoM) algorithm for training large-scale implicit generative models. Moment estimation in this setting encounters two problems: it is often difficult to define the millions of moments needed to learn the model parameters, and it is hard to determine which properties are useful when specifying moments. To address the first issue, we introduce a moment network, and define the moments as the network's hidden units and the gradient of the network's output with respect to its parameters. To tackle the second problem, we use asymptotic theory to highlight desiderata for moments - namely, they should minimize the asymptotic variance of estimated model parameters - and introduce an objective to learn better moments. The sequence of objectives created by this Method of Learned Moments (MoLM) can train high-quality neural image samplers. On CIFAR-10, we demonstrate that MoLM-trained generators achieve significantly higher Inception Scores and lower Frechet Inception Distances than those trained with gradient penalty-regularized and spectrally-normalized adversarial objectives. These generators also achieve nearly perfect Multi-Scale Structural Similarity Scores on CelebA, and can create high-quality samples of 128x128 images.
Tasks
Published 2018-06-28
URL http://arxiv.org/abs/1806.11006v1
PDF http://arxiv.org/pdf/1806.11006v1.pdf
PWC https://paperswithcode.com/paper/learning-implicit-generative-models-with-the
Repo
Framework

An Interval Type-2 Fuzzy Approach to Automatic PDF Generation for Histogram Specification

Title An Interval Type-2 Fuzzy Approach to Automatic PDF Generation for Histogram Specification
Authors Vishal Agarwal, Diwanshu Jain, A. Vamshi Krishna Reddy, Frank Chung-Hoon Rhee
Abstract Image enhancement plays an important role in several applications in the fields of computer vision and image processing. Histogram specification (HS) is one of the most widely used techniques for contrast enhancement of an image, and it requires an appropriate probability density function for the transformation. In this paper, we propose a fuzzy method to automatically find a suitable PDF for histogram specification using an interval type-2 (IT2) fuzzy approach, based on the fuzzy membership values obtained from the histogram of the input image. The proposed algorithm works in five stages: symmetric Gaussian fitting on the histogram, extraction of IT2 fuzzy membership functions (MFs) and hence the footprint of uncertainty (FOU), obtaining the membership value (MV), generating the PDF, and application of HS. We propose four different methods to find membership values: the point-wise method, the center of weight method, the area method, and the Karnik-Mendel (KM) method. The framework is sensitive to local variations in the histogram and chooses the best PDF so as to improve contrast enhancement. Experimental validity of the methods is illustrated by qualitative and quantitative analysis on several images using the image quality index Average Information Content (AIC), or Entropy, and by comparison with commonly used algorithms such as Histogram Equalization (HE), Recursive Mean-Separate Histogram Equalization (RMSHE), and Brightness Preserving Fuzzy Histogram Equalization (BPFHE). On average, our algorithm improves the AIC index by 11.5% compared to histogram equalization. (The classical HS stage is sketched after this entry.)
Tasks Image Enhancement
Published 2018-05-06
URL http://arxiv.org/abs/1805.02173v1
PDF http://arxiv.org/pdf/1805.02173v1.pdf
PWC https://paperswithcode.com/paper/an-interval-type-2-fuzzy-approach-to
Repo
Framework
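
The final HS stage is standard: given the generated target PDF, each gray level is remapped so that the image's CDF matches the target's. A sketch of that classical step (the IT2 fuzzy machinery that produces `target_pdf` is not shown):

```python
import numpy as np

def histogram_specification(image, target_pdf, levels=256):
    """Remap gray levels of a 2-D uint8 image to match a target PDF."""
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    src_cdf = np.cumsum(hist / hist.sum())
    tgt_cdf = np.cumsum(target_pdf / np.sum(target_pdf))
    # For each input level, pick the first target level whose CDF
    # reaches the input level's CDF.
    mapping = np.searchsorted(tgt_cdf, src_cdf).clip(0, levels - 1)
    return mapping[image].astype(np.uint8)
```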

Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics

Title Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics
Authors Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen
Abstract An audiovisual speaker conversion method is presented for simultaneously transforming the facial expressions and voice of a source speaker into those of a target speaker. Transforming the facial and acoustic features together makes it possible for the converted voice and facial expressions to be highly correlated and for the generated target speaker to appear and sound natural. It uses three neural networks: a conversion network that fuses and transforms the facial and acoustic features, a waveform generation network that produces the waveform from both the converted facial and acoustic features, and an image reconstruction network that outputs an RGB facial image also based on both the converted features. The results of experiments using an emotional audiovisual database showed that the proposed method achieved significantly higher naturalness compared with one that separately transformed acoustic and facial features.
Tasks Image Reconstruction
Published 2018-10-29
URL http://arxiv.org/abs/1810.12730v2
PDF http://arxiv.org/pdf/1810.12730v2.pdf
PWC https://paperswithcode.com/paper/audiovisual-speaker-conversion-jointly-and
Repo
Framework

Omni-directional Feature Learning for Person Re-identification

Title Omni-directional Feature Learning for Person Re-identification
Authors Di Wu, Hong-Wei Yang, De-Shuang Huang
Abstract Person re-identification (PReID) has received increasing attention because it is an important component of intelligent surveillance. Recently, many state-of-the-art methods for PReID have been part-based deep models. Most of them focus on learning part feature representations of the person's body in the horizontal direction, while the feature representation of the body in the vertical direction is usually ignored. Besides, the spatial information between these part features and across the different feature channels is not considered. In this study, we introduce a multi-branch deep model for PReID. Specifically, the model consists of five branches. Two of the branches learn local features with spatial information from the horizontal and vertical orientations, respectively. A third branch aims to learn interdependencies between the feature channels generated by the last convolution layer. The remaining two branches are identification and triplet sub-networks, in which a discriminative global feature and a corresponding measurement can be learned simultaneously. All five branches improve representation learning. We conduct extensive comparative experiments on three PReID benchmarks: CUHK03, Market-1501, and DukeMTMC-reID. The proposed deep framework outperforms many state-of-the-art methods in most cases.
Tasks Person Re-Identification, Representation Learning
Published 2018-12-13
URL http://arxiv.org/abs/1812.05319v1
PDF http://arxiv.org/pdf/1812.05319v1.pdf
PWC https://paperswithcode.com/paper/omni-directional-feature-learning-for-person
Repo
Framework

Addressing the Item Cold-start Problem by Attribute-driven Active Learning

Title Addressing the Item Cold-start Problem by Attribute-driven Active Learning
Authors Yu Zhu, Jinhao Lin, Shibi He, Beidou Wang, Ziyu Guan, Haifeng Liu, Deng Cai
Abstract In recommender systems, cold-start issues are situations where no previous events, e.g. ratings, are known for certain users or items. In this paper, we focus on the item cold-start problem. Both content information (e.g. item attributes) and initial user ratings are valuable for capturing users' preferences on a new item. However, previous methods for the item cold-start problem either 1) incorporate content information into collaborative filtering to perform hybrid recommendation, or 2) actively select users to rate the new item without considering content information and then do collaborative filtering. In this paper, we propose a novel recommendation scheme for the item cold-start problem that leverages both active learning and items' attribute information. Specifically, we design useful user selection criteria based on items' attributes and users' rating history, and combine the criteria in an optimization framework for selecting users. By exploiting the feedback ratings, users' previous ratings, and items' attributes, we then generate accurate rating predictions for the other, unselected users. Experimental results on two real-world datasets show the superiority of our proposed method over traditional methods.
Tasks Active Learning, Recommendation Systems
Published 2018-05-23
URL http://arxiv.org/abs/1805.09023v1
PDF http://arxiv.org/pdf/1805.09023v1.pdf
PWC https://paperswithcode.com/paper/addressing-the-item-cold-start-problem-by
Repo
Framework

Accelerating Beam Sweeping in mmWave Standalone 5G New Radios using Recurrent Neural Networks

Title Accelerating Beam Sweeping in mmWave Standalone 5G New Radios using Recurrent Neural Networks
Authors Asim Mazin, Mohamed Elkourdi, Richard D. Gitlin
Abstract Millimeter wave (mmWave) is a key technology to support high data rate demands for 5G applications. Highly directional transmissions are crucial at these frequencies to compensate for high isotropic pathloss. This reliance on directional beamforming, however, makes cell discovery (cell search) challenging, since both the base station (gNB) and the user equipment (UE) jointly perform a search over angular space to locate potential beams to initiate communication. In the cell discovery phase, sequential beam sweeping is performed through the angular coverage region in order to transmit synchronization signals. The sweeping pattern can either be a linear rotation or a hopping pattern that makes use of additional information. This paper proposes beam sweeping pattern prediction, based on the dynamic distribution of user traffic, using a form of recurrent neural network (RNN) called the Gated Recurrent Unit (GRU). The spatial distribution of users is inferred from data in call detail records (CDRs) of the cellular network. Results show that the users' spatial distribution and their approximate location (direction) can be accurately predicted from CDR data using a GRU, which is then used to calculate the sweeping pattern in the angular domain during cell search. (A toy GRU predictor is sketched after this entry.)
Tasks
Published 2018-09-04
URL http://arxiv.org/abs/1809.01096v1
PDF http://arxiv.org/pdf/1809.01096v1.pdf
PWC https://paperswithcode.com/paper/accelerating-beam-sweeping-in-mmwave
Repo
Framework
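
The core idea is a sequence model over per-sector user activity: feed a history of user counts per angular sector (derived from CDRs) to a GRU and read out a distribution over sectors for the next sweep. A toy PyTorch sketch of that idea (not the paper's architecture; the sector count and layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class SweepPredictor(nn.Module):
    """Toy GRU mapping a history of per-sector user counts to a
    distribution over angular sectors for the next beam sweep."""

    def __init__(self, n_sectors=8, hidden=32):
        super().__init__()
        self.gru = nn.GRU(n_sectors, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_sectors)

    def forward(self, history):            # history: (batch, time, n_sectors)
        _, h = self.gru(history)           # h: (1, batch, hidden)
        return torch.softmax(self.head(h[-1]), dim=-1)

# Sweeping order: probe the most probable sectors first, e.g.
#   probs = SweepPredictor()(history)
#   order = probs.argsort(dim=-1, descending=True)
```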

Tournament Leave-pair-out Cross-validation for Receiver Operating Characteristic (ROC) Analysis

Title Tournament Leave-pair-out Cross-validation for Receiver Operating Characteristic (ROC) Analysis
Authors Ileana Montoya Perez, Antti Airola, Peter J. Boström, Ivan Jambor, Tapio Pahikkala
Abstract Receiver operating characteristic (ROC) analysis is widely used for evaluating diagnostic systems. Recent studies have shown that estimating the area under the ROC curve (AUC) with standard cross-validation methods suffers from a large bias. Leave-pair-out (LPO) cross-validation has been shown to correct this bias. However, while LPO produces an almost unbiased estimate of the AUC, it does not provide the ranking of the data needed for plotting and analyzing the ROC curve. In this study, we propose a new method called tournament leave-pair-out (TLPO) cross-validation. This method extends LPO by creating a tournament from pair comparisons to produce a ranking for the data. TLPO preserves the advantage of LPO for estimating the AUC, while also allowing ROC analysis. We have shown using both synthetic and real-world data that TLPO is as reliable as LPO for AUC estimation, and we confirmed the bias in leave-one-out cross-validation on low-dimensional data. (A sketch of the tournament ranking follows this entry.)
Tasks
Published 2018-01-29
URL http://arxiv.org/abs/1801.09386v1
PDF http://arxiv.org/pdf/1801.09386v1.pdf
PWC https://paperswithcode.com/paper/tournament-leave-pair-out-cross-validation
Repo
Framework
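
Read directly from the abstract, the tournament works as follows: hold out each pair of samples, train on the rest, and award a win to the held-out sample with the higher predicted score; win counts then rank the data for ROC analysis. A sketch under those assumptions (a scikit-learn-style estimator with fit/decision_function is assumed; note this trains one model per pair, i.e. O(n^2) fits):

```python
import numpy as np
from itertools import combinations

def tlpo_ranking(X, y, make_model):
    """Rank samples by wins in a leave-pair-out tournament (illustrative)."""
    n = len(y)
    wins = np.zeros(n)
    for i, j in combinations(range(n), 2):
        train = [k for k in range(n) if k not in (i, j)]
        model = make_model().fit(X[train], y[train])
        s_i, s_j = model.decision_function(X[[i, j]])
        wins[i if s_i > s_j else j] += 1   # higher-scored sample wins the pair
    return np.argsort(-wins)   # indices from highest-ranked to lowest-ranked
```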

Musical Chair: Efficient Real-Time Recognition Using Collaborative IoT Devices

Title Musical Chair: Efficient Real-Time Recognition Using Collaborative IoT Devices
Authors Ramyad Hadidi, Jiashen Cao, Matthew Woodward, Michael S. Ryoo, Hyesoon Kim
Abstract The prevalence of Internet of Things (IoT) devices and the abundance of sensor data have created an increase in real-time data processing, such as recognition of speech, images, and video. While such processes are currently offloaded to computationally powerful cloud systems, a localized and distributed approach is desirable because (i) it preserves the privacy of users and (ii) it removes the dependency on cloud services. However, IoT networks are usually composed of resource-constrained devices, and a single device is not powerful enough to process real-time data. To overcome this challenge, we examine data and model parallelism for such devices in the context of deep neural networks. We propose Musical Chair to enable efficient, localized, and dynamic real-time recognition by harvesting the aggregated computational power of the resource-constrained devices in the same IoT network as the input sensors. Musical Chair adapts to the availability of computing devices at runtime and adjusts to the inherent dynamics of IoT networks. To demonstrate Musical Chair, on a network of Raspberry Pis (up to 12), each connected to a camera, we implement a state-of-the-art action recognition model for videos and two recognition models for images. Compared to the Tegra TX2, an embedded low-power platform with a six-core CPU and a GPU, our distributed action recognition system achieves not only similar energy consumption but also twice the performance of the TX2. Furthermore, in image recognition, Musical Chair achieves similar performance and saves dynamic energy.
Tasks Temporal Action Localization
Published 2018-02-05
URL http://arxiv.org/abs/1802.02138v3
PDF http://arxiv.org/pdf/1802.02138v3.pdf
PWC https://paperswithcode.com/paper/musical-chair-efficient-real-time-recognition
Repo
Framework