October 17, 2019


Paper Group ANR 943


Image Super-Resolution Using VDSR-ResNeXt and SRCGAN. PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report. Channel-wise and Spatial Feature Modulation Network for Single Image Super-Resolution. Denoising Dictionary Learning Against Adversarial Perturbations. Qualitätsmaße binärer Klassifikationen im Bereich kriminalprognostischer …

Image Super-Resolution Using VDSR-ResNeXt and SRCGAN

Title Image Super-Resolution Using VDSR-ResNeXt and SRCGAN
Authors Saifuddin Hitawala, Yao Li, Xian Wang, Dongyang Yang
Abstract Over the past decade, many super-resolution (SR) techniques have been developed using deep learning. Among them, generative adversarial networks (GANs) and very deep convolutional networks (VDSR) have shown promising results in terms of high-resolution (HR) image quality and computational speed. In this paper, we propose two approaches based on these two algorithms: VDSR-ResNeXt, a deep multi-branch convolutional network inspired by VDSR and ResNeXt; and SRCGAN, a conditional GAN that explicitly passes class labels as input to the GAN. Both methods were evaluated on common SR benchmark datasets, both quantitatively and qualitatively.
Tasks Image Super-Resolution, Super-Resolution
Published 2018-10-10
URL http://arxiv.org/abs/1810.05731v1
PDF http://arxiv.org/pdf/1810.05731v1.pdf
PWC https://paperswithcode.com/paper/image-super-resolution-using-vdsr-resnext-and
Repo
Framework
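The abstract says SRCGAN passes class labels to the GAN as an explicit input, but not how. A common way to condition a convolutional network on a label is to append the one-hot label as constant feature planes next to the image channels. The Python sketch below assumes that scheme; the names `one_hot` and `condition_on_label` are hypothetical and this is not the authors' code.

```python
from typing import List

Image = List[List[float]]  # one channel, H x W


def one_hot(label: int, num_classes: int) -> List[float]:
    """Return a one-hot vector for `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def condition_on_label(channels: List[Image], label: int, num_classes: int) -> List[Image]:
    """Append one constant H x W plane per class, filled with the one-hot value.

    The result has len(channels) + num_classes channels, so a convolutional
    generator can consume the class information like any other feature map.
    """
    height, width = len(channels[0]), len(channels[0][0])
    planes = [[[value] * width for _ in range(height)] for value in one_hot(label, num_classes)]
    return channels + planes
```

For a single-channel 2x2 low-resolution patch and class 2 of 3, the output has four channels, and only the last appended plane is filled with ones.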

PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report

Title PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report
Authors Andrey Ignatov, Radu Timofte, Thang Van Vu, Tung Minh Luu, Trung X Pham, Cao Van Nguyen, Yongwoo Kim, Jae-Seok Choi, Munchurl Kim, Jie Huang, Jiewen Ran, Chen Xing, Xingguang Zhou, Pengfei Zhu, Mingrui Geng, Yawei Li, Eirikur Agustsson, Shuhang Gu, Luc Van Gool, Etienne de Stoutz, Nikolay Kobyshev, Kehui Nie, Yan Zhao, Gen Li, Tong Tong, Qinquan Gao, Liu Hanwen, Pablo Navarrete Michelini, Zhu Dan, Hu Fengshuo, Zheng Hui, Xiumei Wang, Lirui Deng, Rang Meng, Jinghui Qin, Yukai Shi, Wushao Wen, Liang Lin, Ruicheng Feng, Shixiang Wu, Chao Dong, Yu Qiao, Subeesh Vasu, Nimisha Thekke Madam, Praveen Kandula, A. N. Rajagopalan, Jie Liu, Cheolkon Jung
Abstract This paper reviews the first challenge on efficient perceptual image enhancement with the focus on deploying deep learning models on smartphones. The challenge consisted of two tracks. In the first one, participants were solving the classical image super-resolution problem with a bicubic downscaling factor of 4. The second track was aimed at real-world photo enhancement, and the goal was to map low-quality photos from the iPhone 3GS device to the same photos captured with a DSLR camera. The target metric used in this challenge combined the runtime, PSNR scores and the perceptual quality of the solutions as measured in a user study. To ensure the efficiency of the submitted models, we additionally measured their runtime and memory requirements on Android smartphones. The proposed solutions significantly improved on the baseline results, defining the state of the art for image enhancement on smartphones.
Tasks Image Enhancement, Image Super-Resolution, Super-Resolution
Published 2018-10-03
URL http://arxiv.org/abs/1810.01641v1
PDF http://arxiv.org/pdf/1810.01641v1.pdf
PWC https://paperswithcode.com/paper/pirm-challenge-on-perceptual-image
Repo
Framework
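PSNR is one component of the challenge's target metric; the abstract does not give the exact formula that combines it with runtime and user-study scores, so the sketch below covers only PSNR, computed as 10 * log10(MAX^2 / MSE) over flattened pixel values. The 8-bit range (MAX = 255) is an assumption.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR depends only on the mean squared error, it rewards pixel-wise fidelity, which is why the challenge paired it with a perceptual user study.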

Channel-wise and Spatial Feature Modulation Network for Single Image Super-Resolution

Title Channel-wise and Spatial Feature Modulation Network for Single Image Super-Resolution
Authors Yanting Hu, Jie Li, Yuanfei Huang, Xinbo Gao
Abstract The performance of single image super-resolution has improved significantly through the use of deep convolutional neural networks (CNNs). The features in a deep CNN contain different types of information, which contribute differently to image reconstruction. However, most CNN-based models lack the ability to discriminate between these types of information and treat them equally, which limits the representational capacity of the models. Moreover, as the depth of a network grows, long-term information from preceding layers tends to be weakened or lost in later layers, which hampers image super-resolution. To capture more informative features and maintain long-term information for image super-resolution, we propose a channel-wise and spatial feature modulation (CSFM) network in which a sequence of feature-modulation memory (FMM) modules is cascaded with a densely connected structure to transform low-resolution features into highly informative features. In each FMM module, we construct a set of channel-wise and spatial attention residual (CSAR) blocks and stack them in a chain to dynamically modulate multi-level features in a global-and-local manner. This feature modulation strategy enhances highly contributive information and suppresses redundant information. Meanwhile, for long-term information persistence, a gated fusion (GF) node is attached at the end of each FMM module to adaptively fuse hierarchical features and distill more effective information via the dense skip connections and the gating mechanism. Extensive quantitative and qualitative evaluations on benchmark datasets demonstrate the superiority of the proposed method over state-of-the-art methods.
Tasks Image Reconstruction, Image Super-Resolution, Super-Resolution
Published 2018-09-28
URL http://arxiv.org/abs/1809.11130v1
PDF http://arxiv.org/pdf/1809.11130v1.pdf
PWC https://paperswithcode.com/paper/channel-wise-and-spatial-feature-modulation
Repo
Framework
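To make "channel-wise and spatial feature modulation" concrete, here is a heavily simplified Python sketch of attention gating with a residual connection. Assumptions: each channel gate is a sigmoid of a per-channel affine map of the channel's global average (real squeeze-and-excitation-style blocks use a small bottleneck network), each spatial gate is a sigmoid of the cross-channel mean at that position, and the two gates are applied in sequence. The convolutions and the exact way CSAR blocks fuse the two attention branches are not described in the abstract and are omitted; all names are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][row][col]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_attention(features: FeatureMap, weights: List[float], biases: List[float]) -> FeatureMap:
    """Rescale each channel by a gate computed from that channel's global average."""
    out = []
    for channel, w, b in zip(features, weights, biases):
        values = [v for row in channel for v in row]
        gate = sigmoid(w * sum(values) / len(values) + b)
        out.append([[gate * v for v in row] for row in channel])
    return out


def spatial_attention(features: FeatureMap, weight: float, bias: float) -> FeatureMap:
    """Rescale every position by a gate computed from the cross-channel mean there."""
    n, height, width = len(features), len(features[0]), len(features[0][0])
    gates = [[sigmoid(weight * sum(features[c][i][j] for c in range(n)) / n + bias)
              for j in range(width)] for i in range(height)]
    return [[[gates[i][j] * ch[i][j] for j in range(width)] for i in range(height)]
            for ch in features]


def modulated_residual(features: FeatureMap, ch_weights: List[float], ch_biases: List[float],
                       sp_weight: float, sp_bias: float) -> FeatureMap:
    """Input plus its channel- and then spatially-modulated version (residual form)."""
    modulated = spatial_attention(channel_attention(features, ch_weights, ch_biases), sp_weight, sp_bias)
    return [[[x + m for x, m in zip(xr, mr)] for xr, mr in zip(xc, mc)]
            for xc, mc in zip(features, modulated)]
```

With all gate parameters at zero, every gate equals 0.5, so the block returns 1.25 times its input; large positive or negative channel weights instead pass or suppress whole channels, which is the "enhance high-contribution, suppress redundant" behaviour in miniature.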

Denoising Dictionary Learning Against Adversarial Perturbations

Title Denoising Dictionary Learning Against Adversarial Perturbations
Authors John Mitro, Derek Bridge, Steven Prestwich
Abstract We propose denoising dictionary learning (DDL), a simple yet effective technique for protecting against adversarial perturbations. We examined DDL on MNIST and CIFAR10 perturbed with two different techniques: the fast gradient sign method (FGSM) and the Jacobian-based saliency map attack (JSMA). We evaluated it against five different deep neural networks (DNNs) that represent the building blocks of most recent architectures and progressively increase in model complexity. We show that each model tends to capture different representations depending on its architecture. For each model, we recorded its accuracy both on perturbed test data that it previously misclassified with high confidence and on the denoised data after reconstruction with dictionary learning. The reconstruction quality of each data point is assessed by means of PSNR (Peak Signal-to-Noise Ratio) and the Structural Similarity Index (SSI). We show that after applying DDL the reconstruction of the original data point from a noisy …
Tasks Denoising, Dictionary Learning
Published 2018-01-07
URL http://arxiv.org/abs/1801.02257v1
PDF http://arxiv.org/pdf/1801.02257v1.pdf
PWC https://paperswithcode.com/paper/denoising-dictionary-learning-against
Repo
Framework
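FGSM, one of the two perturbations named in the abstract, moves every input feature by a fixed step epsilon in the sign of the loss gradient. The Python sketch below substitutes a logistic-regression classifier for the paper's DNNs (an assumption made so the gradient has a closed form: for binary cross-entropy, dL/dx = (p - y) * w) and clips results to a [0, 1] pixel range. It shows only the attack, not the dictionary-learning defence.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(g: float) -> int:
    """Return -1, 0 or 1 according to the sign of g."""
    return (g > 0) - (g < 0)


def fgsm_logistic(x: Sequence[float], y: int, w: Sequence[float], b: float,
                  epsilon: float, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Fast gradient sign perturbation of x against a logistic-regression model.

    With p = sigmoid(w.x + b) and binary cross-entropy loss, the input gradient
    is (p - y) * w; each feature moves by epsilon in that gradient's sign and is
    clipped back into [lo, hi].
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    return [min(hi, max(lo, xi + epsilon * sign(gi))) for xi, gi in zip(x, grad)]
```

For x = [0.5, 0.5] with true label 1 and weights [2.0, -1.0], the attack produces [0.4, 0.6], which lowers the model's probability for the true class.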

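Qualitätsmaße binärer Klassifikationen im Bereich kriminalprognostischer Instrumente der vierten Generation (Quality Measures for Binary Classifications in Fourth-Generation Criminal Prognosis Instruments)

Title Qualitätsmaße binärer Klassifikationen im Bereich kriminalprognostischer Instrumente der vierten Generation (Quality Measures for Binary Classifications in Fourth-Generation Criminal Prognosis Instruments)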
Authors Tobias D. Krafft
Abstract This master’s thesis discusses an important issue regarding how algorithmic decision making (ADM) is used in crime forecasting. In America, forecasting tools are widely used by the judiciary to make decisions about offenders based on their assessed risk. By using such tools, the judiciary relies on ADM to make error-free judgements about offenders. For this purpose, the $AUC$ (area under the curve), one of the most widely used quality measures for machine learning techniques, is compared and contrasted with the $PPV_k$ (positive predictive value). Given the critical nature of these judgements and the high dependency on tools offering ADM, it is necessary to evaluate algorithm-based risk tools that aid in decision making. In this methodology, such an evaluation is conducted using a common machine learning approach, the binary classifier, since it determines the binary outcome of the underlying juristic question. This thesis shows that $PPV_k$ models the decisions of judges much better than $AUC$. Therefore, this research investigated whether there exists a classifier for which $PPV_k$ deviates from $AUC$ by a large amount. It could be shown that the deviation can rise up to 0.75. To test this deviation on a classifier already in use, data from the fourth-generation risk assessment tool COMPAS was used. The results were quite alarming, as the two measures deviate from each other by 0.48. In this study, the risk assessment evaluation of the forecasting tools was successfully conducted, carefully reviewed and examined. Additionally, the thesis discusses whether such systems should be socially accepted for decision making.
Tasks Decision Making
Published 2018-04-04
URL http://arxiv.org/abs/1804.01557v1
PDF http://arxiv.org/pdf/1804.01557v1.pdf
PWC https://paperswithcode.com/paper/qualitatsmae-binarer-klassifikationen-im
Repo
Framework
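The gap between the two measures is easy to reproduce on toy data. AUC is the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half). The abstract does not define $PPV_k$ precisely; the Python sketch below assumes it is the fraction of true positives among the k highest-scored cases, which matches the idea of a judge acting only on the top of a ranked list. The scores and labels are made up for illustration, not COMPAS data.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ppv_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of true positives among the k highest-scored cases (assumed PPV_k)."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Stable sort: equal scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k
```

With scores [0.95, 0.9, 0.8, ..., 0.1] and labels [0, 0, 1, 1, 1, 1, 0, 0, 0, 0], the AUC is about 0.67, yet PPV at k = 2 is 0: the ranking looks decent overall while the two highest-risk cases, the ones acted on first, are both wrong.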

Image Registration Based Flicker Solving in Video Face Replacement and Analysis Based Sub-pixel Image Registration

Title Image Registration Based Flicker Solving in Video Face Replacement and Analysis Based Sub-pixel Image Registration
Authors Xiaofang Wang, Guoqiang Xiang, Xinyue Zhang, Wei Wei
Abstract In this paper, a framework for video face replacement is proposed that deals with the flicker of the swapped face in a video sequence. The framework contains two main innovations: 1) image registration is exploited to align the source and target video faces, eliminating flicker or jitter in the segmented video face sequence; 2) a fast sub-pixel image registration method is proposed for further accuracy and efficiency. Unlike prior works, it minimizes the overlapping region and takes spatiotemporal coherence into account. Flicker in the resulting videos is usually caused by the frequently changing boundary of the blended target face and by faces that are unregistered between and along video sequences. The sub-pixel image registration method is proposed to solve this flicker problem. During alignment, integer-pixel registration is formulated as maximizing the similarity of images, with a downsampling strategy to speed up the process, and sub-pixel image registration is a single-step image match via an analytic method. Experiments on different datasets show that the proposed algorithm reduces computation time and achieves high accuracy.
Tasks Image Registration
Published 2018-03-09
URL http://arxiv.org/abs/1803.05851v1
PDF http://arxiv.org/pdf/1803.05851v1.pdf
PWC https://paperswithcode.com/paper/image-registration-based-flicker-solving-in
Repo
Framework
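The two-stage idea (integer-pixel search by maximizing similarity, then a one-step analytic sub-pixel correction) can be illustrated in one dimension. The abstract does not give the authors' analytic formula, so the Python sketch below swaps in a standard technique: cross-correlation over integer shifts followed by a three-point parabolic fit around the peak. The downsampling speed-up is omitted, and the function names are hypothetical.

```python
from typing import Sequence


def correlation(a: Sequence[float], b: Sequence[float], shift: int) -> float:
    """Sum of a[i] * b[i + shift] over the samples where both are defined."""
    return sum(a[i] * b[i + shift] for i in range(len(a)) if 0 <= i + shift < len(b))


def register_1d(a: Sequence[float], b: Sequence[float], max_shift: int) -> float:
    """Estimate the (possibly fractional) shift d such that b[i + d] ~= a[i]."""
    scores = {s: correlation(a, b, s) for s in range(-max_shift, max_shift + 1)}
    best = max(scores, key=scores.get)  # integer-pixel registration
    if best in (-max_shift, max_shift):
        return float(best)  # no neighbour on one side, so no refinement
    left, mid, right = scores[best - 1], scores[best], scores[best + 1]
    denom = left - 2.0 * mid + right
    # Vertex of the parabola through (-1, left), (0, mid), (1, right).
    offset = 0.0 if denom == 0 else 0.5 * (left - right) / denom
    return best + offset
```

For a Gaussian pulse centred at sample 12 and a copy centred at 15.4, the integer search returns 3 and the parabolic step refines it to roughly 3.4 without evaluating any fractional shifts explicitly.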

Neural Allocentric Intuitive Physics Prediction from Real Videos

Title Neural Allocentric Intuitive Physics Prediction from Real Videos
Authors Zhihua Wang, Stefano Rosa, Yishu Miao, Zihang Lai, Linhai Xie, Andrew Markham, Niki Trigoni
Abstract Humans are able to make rich predictions about the future dynamics of physical objects from a glance. On the other hand, most existing computer vision approaches require strong assumptions about the underlying system, ad-hoc modeling, or annotated datasets, to carry out even simple predictions. To tackle this gap, we propose a new perspective on the problem of learning intuitive physics that is inspired by the spatial memory representation of objects and spaces in human brains, in particular the co-existence of egocentric and allocentric spatial representations. We present a generic framework that learns a layered representation of the physical world, using a cascade of invertible modules. In this framework, real images are first converted to a synthetic domain representation that reduces complexity arising from lighting and texture. Then, an allocentric viewpoint transformer removes viewpoint complexity by projecting images to a canonical view. Finally, a novel Recurrent Latent Variation Network (RLVN) architecture learns the dynamics of the objects interacting with the environment and predicts future motion, leveraging the availability of unlimited synthetic simulations. Predicted frames are then projected back to the original camera view and translated back to the real world domain. Experimental results show the ability of the framework to consistently and accurately predict several frames in the future and the ability to adapt to real images.
Tasks
Published 2018-09-07
URL http://arxiv.org/abs/1809.03330v2
PDF http://arxiv.org/pdf/1809.03330v2.pdf
PWC https://paperswithcode.com/paper/neural-allocentric-intuitive-physics
Repo
Framework

Stochastic Layer-Wise Precision in Deep Neural Networks

Title Stochastic Layer-Wise Precision in Deep Neural Networks
Authors Griffin Lacey, Graham W. Taylor, Shawki Areibi
Abstract Low-precision weights, activations, and gradients have been proposed as a way to improve the computational efficiency and memory footprint of deep neural networks. Recently, low-precision networks have even been shown to be more robust to adversarial attacks. However, typical implementations of low-precision DNNs use uniform precision across all layers of the network. In this work, we explore whether a heterogeneous allocation of precision across a network leads to improved performance, and introduce a learning scheme in which a DNN stochastically explores multiple precision configurations through learning. This permits a network to learn an optimal precision configuration. We show on convolutional neural networks trained on MNIST and ILSVRC12 that even though these nets learn a uniform or near-uniform allocation strategy, respectively, stochastic precision leads to a favourable regularization effect that improves generalization.
Tasks
Published 2018-07-03
URL http://arxiv.org/abs/1807.00942v1
PDF http://arxiv.org/pdf/1807.00942v1.pdf
PWC https://paperswithcode.com/paper/stochastic-layer-wise-precision-in-deep
Repo
Framework
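A minimal Python sketch of the two ingredients: sampling a bit-width independently per layer, and uniformly quantizing that layer's weights to the sampled precision. Assumptions: symmetric uniform quantization on [-clip, clip], and fixed per-layer sampling probabilities; in the paper the precision distribution is learned, and that learning step is not shown. Layer names and probabilities are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def quantize(values: Sequence[float], bits: int, clip: float = 1.0) -> List[float]:
    """Uniformly quantize values in [-clip, clip] onto 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    step = 2.0 * clip / levels
    out = []
    for v in values:
        v = min(clip, max(-clip, v))  # saturate out-of-range values
        out.append(round((v + clip) / step) * step - clip)
    return out


def sample_precisions(layer_names: Sequence[str], choices: Sequence[int],
                      probs: Dict[str, Sequence[float]], rng: random.Random) -> Dict[str, int]:
    """Draw one bit-width per layer from that layer's distribution over `choices`."""
    return {name: rng.choices(choices, weights=probs[name])[0] for name in layer_names}
```

During training, each forward pass would draw a fresh configuration with `sample_precisions` and quantize every layer's weights with its drawn bit-width; the noise this injects is one plausible reading of the regularization effect the abstract reports.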

Improving the Annotation of DeepFashion Images for Fine-grained Attribute Recognition

Title Improving the Annotation of DeepFashion Images for Fine-grained Attribute Recognition
Authors Roshanak Zakizadeh, Michele Sasdelli, Yu Qian, Eduard Vazquez
Abstract DeepFashion is a widely used clothing dataset with 50 categories and more than 200k images overall, where each image is annotated with fine-grained attributes. The dataset is often used for clothes recognition, and although it provides comprehensive annotations, the attribute distribution is unbalanced and repetitive, especially for training fine-grained attribute recognition models. In this work, we tailored DeepFashion to the fine-grained attribute recognition task by focusing on each category separately. After selecting the categories with a sufficient number of images for training, we remove very scarce attributes and merge duplicate ones in each category, then clean the dataset based on the new list of attributes. We use a bilinear convolutional neural network with a pairwise ranking loss function for multi-label fine-grained attribute recognition and show that the new annotations improve the results for this task. The detailed annotations for each of the selected categories are provided for public use.
Tasks
Published 2018-07-31
URL http://arxiv.org/abs/1807.11674v1
PDF http://arxiv.org/pdf/1807.11674v1.pdf
PWC https://paperswithcode.com/paper/improving-the-annotation-of-deepfashion
Repo
Framework
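The abstract pairs a bilinear CNN with a pairwise ranking loss for multi-label prediction but does not give the loss formula. A common form, assumed in the Python sketch below, asks every present attribute to score at least a margin above every absent attribute of the same image and averages the hinge penalties over all such pairs.

```python
from typing import Sequence


def pairwise_ranking_loss(scores: Sequence[float], labels: Sequence[int], margin: float = 1.0) -> float:
    """Mean hinge loss over all (present, absent) attribute pairs of one image."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return 0.0  # nothing to rank against
    total = sum(max(0.0, margin - p + n) for p in pos for n in neg)
    return total / (len(pos) * len(neg))
```

With scores [3.0, 0.5, 1.0] and labels [1, 0, 1], only the pair (1.0, 0.5) violates the margin of 1, contributing 0.5, so the mean loss over two pairs is 0.25.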

Two Dimensional Stochastic Configuration Networks for Image Data Analytics

Title Two Dimensional Stochastic Configuration Networks for Image Data Analytics
Authors Ming Li, Dianhui Wang
Abstract Stochastic configuration networks (SCNs), a class of randomized learner models, have been successfully employed in data analytics due to their universal approximation capability and fast modelling property. The technical essence lies in stochastically configuring hidden nodes (or basis functions) based on a supervisory mechanism, rather than the data-independent randomization usually adopted for building randomized neural networks. For image data modelling tasks, the use of one-dimensional SCNs potentially destroys the spatial information of images and may result in undesirable performance. This paper extends the original SCNs to a two-dimensional version, termed 2DSCNs, for quickly building randomized learners with matrix inputs. Theoretical analyses of the advantages of 2DSCNs over SCNs, including the complexity of the random parameter space and the superiority of generalization, are presented. Empirical results on one regression dataset, four benchmark handwritten digit classification datasets, and two human face recognition datasets demonstrate that the proposed 2DSCNs perform favourably and show good potential for image data analytics.
Tasks Face Recognition
Published 2018-09-06
URL http://arxiv.org/abs/1809.02066v1
PDF http://arxiv.org/pdf/1809.02066v1.pdf
PWC https://paperswithcode.com/paper/two-dimensional-stochastic-configuration
Repo
Framework

Sampled in Pairs and Driven by Text: A New Graph Embedding Framework

Title Sampled in Pairs and Driven by Text: A New Graph Embedding Framework
Authors Liheng Chen, Yanru Qu, Zhenghui Wang, Lin Qiu, Weinan Zhang, Ken Chen, Shaodian Zhang, Yong Yu
Abstract In graphs with rich texts, incorporating textual information with structural information benefits the construction of expressive graph embeddings. Among various graph embedding models, random walk (RW)-based models are among the most popular and successful. However, they face two issues when applied to graphs with rich texts: (i) sampling efficiency: deriving from the training objective of RW-based models (e.g., DeepWalk and node2vec), we show that RW-based models are likely to generate large amounts of redundant training samples due to three main drawbacks; (ii) text utilization: these models have difficulty dealing with zero-shot scenarios, where graph embedding models have to infer graph structures directly from texts. To solve these problems, we propose a novel framework, Text-driven Graph Embedding with Pairs Sampling (TGE-PS). TGE-PS uses Pairs Sampling (PS) to improve the sampling strategy of RW, reducing the number of training samples by about 99% while preserving competitive performance. TGE-PS uses Text-driven Graph Embedding (TGE), an inductive graph embedding approach, to generate node embeddings from texts. Since each node contains rich texts, TGE is able to generate high-quality embeddings and provide reasonable predictions on the existence of links to unseen nodes. We evaluate TGE-PS on several real-world datasets, and the experimental results demonstrate that TGE-PS produces state-of-the-art results on both traditional and zero-shot link prediction tasks.
Tasks Graph Embedding, Link Prediction
Published 2018-09-12
URL https://arxiv.org/abs/1809.04234v2
PDF https://arxiv.org/pdf/1809.04234v2.pdf
PWC https://paperswithcode.com/paper/tge-ps-text-driven-graph-embedding-with-pairs
Repo
Framework
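The redundancy problem the abstract attributes to RW-based models is visible in how DeepWalk-style training samples are produced: every (centre, context) pair inside a sliding window along a walk becomes a sample, and the same pairs recur many times. The Python sketch below reproduces only that baseline sampling and measures the share of duplicate pairs. It is not Pairs Sampling, whose selection rule the abstract does not describe, and the helper names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_walk(graph: Dict[str, List[str]], start: str, length: int, rng: random.Random) -> List[str]:
    """Uniform random walk of up to `length` nodes over an adjacency-list graph."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph[walk[-1]]
        if not neighbours:
            break  # dead end
        walk.append(rng.choice(neighbours))
    return walk


def skipgram_pairs(walk: Sequence[str], window: int) -> List[Tuple[str, str]]:
    """All (centre, context) pairs within `window` positions of each other."""
    pairs = []
    for i, centre in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((centre, walk[j]))
    return pairs


def redundancy(pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of samples that duplicate an earlier (centre, context) pair."""
    return 1.0 - len(set(pairs)) / len(pairs)
```

On a two-node graph, a six-node walk with window 2 yields 18 training pairs but only 4 distinct ones, so about 78% of the samples are repeats.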

Feature Learning for Meta-Paths in Knowledge Graphs

Title Feature Learning for Meta-Paths in Knowledge Graphs
Authors Sebastian Bischoff
Abstract In this thesis, we study the problem of feature learning on heterogeneous knowledge graphs. These features can be used to perform tasks such as link prediction, classification and clustering on graphs. Knowledge graphs provide rich semantics encoded in the edge and node types. Meta-paths consist of these types and abstract paths in the graph. Until now, meta-paths could only be used as categorical features with high redundancy and are therefore unsuitable for machine learning models. We propose meta-path embeddings to solve this problem by learning semantic and compact vector representations of them. Current graph embedding methods only embed nodes and edge types and therefore miss the semantics encoded in their combination. Our method embeds meta-paths using the skip-gram model with an extension to deal with the redundancy and the large number of meta-paths in big knowledge graphs. We critically evaluate our embedding approach by predicting links on Wikidata. The experiments indicate that we learn a sensible embedding of the meta-paths, though there is room for further improvement.
Tasks Graph Embedding, Knowledge Graphs, Link Prediction
Published 2018-09-07
URL http://arxiv.org/abs/1809.03267v1
PDF http://arxiv.org/pdf/1809.03267v1.pdf
PWC https://paperswithcode.com/paper/feature-learning-for-meta-paths-in-knowledge
Repo
Framework
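A meta-path abstracts a concrete path into its alternating sequence of node types and edge types. The Python sketch below only enumerates and counts meta-paths of up to `max_len` edges in a toy typed graph of (head, relation, tail) triples; it shows why many concrete paths collapse onto the same categorical feature. The thesis's skip-gram embedding of these meta-paths is not reproduced, and the toy entities and types are made up.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def meta_paths(triples: Sequence[Triple], node_type: Dict[str, str], max_len: int) -> Counter:
    """Count meta-paths (node type, edge type, node type, ...) of 1..max_len edges.

    Paths are enumerated as walks from every node, so cycles are not excluded.
    """
    out_edges: Dict[str, List[Tuple[str, str]]] = {}
    for head, rel, tail in triples:
        out_edges.setdefault(head, []).append((rel, tail))
    counts: Counter = Counter()
    frontier = [(node, (node_type[node],)) for node in node_type]
    for _ in range(max_len):
        next_frontier = []
        for node, path in frontier:
            for rel, tail in out_edges.get(node, []):
                extended = path + (rel, node_type[tail])
                counts[extended] += 1
                next_frontier.append((tail, extended))
        frontier = next_frontier
    return counts
```

For the triples (alice, bornIn, paris), (bob, bornIn, paris) and (paris, capitalOf, france), the meta-path Person, bornIn, City, capitalOf, Country is instantiated twice, once via each person.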

Graphene: A Context-Preserving Open Information Extraction System

Title Graphene: A Context-Preserving Open Information Extraction System
Authors Matthias Cetto, Christina Niklaus, André Freitas, Siegfried Handschuh
Abstract We introduce Graphene, an Open IE system whose goal is to generate accurate, meaningful and complete propositions that may facilitate a variety of downstream semantic applications. For this purpose, we transform syntactically complex input sentences into clean, compact structures in the form of core facts and accompanying contexts, while identifying the rhetorical relations that hold between them in order to maintain their semantic relationship. In that way, we preserve the context of the relational tuples extracted from a source sentence, generating a novel lightweight semantic representation for Open IE that enhances the expressiveness of the extracted propositions.
Tasks Open Information Extraction
Published 2018-08-28
URL http://arxiv.org/abs/1808.09463v1
PDF http://arxiv.org/pdf/1808.09463v1.pdf
PWC https://paperswithcode.com/paper/graphene-a-context-preserving-open
Repo
Framework
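The representation described in the abstract, core facts plus accompanying contexts linked by rhetorical relations, can be pictured as a small data model. The Python sketch below is a hypothetical data model, not Graphene's actual output format: the relation labels are assumptions, and the decomposition of the example sentence was done by hand rather than by the system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Proposition:
    """A minimal relational tuple; an empty `obj` means an intransitive predicate."""
    subject: str
    predicate: str
    obj: str = ""

    def text(self) -> str:
        return " ".join(part for part in (self.subject, self.predicate, self.obj) if part)


@dataclass
class Context:
    """A proposition attached to a core fact through a rhetorical relation (labels assumed)."""
    relation: str
    proposition: Proposition


@dataclass
class CoreFact:
    """A core fact together with the contexts that qualify it."""
    proposition: Proposition
    contexts: List[Context] = field(default_factory=list)

    def render(self) -> str:
        parts = [self.proposition.text()]
        parts += [f"[{c.relation}: {c.proposition.text()}]" for c in self.contexts]
        return " ".join(parts)


# Hand-made decomposition of:
# "After the war ended, Smith moved to Boston because he found work there."
fact = CoreFact(
    Proposition("Smith", "moved to", "Boston"),
    [Context("TEMPORAL", Proposition("the war", "ended")),
     Context("CAUSE", Proposition("he", "found", "work there"))],
)
```

Keeping the temporal and causal clauses as linked contexts, instead of emitting them as unrelated tuples, is what preserves the semantic relationship the abstract emphasizes.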

Unsupervised Feature Learning Toward a Real-time Vehicle Make and Model Recognition

Title Unsupervised Feature Learning Toward a Real-time Vehicle Make and Model Recognition
Authors Amir Nazemi, Mohammad Javad Shafiee, Zohreh Azimifar, Alexander Wong
Abstract Vehicle Make and Model Recognition (MMR) systems provide a fully automatic framework to recognize and classify different vehicle models. Several approaches have been proposed to address this challenge; however, they only perform well under restricted conditions. Here, we formulate vehicle make and model recognition as a fine-grained classification problem and propose a new configurable on-road vehicle make and model recognition framework. We benefit from unsupervised feature learning methods; specifically, we employ the Locality-constrained Linear Coding (LLC) method as a fast feature encoder for the input SIFT features. The proposed method can perform in real environments under different conditions. The framework can recognize fifty vehicle models and can additionally classify any vehicle not belonging to one of these fifty classes as an unknown vehicle. The proposed MMR framework can be configured to be faster or more accurate depending on the application domain. The proposed approach is evaluated on two datasets: the Iranian on-road vehicle dataset and the CompuCar dataset. The Iranian on-road vehicle dataset contains images of 50 vehicle models captured in real situations by traffic cameras under different weather and lighting conditions. Experimental results show the superiority of the proposed framework over state-of-the-art methods on the Iranian on-road vehicle dataset and comparable results on the CompuCar dataset, with accuracies of 97.5% and 98.4%, respectively.
Tasks
Published 2018-06-08
URL http://arxiv.org/abs/1806.03028v1
PDF http://arxiv.org/pdf/1806.03028v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-feature-learning-toward-a-real
Repo
Framework
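LLC encodes each local descriptor using only its few nearest codewords, which is what makes it fast. The Python sketch below follows the commonly described "approximated LLC" solution: pick the k nearest codewords, form the covariance of the codewords shifted by the descriptor, regularize its diagonal, solve C w = 1, and normalize w to sum to one. Toy 2-D vectors stand in for SIFT descriptors, `k` and `beta` are hypothetical settings, and the pooling of codes over an image is omitted.

```python
from typing import List, Sequence

Vector = List[float]


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def llc_code(x: Sequence[float], codebook: Sequence[Sequence[float]], k: int = 3,
             beta: float = 1e-4) -> Vector:
    """Approximated locality-constrained linear code of descriptor x (sums to one)."""
    dist = [sum((xi - bi) ** 2 for xi, bi in zip(x, b)) for b in codebook]
    nearest = sorted(range(len(codebook)), key=dist.__getitem__)[:k]
    shifted = [[bi - xi for bi, xi in zip(codebook[j], x)] for j in nearest]
    cov = [[sum(p * q for p, q in zip(u, v)) for v in shifted] for u in shifted]
    trace = sum(cov[i][i] for i in range(len(nearest)))
    reg = beta * trace if trace > 0 else beta  # guard: x coincides with its codewords
    for i in range(len(nearest)):
        cov[i][i] += reg
    w = solve(cov, [1.0] * len(nearest))
    total = sum(w)
    code = [0.0] * len(codebook)
    for j, wj in zip(nearest, w):
        code[j] = wj / total
    return code
```

For x = [0.2, 0.1] and codewords (0, 0), (1, 0), (0, 1) and (5, 5), the code is close to [0.7, 0.2, 0.1, 0.0]: the distant codeword gets exactly zero weight, and the three local weights reconstruct x almost exactly.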

Scalable Logo Recognition using Proxies

Title Scalable Logo Recognition using Proxies
Authors Istvan Fehervari, Srikar Appalaraju
Abstract Logo recognition is the task of identifying and classifying logos. It is a challenging problem because there is no clear definition of a logo, there are huge variations among logos and brands, and re-training to cover every variation is impractical. In this paper, we formulate logo recognition as a few-shot object detection problem. The two main components of our pipeline are a universal logo detector and a few-shot logo recognizer. The universal logo detector is a class-agnostic deep object detection network that tries to learn the characteristics of what makes a logo. It predicts bounding boxes on likely logo regions. These logo regions are then classified by the logo recognizer using nearest neighbor search, trained by triplet loss using proxies. We also annotated a first-of-its-kind product logo dataset, called PL2K, containing 2000 logos from 295K images collected from Amazon. Our pipeline achieves 97% recall with 0.6 mAP on the PL2K test set and a state-of-the-art 0.565 mAP on the publicly available FlickrLogos-32 test set without fine-tuning.
Tasks Few-Shot Object Detection, Logo Recognition, Object Detection
Published 2018-11-19
URL http://arxiv.org/abs/1811.08009v1
PDF http://arxiv.org/pdf/1811.08009v1.pdf
PWC https://paperswithcode.com/paper/scalable-logo-recognition-using-proxies
Repo
Framework
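The recognizer step, nearest-neighbour search trained with a triplet loss over proxies, can be sketched with cosine similarity. Assumptions: one fixed proxy vector per logo class, and a hinge loss requiring the own-class proxy to be closer (in cosine distance) than every other proxy by a margin; the paper's exact proxy loss and embedding network are not given in the abstract, and the brand names are made up.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_proxy(embedding: Sequence[float], proxies: Dict[str, Sequence[float]]) -> str:
    """Label whose proxy is most cosine-similar to the embedding."""
    return max(proxies, key=lambda label: cosine(embedding, proxies[label]))


def proxy_triplet_loss(embedding: Sequence[float], label: str,
                       proxies: Dict[str, Sequence[float]], margin: float = 0.2) -> float:
    """Hinge loss: the own-class proxy must beat every other proxy by `margin` in cosine distance."""
    d_pos = 1.0 - cosine(embedding, proxies[label])
    return sum(max(0.0, d_pos - (1.0 - cosine(embedding, p)) + margin)
               for other, p in proxies.items() if other != label)
```

Because classification is a nearest-neighbour lookup, a new logo can be supported by adding an entry to `proxies` (for example, the embedding of a reference image) without retraining, which is what makes the few-shot setting workable.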