July 29, 2019

Paper Group AWR 149

Correlating Satellite Cloud Cover with Sky Cameras

Title Correlating Satellite Cloud Cover with Sky Cameras
Authors Shilpa Manandhar, Soumyabrata Dev, Yee Hui Lee, Yu Song Meng
Abstract Clouds play a manifold role in understanding various atmospheric events and in studying the radiative balance of the earth. Such cloud analysis is conventionally performed via satellite images. However, because of their low temporal and spatial resolution, ground-based sky cameras are becoming popular. In this paper, we study the relation between the cloud cover obtained from MODIS images and the coverage obtained from ground-based sky cameras. This will help us better understand cloud formation in the atmosphere, both from satellite images and from ground-based observations.
Tasks
Published 2017-08-24
URL http://arxiv.org/abs/1709.05283v1
PDF http://arxiv.org/pdf/1709.05283v1.pdf
PWC https://paperswithcode.com/paper/correlating-satellite-cloud-cover-with-sky
Repo https://github.com/Soumyabrata/MODIS-cloud-mask
Framework none
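
The comparison described in the abstract reduces to computing a cloud-coverage fraction from each source and correlating the two series. A minimal sketch with toy data (the paper's actual MODIS cloud-mask processing lives in the linked repo; the masks and shapes here are made up):

```python
import numpy as np

def coverage_fraction(cloud_mask: np.ndarray) -> float:
    """Fraction of pixels flagged as cloud in a binary cloud mask."""
    return float(cloud_mask.mean())

# Toy stand-ins for paired observations (one mask per timestamp); in practice
# these would come from MODIS products and sky-camera images.
rng = np.random.default_rng(0)
modis_masks  = [rng.random((10, 10)) < p for p in np.linspace(0.1, 0.9, 20)]
camera_masks = [rng.random((64, 64)) < p + 0.05 * rng.standard_normal()
                for p in np.linspace(0.1, 0.9, 20)]

modis_cov  = np.array([coverage_fraction(m) for m in modis_masks])
camera_cov = np.array([coverage_fraction(c) for c in camera_masks])

# Pearson correlation between the two coverage time series.
r = np.corrcoef(modis_cov, camera_cov)[0, 1]
print(f"satellite vs. sky-camera cloud cover correlation: r = {r:.2f}")
```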

Wisture: RNN-based Learning of Wireless Signals for Gesture Recognition in Unmodified Smartphones

Title Wisture: RNN-based Learning of Wireless Signals for Gesture Recognition in Unmodified Smartphones
Authors Mohamed Abudulaziz Ali Haseeb, Ramviyas Parasuraman
Abstract This paper introduces Wisture, a new online machine learning solution for recognizing touch-less dynamic hand gestures on a smartphone. Wisture relies on the standard Wi-Fi Received Signal Strength (RSS), using a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN), thresholding filters and traffic induction. Unlike other Wi-Fi based gesture recognition methods, the proposed method requires no modification of the smartphone hardware or operating system, and performs the gesture recognition without interfering with the normal operation of other smartphone applications. We discuss the characteristics of Wisture and conduct extensive experiments to compare its performance against state-of-the-art machine learning solutions in terms of both accuracy and time efficiency. The experiments include a set of different scenarios in terms of both spatial setup and traffic between the smartphone and Wi-Fi access points (AP). The results show that Wisture achieves an online recognition accuracy of up to 94% (average 78%) in detecting and classifying three hand gestures.
Tasks Gesture Recognition
Published 2017-07-26
URL http://arxiv.org/abs/1707.08569v2
PDF http://arxiv.org/pdf/1707.08569v2.pdf
PWC https://paperswithcode.com/paper/wisture-rnn-based-learning-of-wireless
Repo https://github.com/mohaseeb/wisture
Framework none
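
The core of the described pipeline is an LSTM that classifies a window of Wi-Fi RSS samples into one of a few gestures. A minimal, hypothetical sketch of such a classifier (layer sizes and the three-class setup are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class RssGestureLstm(nn.Module):
    """LSTM classifier over a 1-D RSS time series (three gesture classes assumed)."""
    def __init__(self, hidden_size: int = 64, num_classes: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, rss: torch.Tensor) -> torch.Tensor:
        # rss: (batch, time, 1) window of received signal strength values
        _, (h_n, _) = self.lstm(rss)
        return self.head(h_n[-1])          # logits over gesture classes

model = RssGestureLstm()
window = torch.randn(2, 200, 1)            # two fake 200-sample RSS windows
print(model(window).shape)                 # torch.Size([2, 3])
```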

Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization

Title Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
Authors Xun Huang, Serge Belongie
Abstract Gatys et al. recently introduced a neural algorithm that renders a content image in the style of another image, achieving so-called style transfer. However, their framework requires a slow iterative optimization process, which limits its practical application. Fast approximations with feed-forward neural networks have been proposed to speed up neural style transfer. Unfortunately, the speed improvement comes at a cost: the network is usually tied to a fixed set of styles and cannot adapt to arbitrary new styles. In this paper, we present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. At the heart of our method is a novel adaptive instance normalization (AdaIN) layer that aligns the mean and variance of the content features with those of the style features. Our method achieves speed comparable to the fastest existing approach, without the restriction to a pre-defined set of styles. In addition, our approach allows flexible user controls such as content-style trade-off, style interpolation, color & spatial controls, all using a single feed-forward neural network.
Tasks Style Transfer
Published 2017-03-20
URL http://arxiv.org/abs/1703.06868v2
PDF http://arxiv.org/pdf/1703.06868v2.pdf
PWC https://paperswithcode.com/paper/arbitrary-style-transfer-in-real-time-with
Repo https://github.com/nhatsmrt/torch-styletransfer
Framework torch
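
The AdaIN layer at the heart of the method normalizes each content feature channel and rescales it with the style features' channel-wise statistics. A short sketch of that operation (the epsilon and tensor layout are implementation details, not taken from the abstract):

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Align channel-wise mean/std of content features with style features.

    content, style: feature maps of shape (batch, channels, height, width).
    """
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std  = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std  = style.std(dim=(2, 3), keepdim=True)
    return s_std * (content - c_mean) / c_std + s_mean

content = torch.randn(1, 512, 32, 32)   # e.g. encoder features of the content image
style   = torch.randn(1, 512, 32, 32)   # encoder features of the style image
stylized_features = adain(content, style)
```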

Optimizing the Latent Space of Generative Networks

Title Optimizing the Latent Space of Generative Networks
Authors Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam
Abstract Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images. In most successful applications, GAN models share two common aspects: solving a challenging saddle point optimization problem, interpreted as an adversarial game between generator and discriminator functions; and parameterizing the generator and the discriminator as deep convolutional neural networks. The goal of this paper is to disentangle the contribution of these two factors to the success of GANs. In particular, we introduce Generative Latent Optimization (GLO), a framework to train deep convolutional generators using simple reconstruction losses. Through a variety of experiments, we show that GLO enjoys many of the desirable properties of GANs: synthesizing visually appealing samples, interpolating meaningfully between samples, and performing linear arithmetic with noise vectors; all of this without the adversarial optimization scheme.
Tasks
Published 2017-07-18
URL https://arxiv.org/abs/1707.05776v2
PDF https://arxiv.org/pdf/1707.05776v2.pdf
PWC https://paperswithcode.com/paper/optimizing-the-latent-space-of-generative
Repo https://github.com/yedidh/glann
Framework pytorch
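
GLO, as described, assigns each training image a free latent code and optimizes codes and generator jointly under a plain reconstruction loss, with no discriminator. A toy sketch of that training loop (the tiny MLP generator and squared-error loss are placeholders, not the paper's architecture or loss):

```python
import torch
import torch.nn as nn

n_images, z_dim = 256, 32
images = torch.rand(n_images, 3 * 16 * 16)             # fake flattened dataset

codes = nn.Parameter(torch.randn(n_images, z_dim))     # one learnable code per image
generator = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                          nn.Linear(256, 3 * 16 * 16), nn.Sigmoid())

opt = torch.optim.Adam([codes, *generator.parameters()], lr=1e-3)
for step in range(200):
    recon = generator(codes)
    loss = ((recon - images) ** 2).mean()               # simple reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                               # keep codes on the unit ball
        codes.data /= codes.data.norm(dim=1, keepdim=True).clamp(min=1.0)
```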

VITON: An Image-based Virtual Try-on Network

Title VITON: An Image-based Virtual Try-on Network
Authors Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, Larry S. Davis
Abstract We present an image-based VIrtual Try-On Network (VITON) without using 3D information in any form, which seamlessly transfers a desired clothing item onto the corresponding region of a person using a coarse-to-fine strategy. Conditioned upon a new clothing-agnostic yet descriptive person representation, our framework first generates a coarse synthesized image with the target clothing item overlaid on that same person in the same pose. We further enhance the initially blurry clothing area with a refinement network. The network is trained to learn how much detail to utilize from the target clothing item, and where to apply it to the person, in order to synthesize a photo-realistic image in which the target item deforms naturally with clear visual patterns. Experiments on our newly collected Zalando dataset demonstrate its promise in the image-based virtual try-on task over state-of-the-art generative models.
Tasks
Published 2017-11-22
URL http://arxiv.org/abs/1711.08447v4
PDF http://arxiv.org/pdf/1711.08447v4.pdf
PWC https://paperswithcode.com/paper/viton-an-image-based-virtual-try-on-network
Repo https://github.com/xthan/VITON
Framework tf
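
The refinement stage described above can be read as predicting a per-pixel composition mask that decides how much detail to take from the warped clothing item versus the coarse synthesis. A hedged sketch of just that composition step (the mask-predicting network and the warping are elided):

```python
import torch

def compose(coarse: torch.Tensor, warped_cloth: torch.Tensor,
            mask: torch.Tensor) -> torch.Tensor:
    """Blend a coarse try-on result with the warped clothing item.

    coarse, warped_cloth: (batch, 3, H, W) images; mask: (batch, 1, H, W)
    in [0, 1], predicted by a refinement network (not shown here).
    """
    return mask * warped_cloth + (1.0 - mask) * coarse

coarse = torch.rand(1, 3, 256, 192)          # output of the coarse generator
cloth  = torch.rand(1, 3, 256, 192)          # clothing item warped onto the person
mask   = torch.rand(1, 1, 256, 192)          # hypothetical composition mask
refined = compose(coarse, cloth, mask)
```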

A Generative Model for Volume Rendering

Title A Generative Model for Volume Rendering
Authors Matthew Berger, Jixian Li, Joshua A. Levine
Abstract We present a technique to synthesize and analyze volume-rendered images using generative models. We use the Generative Adversarial Network (GAN) framework to compute a model from a large collection of volume renderings, conditioned on (1) viewpoint and (2) transfer functions for opacity and color. Our approach facilitates tasks for volume analysis that are challenging to achieve using existing rendering techniques such as ray casting or texture-based methods. We show how to guide the user in transfer function editing by quantifying expected change in the output image. Additionally, the generative model transforms transfer functions into a view-invariant latent space specifically designed to synthesize volume-rendered images. We use this space directly for rendering, enabling the user to explore the space of volume-rendered images. As our model is independent of the choice of volume rendering process, we show how to analyze volume-rendered images produced by direct and global illumination lighting, for a variety of volume datasets.
Tasks
Published 2017-10-26
URL https://arxiv.org/abs/1710.09545v2
PDF https://arxiv.org/pdf/1710.09545v2.pdf
PWC https://paperswithcode.com/paper/a-generative-model-for-volume-rendering
Repo https://github.com/matthewberger/tfgan
Framework pytorch
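
The conditioning described in the abstract amounts to encoding the transfer functions into a latent code and decoding it together with the viewpoint into an image. A loose sketch of such a generator (input sizes, the sampling of the transfer functions, and the architecture are all assumptions):

```python
import torch
import torch.nn as nn

class VolumeRenderingGenerator(nn.Module):
    """Map transfer functions to a latent code, then decode with the viewpoint."""
    def __init__(self, tf_bins: int = 256, latent: int = 128):
        super().__init__()
        # opacity + RGB color transfer functions, each sampled on tf_bins values
        self.tf_encoder = nn.Sequential(
            nn.Linear(4 * tf_bins, latent), nn.ReLU(),
            nn.Linear(latent, latent))                 # view-invariant TF code
        self.decoder = nn.Sequential(
            nn.Linear(latent + 2, 512), nn.ReLU(),     # + azimuth/elevation
            nn.Linear(512, 64 * 64 * 3), nn.Sigmoid())

    def forward(self, opacity_tf, color_tf, view):
        z = self.tf_encoder(torch.cat([opacity_tf, color_tf], dim=1))
        return self.decoder(torch.cat([z, view], dim=1)).view(-1, 3, 64, 64)

gen = VolumeRenderingGenerator()
img = gen(torch.rand(1, 256), torch.rand(1, 3 * 256), torch.rand(1, 2))
print(img.shape)   # torch.Size([1, 3, 64, 64])
```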

Fast k-Nearest Neighbour Search via Prioritized DCI

Title Fast k-Nearest Neighbour Search via Prioritized DCI
Authors Ke Li, Jitendra Malik
Abstract Most exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing (DCI) offers a promising way of circumventing the curse and successfully reduces the dependence of query time on intrinsic dimensionality from exponential to sublinear. In this paper, we propose a variant of DCI, which we call Prioritized DCI, and show a remarkable improvement in the dependence of query time on intrinsic dimensionality. In particular, a linear increase in intrinsic dimensionality, or equivalently, an exponential increase in the number of points near a query, can be mostly counteracted with just a linear increase in space. We also demonstrate empirically that Prioritized DCI significantly outperforms prior methods. In particular, relative to Locality-Sensitive Hashing (LSH), Prioritized DCI reduces the number of distance evaluations by a factor of 14 to 116 and the memory consumption by a factor of 21.
Tasks
Published 2017-03-01
URL http://arxiv.org/abs/1703.00440v2
PDF http://arxiv.org/pdf/1703.00440v2.pdf
PWC https://paperswithcode.com/paper/fast-k-nearest-neighbour-search-via
Repo https://github.com/dnbaker/frp
Framework none
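
A heavily simplified sketch of the prioritized search idea: index points by random 1-D projections and, at query time, always advance the projection whose next unvisited point is closest to the query in that projected coordinate; true distances are computed only for the visited candidates. (The real algorithm uses composite indices, projections sorted once at build time, and data-dependent stopping conditions.)

```python
import heapq
import numpy as np

def prioritized_dci_query(points, query, n_proj=10, k=5, max_visits=200):
    """Very simplified, single-level sketch of the Prioritized DCI query."""
    n, d = points.shape
    rng = np.random.default_rng(0)
    dirs = rng.standard_normal((n_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    proj = points @ dirs.T                      # (n, n_proj) projected coordinates
    q_proj = query @ dirs.T                     # (n_proj,) projected query
    # Per-projection visit order by closeness to the query in that 1-D coordinate
    # (a real index sorts each projection once and walks outward from the query;
    # here we re-sort per query only for brevity).
    order = np.argsort(np.abs(proj - q_proj), axis=0)

    heap = []                                   # (1-D gap to query, projection id, rank)
    for j in range(n_proj):
        i = order[0, j]
        heapq.heappush(heap, (abs(proj[i, j] - q_proj[j]), j, 0))

    candidates, visits = set(), 0
    while heap and visits < max_visits:
        _, j, pos = heapq.heappop(heap)         # advance the most promising projection
        candidates.add(int(order[pos, j]))
        visits += 1
        if pos + 1 < n:
            i = order[pos + 1, j]
            heapq.heappush(heap, (abs(proj[i, j] - q_proj[j]), j, pos + 1))

    cand = np.fromiter(candidates, dtype=int)
    dists = np.linalg.norm(points[cand] - query, axis=1)  # true distances, candidates only
    return cand[np.argsort(dists)[:k]]

pts = np.random.default_rng(1).standard_normal((1000, 64))
print(prioritized_dci_query(pts, pts[0], k=3))  # nearest neighbours of the first point
```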

Building Emotional Machines: Recognizing Image Emotions through Deep Neural Networks

Title Building Emotional Machines: Recognizing Image Emotions through Deep Neural Networks
Authors Hye-Rin Kim, Yeong-Seok Kim, Seon Joo Kim, In-Kwon Lee
Abstract An image is a very effective tool for conveying emotions. Many researchers have investigated computing image emotions using various features extracted from images. In this paper, we focus on two high-level features, the object and the background, and assume that the semantic information of images is a good cue for predicting emotion. An object is one of the most important elements that define an image, and we find through experiments that there is a high correlation between the object and the emotion in images. Even with the same object, there may be slight differences in emotion due to different backgrounds, and we use the semantic information of the background to improve the prediction performance. By combining the different levels of features, we build an emotion-based feed-forward deep neural network which produces the emotion values of a given image. The output emotion values in our framework are continuous values in the 2-dimensional space (Valence and Arousal), which are more effective for describing emotions than a small number of emotion categories. Experiments confirm the effectiveness of our network in predicting the emotion of images.
Tasks
Published 2017-05-22
URL http://arxiv.org/abs/1705.07543v2
PDF http://arxiv.org/pdf/1705.07543v2.pdf
PWC https://paperswithcode.com/paper/building-emotional-machines-recognizing-image
Repo https://github.com/pohlinwei/AComPianist
Framework tf
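
The described network concatenates object-level and background-level features and regresses two continuous values, valence and arousal. A hypothetical sketch (feature dimensions and layer sizes are made up):

```python
import torch
import torch.nn as nn

class EmotionRegressor(nn.Module):
    """Regress (valence, arousal) from concatenated object and background features."""
    def __init__(self, obj_dim: int = 2048, bg_dim: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obj_dim + bg_dim, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, 2))                 # continuous valence and arousal

    def forward(self, obj_feat, bg_feat):
        return self.net(torch.cat([obj_feat, bg_feat], dim=1))

model = EmotionRegressor()
va = model(torch.randn(4, 2048), torch.randn(4, 2048))   # e.g. object / scene CNN features
print(va.shape)   # torch.Size([4, 2])
```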

Depthwise Separable Convolutions for Neural Machine Translation

Title Depthwise Separable Convolutions for Neural Machine Translation
Authors Lukasz Kaiser, Aidan N. Gomez, Francois Chollet
Abstract Depthwise separable convolutions reduce the number of parameters and computation used in convolutional operations while increasing representational efficiency. They have been shown to be successful in image classification models, both in obtaining better models than previously possible for a given parameter count (the Xception architecture) and considerably reducing the number of parameters required to perform at a given level (the MobileNets family of architectures). Recently, convolutional sequence-to-sequence networks have been applied to machine translation tasks with good results. In this work, we study how depthwise separable convolutions can be applied to neural machine translation. We introduce a new architecture inspired by Xception and ByteNet, called SliceNet, which enables a significant reduction of the parameter count and amount of computation needed to obtain results like ByteNet, and, with a similar parameter count, achieves new state-of-the-art results. In addition to showing that depthwise separable convolutions perform well for machine translation, we investigate the architectural changes that they enable: we observe that thanks to depthwise separability, we can increase the length of convolution windows, removing the need for filter dilation. We also introduce a new “super-separable” convolution operation that further reduces the number of parameters and computational cost for obtaining state-of-the-art results.
Tasks Machine Translation
Published 2017-06-09
URL http://arxiv.org/abs/1706.03059v2
PDF http://arxiv.org/pdf/1706.03059v2.pdf
PWC https://paperswithcode.com/paper/depthwise-separable-convolutions-for-neural
Repo https://github.com/tensorflow/tensor2tensor
Framework tf
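
A depthwise separable convolution factors a standard convolution into a per-channel spatial convolution followed by a 1x1 pointwise mix, which is where the parameter savings come from. A minimal sketch of the operation itself, not of SliceNet:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    """Per-channel (depthwise) convolution followed by a 1x1 pointwise mix."""
    def __init__(self, channels: int, out_channels: int, kernel_size: int):
        super().__init__()
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv1d(channels, out_channels, kernel_size=1)

    def forward(self, x):          # x: (batch, channels, sequence length)
        return self.pointwise(self.depthwise(x))

sep = DepthwiseSeparableConv1d(channels=256, out_channels=256, kernel_size=7)
dense = nn.Conv1d(256, 256, 7, padding=3)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(sep), "vs", count(dense))   # far fewer parameters than a dense convolution
```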

Unsupervised Part-based Weighting Aggregation of Deep Convolutional Features for Image Retrieval

Title Unsupervised Part-based Weighting Aggregation of Deep Convolutional Features for Image Retrieval
Authors Jian Xu, Cunzhao Shi, Chengzuo Qi, Chunheng Wang, Baihua Xiao
Abstract In this paper, we propose a simple but effective semantic part-based weighting aggregation (PWA) for image retrieval. The proposed PWA utilizes the discriminative filters of deep convolutional layers as part detectors. Moreover, we propose an effective unsupervised strategy to select part detectors and generate “probabilistic proposals”, which highlight discriminative parts of objects and suppress background noise. The final global PWA representation is then acquired by aggregating the regional representations weighted by the selected “probabilistic proposals” corresponding to various semantic content. We conduct comprehensive experiments on four standard datasets and show that our unsupervised PWA outperforms state-of-the-art unsupervised and supervised aggregation methods. Code is available at https://github.com/XJhaoren/PWA.
Tasks Image Retrieval
Published 2017-05-03
URL http://arxiv.org/abs/1705.01247v3
PDF http://arxiv.org/pdf/1705.01247v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-part-based-weighting-aggregation
Repo https://github.com/XJhaoren/PWA
Framework none
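
The aggregation described can be sketched as: select a subset of convolutional channels as part detectors, use each selected channel's activation map as spatial weights, and concatenate the weighted sums of the feature maps. A rough sketch (the variance-based channel selection is a placeholder for the paper's unsupervised strategy, and PCA/whitening is omitted):

```python
import numpy as np

def pwa_descriptor(feat: np.ndarray, n_parts: int = 25) -> np.ndarray:
    """Part-based weighted aggregation of a conv feature map.

    feat: (C, H, W) deep convolutional features of one image.
    Returns an (n_parts * C,) descriptor before any PCA/whitening.
    """
    C, H, W = feat.shape
    # Placeholder selection: channels with the highest activation variance act
    # as "part detectors" (the paper selects them with an unsupervised strategy).
    variance = feat.reshape(C, -1).var(axis=1)
    parts = np.argsort(variance)[-n_parts:]

    descriptor = []
    for p in parts:
        weights = feat[p] / (feat[p].sum() + 1e-8)             # probabilistic proposal
        descriptor.append((feat * weights).sum(axis=(1, 2)))   # weighted sum-pool, (C,)
    out = np.concatenate(descriptor)
    return out / (np.linalg.norm(out) + 1e-8)

descriptor = pwa_descriptor(np.random.rand(512, 28, 28))
print(descriptor.shape)   # (12800,)
```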

Semantic Entity Retrieval Toolkit

Title Semantic Entity Retrieval Toolkit
Authors Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas
Abstract Unsupervised learning of low-dimensional, semantic representations of words and entities has recently gained attention. In this paper we describe the Semantic Entity Retrieval Toolkit (SERT) that provides implementations of our previously published entity representation models. The toolkit provides a unified interface to different representation learning algorithms, fine-grained parsing configuration and can be used transparently with GPUs. In addition, users can easily modify existing models or implement their own models in the framework. After model training, SERT can be used to rank entities according to a textual query and extract the learned entity/word representation for use in downstream algorithms, such as clustering or recommendation.
Tasks Representation Learning
Published 2017-06-12
URL http://arxiv.org/abs/1706.03757v2
PDF http://arxiv.org/pdf/1706.03757v2.pdf
PWC https://paperswithcode.com/paper/semantic-entity-retrieval-toolkit
Repo https://github.com/cvangysel/SERT
Framework none
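
After training, ranking entities against a textual query comes down to scoring each entity representation against a query representation built from the learned word vectors. A hypothetical sketch of that final step (SERT's actual API and query-projection scheme may differ):

```python
import numpy as np

def rank_entities(query_tokens, word_vecs, entity_vecs):
    """Rank entities by cosine similarity to an averaged query representation.

    word_vecs: dict token -> vector; entity_vecs: (n_entities, dim) matrix.
    """
    q = np.mean([word_vecs[t] for t in query_tokens if t in word_vecs], axis=0)
    q = q / np.linalg.norm(q)
    e = entity_vecs / np.linalg.norm(entity_vecs, axis=1, keepdims=True)
    scores = e @ q
    return np.argsort(-scores), scores

rng = np.random.default_rng(0)
word_vecs = {w: rng.standard_normal(64) for w in ["neural", "ranking", "models"]}
entity_vecs = rng.standard_normal((100, 64))
order, scores = rank_entities(["neural", "ranking"], word_vecs, entity_vecs)
print(order[:5])   # indices of the five highest-scoring entities
```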

Towards Diverse and Natural Image Descriptions via a Conditional GAN

Title Towards Diverse and Natural Image Descriptions via a Conditional GAN
Authors Bo Dai, Sanja Fidler, Raquel Urtasun, Dahua Lin
Abstract Despite the substantial progress in recent years, image captioning techniques are still far from perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. This issue is related to a learning principle widely used in practice, namely maximizing the likelihood of training samples. This principle encourages high resemblance to the “ground-truth” captions while suppressing other reasonable descriptions. Conventional evaluation metrics, e.g. BLEU and METEOR, also favor such restrictive methods. In this paper, we explore an alternative approach, with the aim of improving naturalness and diversity, two essential properties of human expression. Specifically, we propose a new framework based on Conditional Generative Adversarial Networks (CGAN), which jointly learns a generator to produce descriptions conditioned on images and an evaluator to assess how well a description fits the visual content. It is noteworthy that training a sequence generator is nontrivial. We overcome the difficulty with Policy Gradient, a strategy stemming from Reinforcement Learning, which allows the generator to receive early feedback along the way. We tested our method on two large datasets, where it performed competitively against real people in our user study and outperformed other methods on various tasks.
Tasks Image Captioning
Published 2017-03-17
URL http://arxiv.org/abs/1703.06029v3
PDF http://arxiv.org/pdf/1703.06029v3.pdf
PWC https://paperswithcode.com/paper/towards-diverse-and-natural-image
Repo https://github.com/doubledaibo/gancaption_iccv2017
Framework none
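
The Policy Gradient step mentioned in the abstract treats the caption generator as a policy: captions are sampled, scored by the evaluator, and that score reweights the log-likelihood of the sampled tokens. A toy, single-step sketch of the generator update (the generator and evaluator here are stand-ins, not the paper's models):

```python
import torch
import torch.nn as nn

vocab, hidden = 1000, 128
generator = nn.Linear(hidden, vocab)                       # stand-in for an RNN decoder step
evaluator = lambda caption: torch.rand(caption.shape[0])   # stand-in CGAN evaluator score

image_code = torch.randn(8, hidden)            # batch of image features
logits = generator(image_code)                 # one decoding step, for brevity
dist = torch.distributions.Categorical(logits=logits)
tokens = dist.sample()                         # sampled caption tokens
reward = evaluator(tokens)                     # how well the caption fits the image
baseline = reward.mean()                       # variance-reduction baseline

# REINFORCE: raise the log-probability of samples that scored above the baseline.
loss = -((reward - baseline).detach() * dist.log_prob(tokens)).mean()
loss.backward()
```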

Context-aware Captions from Context-agnostic Supervision

Title Context-aware Captions from Context-agnostic Supervision
Authors Ramakrishna Vedantam, Samy Bengio, Kevin Murphy, Devi Parikh, Gal Chechik
Abstract We introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation). For example, given images and captions of “siamese cat” and “tiger cat”, we generate language that describes the “siamese cat” in a way that distinguishes it from “tiger cat”. Our key novelty is that we show how to do joint inference over a language model that is context-agnostic and a listener which distinguishes closely-related concepts. We first apply our technique to a justification task, namely to describe why an image contains a particular fine-grained category as opposed to another closely-related category of the CUB-200-2011 dataset. We then study discriminative image captioning to generate language that uniquely refers to one of two semantically-similar images in the COCO dataset. Evaluations with discriminative ground truth for justification and human studies for discriminative image captioning reveal that our approach outperforms baseline generative and speaker-listener approaches for discrimination.
Tasks Image Captioning, Language Modelling
Published 2017-01-11
URL http://arxiv.org/abs/1701.02870v3
PDF http://arxiv.org/pdf/1701.02870v3.pdf
PWC https://paperswithcode.com/paper/context-aware-captions-from-context-agnostic
Repo https://github.com/ruotianluo/DiscCaptioning
Framework pytorch
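
The joint inference described combines the context-agnostic speaker's likelihood with a listener term that rewards captions more probable for the target image than for the distractor. A hedged sketch of one such scoring function (the mixing weight and the exact listener form are assumptions):

```python
import math

def discriminative_score(logp_target: float, logp_distractor: float,
                         lam: float = 0.5) -> float:
    """Blend speaker likelihood with a listener's discriminativeness.

    logp_target / logp_distractor: caption log-probabilities under the
    context-agnostic model, conditioned on the target / distractor image.
    """
    # Listener: probability the caption refers to the target rather than the distractor.
    listener = logp_target - math.log(math.exp(logp_target) + math.exp(logp_distractor))
    return lam * logp_target + (1.0 - lam) * listener

# A caption that is likelier under the target than the distractor scores higher.
print(discriminative_score(-12.0, -20.0))   # discriminative caption
print(discriminative_score(-12.0, -12.5))   # less discriminative caption
```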

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

Title Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
Authors Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang
Abstract Motion representation plays a vital role in human action recognition in videos. In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach. The OFF is derived from the definition of optical flow and is orthogonal to the optical flow. The derivation also provides theoretical support for using the difference between two frames. By directly calculating pixel-wise spatiotemporal gradients of the deep feature maps, the OFF can be embedded in any existing CNN-based video action recognition framework at only a slight additional cost. It enables the CNN to extract spatial and temporal information, especially the temporal information between frames, simultaneously. This simple but powerful idea is validated by experimental results. The network with OFF fed only by RGB inputs achieves a competitive accuracy of 93.3% on UCF-101, which is comparable with the result obtained by two streams (RGB and optical flow), but is 15 times faster. Experimental results also show that OFF is complementary to other motion modalities such as optical flow. When the proposed method is plugged into a state-of-the-art video action recognition framework, it achieves 96.0% and 74.2% accuracy on UCF-101 and HMDB-51, respectively. The code for this project is available at https://github.com/kevin-ssy/Optical-Flow-Guided-Feature.
Tasks Action Recognition In Videos, Optical Flow Estimation, Temporal Action Localization
Published 2017-11-29
URL http://arxiv.org/abs/1711.11152v2
PDF http://arxiv.org/pdf/1711.11152v2.pdf
PWC https://paperswithcode.com/paper/optical-flow-guided-feature-a-fast-and-robust
Repo https://github.com/kevin-ssy/Optical-Flow-Guided-Feature
Framework none
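
For a single feature level, the OFF described above boils down to spatial gradients of the feature maps plus a temporal difference between nearby frames. A simplified sketch (the paper stacks several levels and feeds them to further sub-networks, which is omitted here):

```python
import torch
import torch.nn.functional as F

def optical_flow_guided_feature(feat_t: torch.Tensor, feat_t_dt: torch.Tensor):
    """Sketch of OFF for one level: spatial gradients plus a temporal difference.

    feat_t, feat_t_dt: (batch, C, H, W) feature maps from two nearby frames.
    Returns (grad_x, grad_y, temporal_diff), each of shape (batch, C, H, W).
    """
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    sobel_y = sobel_x.t()
    C = feat_t.shape[1]
    kx = sobel_x.view(1, 1, 3, 3).repeat(C, 1, 1, 1)   # depthwise Sobel filters
    ky = sobel_y.view(1, 1, 3, 3).repeat(C, 1, 1, 1)
    grad_x = F.conv2d(feat_t, kx, padding=1, groups=C)
    grad_y = F.conv2d(feat_t, ky, padding=1, groups=C)
    temporal = feat_t_dt - feat_t                      # frame-to-frame feature difference
    return grad_x, grad_y, temporal

f1, f2 = torch.randn(2, 64, 28, 28), torch.randn(2, 64, 28, 28)
gx, gy, dt = optical_flow_guided_feature(f1, f2)
```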

End-to-end weakly-supervised semantic alignment

Title End-to-end weakly-supervised semantic alignment
Authors Ignacio Rocco, Relja Arandjelović, Josef Sivic
Abstract We tackle the task of semantic alignment where the goal is to compute dense semantic correspondence aligning two images depicting objects of the same category. This is a challenging task due to large intra-class variation, changes in viewpoint and background clutter. We present the following three principal contributions. First, we develop a convolutional neural network architecture for semantic alignment that is trainable in an end-to-end manner from weak image-level supervision in the form of matching image pairs. The outcome is that parameters are learnt from rich appearance variation present in different but semantically related images without the need for tedious manual annotation of correspondences at training time. Second, the main component of this architecture is a differentiable soft inlier scoring module, inspired by the RANSAC inlier scoring procedure, that computes the quality of the alignment based on only geometrically consistent correspondences thereby reducing the effect of background clutter. Third, we demonstrate that the proposed approach achieves state-of-the-art performance on multiple standard benchmarks for semantic alignment.
Tasks
Published 2017-12-19
URL http://arxiv.org/abs/1712.06861v2
PDF http://arxiv.org/pdf/1712.06861v2.pdf
PWC https://paperswithcode.com/paper/end-to-end-weakly-supervised-semantic
Repo https://github.com/ignacio-rocco/weakalign
Framework pytorch
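
The soft inlier score described can be sketched as masking a dense match-score volume with a geometric-consistency mask derived from the estimated transformation and summing what survives. A rough sketch of that scoring step only (the mask here is a random stand-in for the transform-derived one):

```python
import torch

def soft_inlier_score(match_scores: torch.Tensor, inlier_mask: torch.Tensor) -> torch.Tensor:
    """Sum match scores only where the estimated geometric transform is consistent.

    match_scores: (batch, H*W, H, W) dense correlation between source and target
    features; inlier_mask: same shape, ~1 where a source position maps (under the
    estimated transform) near that target position, ~0 elsewhere.
    """
    return (match_scores * inlier_mask).sum(dim=(1, 2, 3))

scores = torch.rand(2, 15 * 15, 15, 15)          # hypothetical correlation volume
mask = (torch.rand_like(scores) > 0.9).float()   # stand-in for the transform-derived mask
print(soft_inlier_score(scores, mask))           # higher = better-aligned image pair
```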