Paper Group AWR 149
Correlating Satellite Cloud Cover with Sky Cameras
Title | Correlating Satellite Cloud Cover with Sky Cameras |
Authors | Shilpa Manandhar, Soumyabrata Dev, Yee Hui Lee, Yu Song Meng |
Abstract | The role of clouds is manifold in understanding the various events in the atmosphere, and also in studying the radiative balance of the earth. Such cloud analysis is conventionally performed using satellite images. However, because of their low temporal and spatial resolutions, ground-based sky cameras are now gaining popularity. In this paper, we study the relation between the cloud cover obtained from MODIS images and the coverage obtained from ground-based sky cameras. This will help us to better understand cloud formation in the atmosphere, both from satellite images and ground-based observations. |
Tasks | |
Published | 2017-08-24 |
URL | http://arxiv.org/abs/1709.05283v1 |
http://arxiv.org/pdf/1709.05283v1.pdf | |
PWC | https://paperswithcode.com/paper/correlating-satellite-cloud-cover-with-sky |
Repo | https://github.com/Soumyabrata/MODIS-cloud-mask |
Framework | none |
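At its core, the comparison described above reduces to correlating two cloud-cover fraction series, one derived from the MODIS cloud mask and one from the sky-camera images. A minimal sketch of that correlation step, assuming both series are already available as fractions in [0, 1] for matched timestamps (the variable names and sample values are illustrative, not the paper's data):

```python
import numpy as np

# Illustrative cloud-cover fractions for the same set of timestamps (not real data).
modis_cloud_fraction = np.array([0.62, 0.80, 0.15, 0.47, 0.90, 0.33])
sky_camera_coverage = np.array([0.58, 0.85, 0.20, 0.40, 0.88, 0.30])

# Pearson correlation coefficient between the two coverage estimates.
r = np.corrcoef(modis_cloud_fraction, sky_camera_coverage)[0, 1]
print(f"correlation between MODIS and sky-camera cloud cover: {r:.3f}")
```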
Wisture: RNN-based Learning of Wireless Signals for Gesture Recognition in Unmodified Smartphones
Title | Wisture: RNN-based Learning of Wireless Signals for Gesture Recognition in Unmodified Smartphones |
Authors | Mohamed Abudulaziz Ali Haseeb, Ramviyas Parasuraman |
Abstract | This paper introduces Wisture, a new online machine learning solution for recognizing touch-less dynamic hand gestures on a smartphone. Wisture relies on the standard Wi-Fi Received Signal Strength (RSS) using a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN), thresholding filters and traffic induction. Unlike other Wi-Fi based gesture recognition methods, the proposed method does not require a modification of the smartphone hardware or the operating system, and performs the gesture recognition without interfering with the normal operation of other smartphone applications. We discuss the characteristics of Wisture, and conduct extensive experiments to compare its performance against state-of-the-art machine learning solutions in terms of both accuracy and time efficiency. The experiments include a set of different scenarios in terms of both spatial setup and traffic between the smartphone and Wi-Fi access points (AP). The results show that Wisture achieves an online recognition accuracy of up to 94% (average 78%) in detecting and classifying three hand gestures. |
Tasks | Gesture Recognition |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08569v2 |
http://arxiv.org/pdf/1707.08569v2.pdf | |
PWC | https://paperswithcode.com/paper/wisture-rnn-based-learning-of-wireless |
Repo | https://github.com/mohaseeb/wisture |
Framework | none |
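The core learner described in the abstract is an LSTM over windows of Wi-Fi RSS samples (the thresholding filters and traffic induction are separate preprocessing steps). A minimal PyTorch sketch of such a classifier; the window length, hidden size, and number of classes are illustrative assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class RssGestureLstm(nn.Module):
    """LSTM classifier over a window of Wi-Fi RSS samples (one scalar per time step)."""

    def __init__(self, hidden_size: int = 64, num_classes: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, rss_window: torch.Tensor) -> torch.Tensor:
        # rss_window: (batch, time_steps, 1) of RSS values in dBm.
        _, (h_n, _) = self.lstm(rss_window)
        return self.head(h_n[-1])              # (batch, num_classes) gesture logits

model = RssGestureLstm()
logits = model(torch.randn(8, 100, 1))         # 8 windows of 100 RSS samples each
print(logits.shape)                            # torch.Size([8, 3])
```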
Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization
Title | Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization |
Authors | Xun Huang, Serge Belongie |
Abstract | Gatys et al. recently introduced a neural algorithm that renders a content image in the style of another image, achieving so-called style transfer. However, their framework requires a slow iterative optimization process, which limits its practical application. Fast approximations with feed-forward neural networks have been proposed to speed up neural style transfer. Unfortunately, the speed improvement comes at a cost: the network is usually tied to a fixed set of styles and cannot adapt to arbitrary new styles. In this paper, we present a simple yet effective approach that for the first time enables arbitrary style transfer in real-time. At the heart of our method is a novel adaptive instance normalization (AdaIN) layer that aligns the mean and variance of the content features with those of the style features. Our method achieves speed comparable to the fastest existing approach, without the restriction to a pre-defined set of styles. In addition, our approach allows flexible user controls such as content-style trade-off, style interpolation, color & spatial controls, all using a single feed-forward neural network. |
Tasks | Style Transfer |
Published | 2017-03-20 |
URL | http://arxiv.org/abs/1703.06868v2 |
http://arxiv.org/pdf/1703.06868v2.pdf | |
PWC | https://paperswithcode.com/paper/arbitrary-style-transfer-in-real-time-with |
Repo | https://github.com/nhatsmrt/torch-styletransfer |
Framework | torch |
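The AdaIN layer at the heart of the method has a simple closed form: normalize each channel of the content features, then rescale and shift it with the style features' channel-wise statistics. A minimal PyTorch sketch (the epsilon and the (N, C, H, W) layout are conventional assumptions):

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Align the channel-wise mean/std of content features to those of style features.

    Both inputs are assumed to be feature maps of shape (N, C, H, W).
    """
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean
```

In the paper's pipeline this operation is applied to encoder (VGG) features of the content and style images, and a learned decoder maps the re-normalized features back to pixels.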
Optimizing the Latent Space of Generative Networks
Title | Optimizing the Latent Space of Generative Networks |
Authors | Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam |
Abstract | Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images. In most successful applications, GAN models share two common aspects: solving a challenging saddle point optimization problem, interpreted as an adversarial game between generator and discriminator functions; and parameterizing the generator and the discriminator as deep convolutional neural networks. The goal of this paper is to disentangle the contribution of these two factors to the success of GANs. In particular, we introduce Generative Latent Optimization (GLO), a framework to train deep convolutional generators using simple reconstruction losses. Through a variety of experiments, we show that GLO enjoys many of the desirable properties of GANs: synthesizing visually-appealing samples, interpolating meaningfully between samples, and performing linear arithmetic with noise vectors; all of this without the adversarial optimization scheme. |
Tasks | |
Published | 2017-07-18 |
URL | https://arxiv.org/abs/1707.05776v2 |
https://arxiv.org/pdf/1707.05776v2.pdf | |
PWC | https://paperswithcode.com/paper/optimizing-the-latent-space-of-generative |
Repo | https://github.com/yedidh/glann |
Framework | pytorch |
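GLO replaces the adversarial game with plain reconstruction: every training image gets its own learnable latent code, and the codes and the generator are optimized jointly to minimize a reconstruction loss. A minimal sketch of that joint optimization, assuming a simple L2 loss and a toy generator (the paper uses a convolutional generator and a Laplacian-pyramid-style loss; all names here are illustrative):

```python
import torch
import torch.nn as nn

num_images, latent_dim = 1000, 64
images = torch.rand(num_images, 3, 32, 32) * 2 - 1       # stand-in dataset in [-1, 1]

# One learnable latent code per training image, plus a generator.
latents = nn.Parameter(torch.randn(num_images, latent_dim))
generator = nn.Sequential(                                # toy generator for illustration
    nn.Linear(latent_dim, 3 * 32 * 32), nn.Tanh(), nn.Unflatten(1, (3, 32, 32))
)
optimizer = torch.optim.Adam([latents, *generator.parameters()], lr=1e-3)

for idx in torch.randperm(num_images).split(64):          # minibatches of image indices
    recon = generator(latents[idx])
    loss = ((recon - images[idx]) ** 2).mean()             # simple reconstruction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # (The paper additionally constrains the codes, e.g. by projecting them back
    #  onto the unit l2 ball after each update.)
```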
VITON: An Image-based Virtual Try-on Network
Title | VITON: An Image-based Virtual Try-on Network |
Authors | Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, Larry S. Davis |
Abstract | We present an image-based VIrtual Try-on Network (VITON) without using 3D information in any form, which seamlessly transfers a desired clothing item onto the corresponding region of a person using a coarse-to-fine strategy. Conditioned upon a new clothing-agnostic yet descriptive person representation, our framework first generates a coarse synthesized image with the target clothing item overlaid on that same person in the same pose. We further enhance the initial blurry clothing area with a refinement network. The network is trained to learn how much detail to utilize from the target clothing item, and where to apply it to the person, in order to synthesize a photo-realistic image in which the target item deforms naturally with clear visual patterns. Experiments on our newly collected Zalando dataset demonstrate its promise in the image-based virtual try-on task over state-of-the-art generative models. |
Tasks | |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08447v4 |
http://arxiv.org/pdf/1711.08447v4.pdf | |
PWC | https://paperswithcode.com/paper/viton-an-image-based-virtual-try-on-network |
Repo | https://github.com/xthan/VITON |
Framework | tf |
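The refinement stage described above can be read as predicting a per-pixel composition mask that decides how much detail to take from the (warped) target clothing item versus the coarse synthesis. A heavily simplified sketch of that compositing step, assuming the warped clothing image, the coarse result, and a single-channel mask are already available (all names are illustrative, not the authors' code):

```python
import torch

def composite(coarse_image: torch.Tensor,
              warped_clothing: torch.Tensor,
              mask: torch.Tensor) -> torch.Tensor:
    """Blend warped clothing detail into the coarse synthesis.

    coarse_image, warped_clothing: (N, 3, H, W); mask: (N, 1, H, W) in [0, 1],
    e.g. the sigmoid output of a refinement network.
    """
    return mask * warped_clothing + (1.0 - mask) * coarse_image
```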
A Generative Model for Volume Rendering
Title | A Generative Model for Volume Rendering |
Authors | Matthew Berger, Jixian Li, Joshua A. Levine |
Abstract | We present a technique to synthesize and analyze volume-rendered images using generative models. We use the Generative Adversarial Network (GAN) framework to compute a model from a large collection of volume renderings, conditioned on (1) viewpoint and (2) transfer functions for opacity and color. Our approach facilitates tasks for volume analysis that are challenging to achieve using existing rendering techniques such as ray casting or texture-based methods. We show how to guide the user in transfer function editing by quantifying expected change in the output image. Additionally, the generative model transforms transfer functions into a view-invariant latent space specifically designed to synthesize volume-rendered images. We use this space directly for rendering, enabling the user to explore the space of volume-rendered images. As our model is independent of the choice of volume rendering process, we show how to analyze volume-rendered images produced by direct and global illumination lighting, for a variety of volume datasets. |
Tasks | |
Published | 2017-10-26 |
URL | https://arxiv.org/abs/1710.09545v2 |
https://arxiv.org/pdf/1710.09545v2.pdf | |
PWC | https://paperswithcode.com/paper/a-generative-model-for-volume-rendering |
Repo | https://github.com/matthewberger/tfgan |
Framework | pytorch |
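The generator is conditioned on a viewpoint and on opacity/color transfer functions. A minimal sketch of a conditional generator in that spirit, which encodes the conditioning variables into a latent vector and decodes it into an image; the input sizes and layer widths are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ConditionalRenderer(nn.Module):
    """Map (viewpoint, opacity transfer function, color transfer function) to an image."""

    def __init__(self, view_dim: int = 3, tf_bins: int = 256):
        super().__init__()
        # Encode the conditioning variables into a single latent vector.
        self.encode = nn.Sequential(
            nn.Linear(view_dim + tf_bins + 3 * tf_bins, 512), nn.ReLU(),
            nn.Linear(512, 128 * 4 * 4), nn.ReLU(),
        )
        # Decode the latent vector into a 64x64 RGB rendering.
        self.decode = nn.Sequential(
            nn.Unflatten(1, (128, 4, 4)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, view, opacity_tf, color_tf):
        z = self.encode(torch.cat([view, opacity_tf, color_tf.flatten(1)], dim=1))
        return self.decode(z)

img = ConditionalRenderer()(torch.randn(2, 3), torch.rand(2, 256), torch.rand(2, 3, 256))
print(img.shape)   # torch.Size([2, 3, 64, 64])
```

In the paper this generator is trained in a GAN framework against real volume renderings; the sketch shows only the conditioning and decoding structure.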
Fast k-Nearest Neighbour Search via Prioritized DCI
Title | Fast k-Nearest Neighbour Search via Prioritized DCI |
Authors | Ke Li, Jitendra Malik |
Abstract | Most exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing (DCI) offers a promising way of circumventing the curse and successfully reduces the dependence of query time on intrinsic dimensionality from exponential to sublinear. In this paper, we propose a variant of DCI, which we call Prioritized DCI, and show a remarkable improvement in the dependence of query time on intrinsic dimensionality. In particular, a linear increase in intrinsic dimensionality, or equivalently, an exponential increase in the number of points near a query, can be mostly counteracted with just a linear increase in space. We also demonstrate empirically that Prioritized DCI significantly outperforms prior methods. In particular, relative to Locality-Sensitive Hashing (LSH), Prioritized DCI reduces the number of distance evaluations by a factor of 14 to 116 and the memory consumption by a factor of 21. |
Tasks | |
Published | 2017-03-01 |
URL | http://arxiv.org/abs/1703.00440v2 |
http://arxiv.org/pdf/1703.00440v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-k-nearest-neighbour-search-via |
Repo | https://github.com/dnbaker/frp |
Framework | none |
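Very roughly, DCI indexes points by their projections onto random directions and retrieves candidates in order of projected distance to the query; the "prioritized" variant keeps a priority queue so that it always advances the projection whose next unvisited point is closest to the query in projection space. A heavily simplified, single-composite-index sketch of that retrieval loop (not the authors' implementation; the parameters and the candidacy rule are simplified assumptions):

```python
import heapq
import numpy as np

def prioritized_dci_query(points, query, num_proj=10, num_candidates=50, k=5, seed=0):
    """Simplified Prioritized DCI-style k-NN query over points (n, d) for query (d,)."""
    rng = np.random.default_rng(seed)
    n, d = points.shape
    dirs = rng.standard_normal((num_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    proj = points @ dirs.T                              # (n, num_proj) point projections
    qproj = query @ dirs.T                              # (num_proj,) query projections
    order = np.argsort(np.abs(proj - qproj), axis=0)    # per-projection visiting order

    # Priority queue over projections, keyed by the projected distance of the next
    # unvisited point in each projection (the "prioritized" part).
    pos = [0] * num_proj
    heap = [(abs(proj[order[0, j], j] - qproj[j]), j) for j in range(num_proj)]
    heapq.heapify(heap)

    counts = np.zeros(n, dtype=int)
    candidates = []
    while heap and len(candidates) < num_candidates:
        _, j = heapq.heappop(heap)
        i = order[pos[j], j]
        counts[i] += 1
        if counts[i] == num_proj:                       # seen in every projection
            candidates.append(i)
        pos[j] += 1
        if pos[j] < n:
            nxt = order[pos[j], j]
            heapq.heappush(heap, (abs(proj[nxt, j] - qproj[j]), j))

    if not candidates:                                  # fall back to most-visited points
        candidates = list(np.argsort(-counts)[:num_candidates])
    dists = np.linalg.norm(points[candidates] - query, axis=1)
    return [int(candidates[i]) for i in np.argsort(dists)[:k]]

data = np.random.rand(2000, 64).astype(np.float32)
print(prioritized_dci_query(data, np.random.rand(64).astype(np.float32), k=3))
```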
Building Emotional Machines: Recognizing Image Emotions through Deep Neural Networks
Title | Building Emotional Machines: Recognizing Image Emotions through Deep Neural Networks |
Authors | Hye-Rin Kim, Yeong-Seok Kim, Seon Joo Kim, In-Kwon Lee |
Abstract | An image is a very effective tool for conveying emotions. Many researchers have investigated computing image emotions using various features extracted from images. In this paper, we focus on two high-level features, the object and the background, and assume that the semantic information of images is a good cue for predicting emotion. An object is one of the most important elements that define an image, and we find through experiments that there is a high correlation between the object and the emotion in images. Even with the same object, there may be slight differences in emotion due to different backgrounds, and we use the semantic information of the background to improve the prediction performance. By combining the different levels of features, we build an emotion-based feed-forward deep neural network which produces the emotion values of a given image. The output emotion values in our framework are continuous values in the 2-dimensional space (Valence and Arousal), which are more effective than a small number of emotion categories in describing emotions. Experiments confirm the effectiveness of our network in predicting the emotion of images. |
Tasks | |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07543v2 |
http://arxiv.org/pdf/1705.07543v2.pdf | |
PWC | https://paperswithcode.com/paper/building-emotional-machines-recognizing-image |
Repo | https://github.com/pohlinwei/AComPianist |
Framework | tf |
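The prediction model described above is a feed-forward network that fuses object-level and background-level features and regresses two continuous values (valence and arousal). A minimal sketch of such a fusion network; the feature dimensions (e.g., object-class scores and scene scores) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EmotionRegressor(nn.Module):
    """Fuse object and background features and regress (valence, arousal)."""

    def __init__(self, object_dim: int = 1000, background_dim: int = 365):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(object_dim + background_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 2),                  # continuous valence and arousal
        )

    def forward(self, object_feat, background_feat):
        return self.net(torch.cat([object_feat, background_feat], dim=1))

va = EmotionRegressor()(torch.randn(4, 1000), torch.randn(4, 365))
print(va.shape)   # torch.Size([4, 2])
```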
Depthwise Separable Convolutions for Neural Machine Translation
Title | Depthwise Separable Convolutions for Neural Machine Translation |
Authors | Lukasz Kaiser, Aidan N. Gomez, Francois Chollet |
Abstract | Depthwise separable convolutions reduce the number of parameters and computation used in convolutional operations while increasing representational efficiency. They have been shown to be successful in image classification models, both in obtaining better models than previously possible for a given parameter count (the Xception architecture) and considerably reducing the number of parameters required to perform at a given level (the MobileNets family of architectures). Recently, convolutional sequence-to-sequence networks have been applied to machine translation tasks with good results. In this work, we study how depthwise separable convolutions can be applied to neural machine translation. We introduce a new architecture inspired by Xception and ByteNet, called SliceNet, which enables a significant reduction of the parameter count and amount of computation needed to obtain results like ByteNet, and, with a similar parameter count, achieves new state-of-the-art results. In addition to showing that depthwise separable convolutions perform well for machine translation, we investigate the architectural changes that they enable: we observe that thanks to depthwise separability, we can increase the length of convolution windows, removing the need for filter dilation. We also introduce a new “super-separable” convolution operation that further reduces the number of parameters and computational cost for obtaining state-of-the-art results. |
Tasks | Machine Translation |
Published | 2017-06-09 |
URL | http://arxiv.org/abs/1706.03059v2 |
http://arxiv.org/pdf/1706.03059v2.pdf | |
PWC | https://paperswithcode.com/paper/depthwise-separable-convolutions-for-neural |
Repo | https://github.com/tensorflow/tensor2tensor |
Framework | tf |
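A depthwise separable convolution factors a standard convolution into a per-channel spatial (depthwise) convolution followed by a 1x1 (pointwise) convolution, which is where the parameter and computation savings come from. A minimal PyTorch sketch of the building block, using 1D convolutions since SliceNet operates on sequences (the sizes are illustrative):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    """Depthwise (per-channel) convolution followed by a pointwise 1x1 convolution."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int):
        super().__init__()
        self.depthwise = nn.Conv1d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        self.pointwise = nn.Conv1d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):                      # x: (batch, channels, sequence_length)
        return self.pointwise(self.depthwise(x))

block = DepthwiseSeparableConv1d(256, 512, kernel_size=15)
standard = nn.Conv1d(256, 512, kernel_size=15, padding=7)
print(sum(p.numel() for p in block.parameters()),       # far fewer parameters ...
      sum(p.numel() for p in standard.parameters()))    # ... than a standard convolution
```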
Unsupervised Part-based Weighting Aggregation of Deep Convolutional Features for Image Retrieval
Title | Unsupervised Part-based Weighting Aggregation of Deep Convolutional Features for Image Retrieval |
Authors | Jian Xu, Cunzhao Shi, Chengzuo Qi, Chunheng Wang, Baihua Xiao |
Abstract | In this paper, we propose a simple but effective semantic part-based weighting aggregation (PWA) for image retrieval. The proposed PWA utilizes the discriminative filters of deep convolutional layers as part detectors. Moreover, we propose the effective unsupervised strategy to select some part detectors to generate the “probabilistic proposals”, which highlight certain discriminative parts of objects and suppress the noise of background. The final global PWA representation could then be acquired by aggregating the regional representations weighted by the selected “probabilistic proposals” corresponding to various semantic content. We conduct comprehensive experiments on four standard datasets and show that our unsupervised PWA outperforms the state-of-the-art unsupervised and supervised aggregation methods. Code is available at https://github.com/XJhaoren/PWA. |
Tasks | Image Retrieval |
Published | 2017-05-03 |
URL | http://arxiv.org/abs/1705.01247v3 |
http://arxiv.org/pdf/1705.01247v3.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-part-based-weighting-aggregation |
Repo | https://github.com/XJhaoren/PWA |
Framework | none |
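In PWA, selected channels of a convolutional feature map act as part detectors: each selected channel's normalized activation map weights a spatial sum-pooling of the features, and the resulting per-part vectors are concatenated into the global descriptor. A simplified sketch of that aggregation step, assuming the discriminative channels were already selected offline (the channel-selection strategy and the final PCA/whitening are omitted, and the power/normalization details are illustrative):

```python
import numpy as np

def pwa_aggregate(feature_map: np.ndarray, selected_channels, alpha: float = 2.0) -> np.ndarray:
    """Part-based weighted aggregation of a conv feature map of shape (C, H, W).

    Returns the concatenation of one C-dim weighted-sum vector per selected channel.
    """
    C, H, W = feature_map.shape
    parts = []
    for c in selected_channels:
        weights = feature_map[c] ** alpha                 # "probabilistic proposal" for part c
        weights = weights / (weights.sum() + 1e-8)        # normalize over spatial locations
        part_vec = (feature_map * weights[None]).reshape(C, -1).sum(axis=1)
        parts.append(part_vec / (np.linalg.norm(part_vec) + 1e-8))
    return np.concatenate(parts)

descriptor = pwa_aggregate(np.random.rand(512, 14, 14), selected_channels=[3, 42, 101])
print(descriptor.shape)   # (1536,)
```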
Semantic Entity Retrieval Toolkit
Title | Semantic Entity Retrieval Toolkit |
Authors | Christophe Van Gysel, Maarten de Rijke, Evangelos Kanoulas |
Abstract | Unsupervised learning of low-dimensional, semantic representations of words and entities has recently gained attention. In this paper we describe the Semantic Entity Retrieval Toolkit (SERT) that provides implementations of our previously published entity representation models. The toolkit provides a unified interface to different representation learning algorithms, fine-grained parsing configuration and can be used transparently with GPUs. In addition, users can easily modify existing models or implement their own models in the framework. After model training, SERT can be used to rank entities according to a textual query and extract the learned entity/word representation for use in downstream algorithms, such as clustering or recommendation. |
Tasks | Representation Learning |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.03757v2 |
http://arxiv.org/pdf/1706.03757v2.pdf | |
PWC | https://paperswithcode.com/paper/semantic-entity-retrieval-toolkit |
Repo | https://github.com/cvangysel/SERT |
Framework | none |
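After training, the ranking use-case mentioned at the end of the abstract amounts to scoring each entity representation against a representation of the textual query. A generic sketch of that step (this is not SERT's actual API; averaging query word embeddings and ranking entities by cosine similarity is assumed purely for illustration):

```python
import numpy as np

def rank_entities(query_terms, word_vectors, entity_vectors):
    """Rank entity ids by cosine similarity to the averaged query-term embedding."""
    q = np.mean([word_vectors[t] for t in query_terms if t in word_vectors], axis=0)
    q = q / np.linalg.norm(q)
    scores = {eid: float(q @ (e / np.linalg.norm(e))) for eid, e in entity_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy example with random 8-dimensional embeddings (illustrative only).
rng = np.random.default_rng(0)
words = {w: rng.standard_normal(8) for w in ["neural", "ranking", "models"]}
entities = {f"expert_{i}": rng.standard_normal(8) for i in range(5)}
print(rank_entities(["neural", "ranking"], words, entities))
```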
Towards Diverse and Natural Image Descriptions via a Conditional GAN
Title | Towards Diverse and Natural Image Descriptions via a Conditional GAN |
Authors | Bo Dai, Sanja Fidler, Raquel Urtasun, Dahua Lin |
Abstract | Despite the substantial progress in recent years, the image captioning techniques are still far from being perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. This issue is related to a learning principle widely used in practice, that is, to maximize the likelihood of training samples. This principle encourages high resemblance to the “ground-truth” captions while suppressing other reasonable descriptions. Conventional evaluation metrics, e.g. BLEU and METEOR, also favor such restrictive methods. In this paper, we explore an alternative approach, with the aim to improve the naturalness and diversity – two essential properties of human expression. Specifically, we propose a new framework based on Conditional Generative Adversarial Networks (CGAN), which jointly learns a generator to produce descriptions conditioned on images and an evaluator to assess how well a description fits the visual content. It is noteworthy that training a sequence generator is nontrivial. We overcome the difficulty by Policy Gradient, a strategy stemming from Reinforcement Learning, which allows the generator to receive early feedback along the way. We tested our method on two large datasets, where it performed competitively against real people in our user study and outperformed other methods on various tasks. |
Tasks | Image Captioning |
Published | 2017-03-17 |
URL | http://arxiv.org/abs/1703.06029v3 |
http://arxiv.org/pdf/1703.06029v3.pdf | |
PWC | https://paperswithcode.com/paper/towards-diverse-and-natural-image |
Repo | https://github.com/doubledaibo/gancaption_iccv2017 |
Framework | none |
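The training difficulty mentioned above (the generator emits discrete words, so the evaluator's score cannot be backpropagated through the sampling step) is what the Policy Gradient strategy addresses: sample a caption, treat the evaluator's assessment as a reward, and scale the log-probability of the sample by that reward. A minimal sketch of one such update (the generator/evaluator interfaces are illustrative placeholders, not the paper's models):

```python
import torch

def policy_gradient_step(generator, evaluator, image_feat, optimizer, baseline=0.0):
    """One REINFORCE-style update of a caption generator against an evaluator's score.

    Assumed interfaces (illustrative): generator.sample(image_feat) returns sampled
    token ids plus the summed log-probability of the sample; evaluator(image_feat, tokens)
    returns a scalar score for how well the caption fits the image.
    """
    tokens, log_prob = generator.sample(image_feat)
    with torch.no_grad():
        reward = evaluator(image_feat, tokens)         # early feedback from the evaluator
    loss = -(reward - baseline) * log_prob             # REINFORCE with an optional baseline
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(reward)
```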
Context-aware Captions from Context-agnostic Supervision
Title | Context-aware Captions from Context-agnostic Supervision |
Authors | Ramakrishna Vedantam, Samy Bengio, Kevin Murphy, Devi Parikh, Gal Chechik |
Abstract | We introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation). For example, given images and captions of “siamese cat” and “tiger cat”, we generate language that describes the “siamese cat” in a way that distinguishes it from “tiger cat”. Our key novelty is that we show how to do joint inference over a language model that is context-agnostic and a listener which distinguishes closely-related concepts. We first apply our technique to a justification task, namely to describe why an image contains a particular fine-grained category as opposed to another closely-related category of the CUB-200-2011 dataset. We then study discriminative image captioning to generate language that uniquely refers to one of two semantically-similar images in the COCO dataset. Evaluations with discriminative ground truth for justification and human studies for discriminative image captioning reveal that our approach outperforms baseline generative and speaker-listener approaches for discrimination. |
Tasks | Image Captioning, Language Modelling |
Published | 2017-01-11 |
URL | http://arxiv.org/abs/1701.02870v3 |
http://arxiv.org/pdf/1701.02870v3.pdf | |
PWC | https://paperswithcode.com/paper/context-aware-captions-from-context-agnostic |
Repo | https://github.com/ruotianluo/DiscCaptioning |
Framework | pytorch |
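One concrete way to realize the joint inference described above is to rescore each candidate word by how much more likely it is under the target image than under the closely related distractor. A minimal sketch of that per-step rescoring in the spirit of the paper's approach (greedy instead of beam search; the speaker interface is an illustrative placeholder):

```python
import torch

def contrastive_next_word(speaker, prefix, target_image, distractor_image, lam=0.5):
    """Pick the next word that describes target_image but not distractor_image.

    Assumed interface (illustrative): speaker(prefix, image) returns a (vocab_size,)
    tensor of log-probabilities for the next word given a caption prefix and an image.
    """
    log_p_target = speaker(prefix, target_image)
    log_p_distractor = speaker(prefix, distractor_image)
    # Reward words likely under the target image and unlikely under the distractor;
    # lam trades off fluency against discriminativeness.
    scores = log_p_target - lam * log_p_distractor
    return int(torch.argmax(scores))
```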
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
Title | Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition |
Authors | Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang |
Abstract | Motion representation plays a vital role in human action recognition in videos. In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach. The OFF is derived from the definition of optical flow and is orthogonal to the optical flow. The derivation also provides theoretical support for using the difference between two frames. By directly calculating pixel-wise spatiotemporal gradients of the deep feature maps, the OFF could be embedded in any existing CNN based video action recognition framework with only a slight additional cost. It enables the CNN to extract spatiotemporal information, especially the temporal information between frames simultaneously. This simple but powerful idea is validated by experimental results. The network with OFF fed only by RGB inputs achieves a competitive accuracy of 93.3% on UCF-101, which is comparable with the result obtained by two streams (RGB and optical flow), but is 15 times faster in speed. Experimental results also show that OFF is complementary to other motion modalities such as optical flow. When the proposed method is plugged into the state-of-the-art video action recognition framework, it has 96.0% and 74.2% accuracy on UCF-101 and HMDB-51 respectively. The code for this project is available at https://github.com/kevin-ssy/Optical-Flow-Guided-Feature. |
Tasks | Action Recognition In Videos, Optical Flow Estimation, Temporal Action Localization |
Published | 2017-11-29 |
URL | http://arxiv.org/abs/1711.11152v2 |
http://arxiv.org/pdf/1711.11152v2.pdf | |
PWC | https://paperswithcode.com/paper/optical-flow-guided-feature-a-fast-and-robust |
Repo | https://github.com/kevin-ssy/Optical-Flow-Guided-Feature |
Framework | none |
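Per the abstract, OFF reduces to pixel-wise spatiotemporal gradients of the feature maps: spatial gradients within a frame plus the difference between the feature maps of consecutive frames. A minimal sketch of that computation (Sobel kernels for the spatial gradients are an illustrative choice, not necessarily the paper's exact operator):

```python
import torch
import torch.nn.functional as F

def optical_flow_guided_feature(feat_t: torch.Tensor, feat_t1: torch.Tensor) -> torch.Tensor:
    """Simplified OFF from two consecutive (N, C, H, W) feature maps:
    per-channel spatial gradients of the current frame plus the temporal difference."""
    n, c, h, w = feat_t.shape
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    # Depthwise (groups=c) convolution applies the same kernel to every channel.
    grad_x = F.conv2d(feat_t, sobel_x.repeat(c, 1, 1, 1), padding=1, groups=c)
    grad_y = F.conv2d(feat_t, sobel_y.repeat(c, 1, 1, 1), padding=1, groups=c)
    temporal = feat_t1 - feat_t                           # frame-to-frame feature difference
    return torch.cat([grad_x, grad_y, temporal], dim=1)   # (N, 3C, H, W)

off = optical_flow_guided_feature(torch.randn(2, 16, 28, 28), torch.randn(2, 16, 28, 28))
print(off.shape)   # torch.Size([2, 48, 28, 28])
```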
End-to-end weakly-supervised semantic alignment
Title | End-to-end weakly-supervised semantic alignment |
Authors | Ignacio Rocco, Relja Arandjelović, Josef Sivic |
Abstract | We tackle the task of semantic alignment where the goal is to compute dense semantic correspondence aligning two images depicting objects of the same category. This is a challenging task due to large intra-class variation, changes in viewpoint and background clutter. We present the following three principal contributions. First, we develop a convolutional neural network architecture for semantic alignment that is trainable in an end-to-end manner from weak image-level supervision in the form of matching image pairs. The outcome is that parameters are learnt from rich appearance variation present in different but semantically related images without the need for tedious manual annotation of correspondences at training time. Second, the main component of this architecture is a differentiable soft inlier scoring module, inspired by the RANSAC inlier scoring procedure, that computes the quality of the alignment based on only geometrically consistent correspondences thereby reducing the effect of background clutter. Third, we demonstrate that the proposed approach achieves state-of-the-art performance on multiple standard benchmarks for semantic alignment. |
Tasks | |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.06861v2 |
http://arxiv.org/pdf/1712.06861v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-weakly-supervised-semantic |
Repo | https://github.com/ignacio-rocco/weakalign |
Framework | pytorch |
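The soft inlier scoring module can be understood as weighting every candidate correspondence score by how consistent it is with the estimated geometric transformation, then summing. A simplified NumPy stand-in for that idea (the paper implements the mask as a differentiable warping operation inside the network; the Gaussian consistency weighting and all names here are illustrative):

```python
import numpy as np

def soft_inlier_score(match_scores: np.ndarray, transform, sigma: float = 1.0) -> float:
    """Score an alignment by weighting correspondence scores with a soft consistency mask.

    match_scores: (H, W, H, W); match_scores[i, j, k, l] is the correspondence score
    between location (i, j) in image A and (k, l) in image B.
    transform: callable mapping a location (i, j) in A to its predicted (k, l) in B.
    """
    H, W = match_scores.shape[:2]
    ks, ls = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    score = 0.0
    for i in range(H):
        for j in range(W):
            pk, pl = transform((i, j))                     # predicted position in B
            # Soft mask: near 1 for B-locations consistent with the transform, near 0 otherwise.
            mask = np.exp(-0.5 * ((ks - pk) ** 2 + (ls - pl) ** 2) / sigma ** 2)
            score += float((match_scores[i, j] * mask).sum())
    return score

# Toy usage: identity transform on a random 8x8x8x8 correlation volume.
print(soft_inlier_score(np.random.rand(8, 8, 8, 8), transform=lambda ij: ij))
```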