February 1, 2020

3079 words 15 mins read

Paper Group AWR 99

Joint Discriminative and Generative Learning for Person Re-identification. Gesture Recognition in RGB Videos Using Human Body Keypoints and Dynamic Time Warping. High-Resolution Network for Photorealistic Style Transfer. Virtual Reality to Study the Gap Between Offline and Real-Time EMG-based Gesture Recognition. Switchable Whitening for Deep Repres …

Joint Discriminative and Generative Learning for Person Re-identification

Title Joint Discriminative and Generative Learning for Person Re-identification
Authors Zhedong Zheng, Xiaodong Yang, Zhiding Yu, Liang Zheng, Yi Yang, Jan Kautz
Abstract Person re-identification (re-id) remains challenging due to significant intra-class variations across different cameras. Recently, there has been a growing interest in using generative models to augment training data and enhance the invariance to input changes. The generative pipelines in existing methods, however, stay relatively separate from the discriminative re-id learning stages. Accordingly, re-id models are often trained in a straightforward manner on the generated data. In this paper, we seek to improve learned re-id embeddings by better leveraging the generated data. To this end, we propose a joint learning framework that couples re-id learning and data generation end-to-end. Our model involves a generative module that separately encodes each person into an appearance code and a structure code, and a discriminative module that shares the appearance encoder with the generative module. By switching the appearance or structure codes, the generative module is able to generate high-quality cross-id composed images, which are fed back online to the appearance encoder and used to improve the discriminative module. The proposed joint learning framework yields a significant improvement over the baseline without using generated data, leading to state-of-the-art performance on several benchmark datasets.
Tasks Image Generation, Image-to-Image Translation, Person Re-Identification
Published 2019-04-15
URL https://arxiv.org/abs/1904.07223v2
PDF https://arxiv.org/pdf/1904.07223v2.pdf
PWC https://paperswithcode.com/paper/joint-discriminative-and-generative-learning
Repo https://github.com/NVlabs/DG-Net
Framework pytorch
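The abstract above describes a generative module with separate appearance and structure encoders whose codes can be swapped to compose cross-id images that are fed back to the shared appearance encoder. The sketch below is only a toy illustration of that code-swapping idea under assumed, minimal encoder/decoder architectures; it is not the authors' DG-Net implementation (see the linked repo for that).

```python
# Hypothetical, simplified sketch of code-swapping for cross-id image composition.
import torch
import torch.nn as nn

class AppearanceEncoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
    def forward(self, x):
        return self.net(x)            # global appearance code

class StructureEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)            # spatial structure code

class Decoder(nn.Module):
    def __init__(self, app_dim=128):
        super().__init__()
        self.fc = nn.Linear(app_dim, 64)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),
        )
    def forward(self, app, struct):
        # Broadcast the appearance code over the structure feature map and decode.
        app_map = self.fc(app)[:, :, None, None].expand(-1, -1, *struct.shape[2:])
        return self.net(torch.cat([struct, app_map], dim=1))

# Cross-id composition: appearance of person A rendered on the structure of person B.
ea, es, dec = AppearanceEncoder(), StructureEncoder(), Decoder()
img_a, img_b = torch.randn(1, 3, 128, 64), torch.randn(1, 3, 128, 64)
composed = dec(ea(img_a), es(img_b))   # in the paper, fed back to the appearance encoder
print(composed.shape)                  # torch.Size([1, 3, 128, 64])
```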

Gesture Recognition in RGB Videos Using Human Body Keypoints and Dynamic Time Warping

Title Gesture Recognition in RGB Videos Using Human Body Keypoints and Dynamic Time Warping
Authors Pascal Schneider, Raphael Memmesheimer, Ivanna Kramer, Dietrich Paulus
Abstract Gesture recognition opens up new ways for humans to intuitively interact with machines. Especially for service robots, gestures can be a valuable addition to the means of communication, for example, to draw the robot’s attention to someone or something. Extracting a gesture from video data and classifying it is a challenging task, and a variety of approaches have been proposed throughout the years. This paper presents a method for gesture recognition in RGB videos using OpenPose to extract the pose of a person and Dynamic Time Warping (DTW) in conjunction with One-Nearest-Neighbor (1NN) for time-series classification. The main features of this approach are its independence from any specific hardware and its high flexibility, because new gestures can be added to the classifier with only a few examples. We utilize the robustness of the Deep Learning-based OpenPose framework while avoiding the data-intensive task of training a neural network ourselves. We demonstrate the classification performance of our method on a public dataset.
Tasks Gesture Recognition, Time Series, Time Series Classification
Published 2019-06-25
URL https://arxiv.org/abs/1906.12171v1
PDF https://arxiv.org/pdf/1906.12171v1.pdf
PWC https://paperswithcode.com/paper/gesture-recognition-in-rgb-videos-usinghuman
Repo https://github.com/homer-robotics/gesture_recognition_on_rgb_video
Framework none
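Because the classifier is One-Nearest-Neighbor over DTW distances between keypoint sequences, the core of the method fits in a few lines. The sketch below assumes the per-frame keypoints have already been extracted (by OpenPose in the paper) and uses plain Euclidean frame distances with random placeholder sequences; it is an illustration of 1NN-DTW, not the authors' code.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of keypoint vectors."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def classify_1nn(query, templates):
    """templates: list of (label, sequence). Returns the label of the nearest sequence."""
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]

# Toy example: two gesture classes, sequences of 25 two-dimensional keypoints per frame.
rng = np.random.default_rng(0)
templates = [("wave", rng.normal(size=(30, 50))), ("point", rng.normal(size=(40, 50)))]
query = templates[0][1] + 0.01 * rng.normal(size=(30, 50))   # a slightly perturbed "wave"
print(classify_1nn(query, templates))                        # -> "wave"
```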

High-Resolution Network for Photorealistic Style Transfer

Title High-Resolution Network for Photorealistic Style Transfer
Authors Ming Li, Chunyang Ye, Wei Li
Abstract Photorealistic style transfer aims to transfer the style of one image to another while preserving the original structure and detail of the content image, so that the content image still looks like a real shot after the style transfer. Although some photorealistic image stylization methods have been proposed, these methods tend to lose the details of the content image and produce irregular distortion structures. In this paper, we use a high-resolution network as the image generation network. Compared to other methods, which reduce the resolution and then restore the high resolution, our generation network maintains high resolution throughout the process. By connecting high-resolution subnets to low-resolution subnets in parallel and repeatedly performing multi-scale fusion, high-resolution subnets can continuously receive information from low-resolution subnets. This allows our network to discard less of the information contained in the image, so the generated images have a more elaborate structure and less distortion, which is crucial to visual quality. We conducted extensive experiments and compared the results with existing methods. The experimental results show that our model is effective and produces better results than existing methods for photorealistic image stylization. Our source code, implemented in the PyTorch framework, will be publicly available at https://github.com/limingcv/Photorealistic-Style-Transfer
Tasks Image Generation, Style Transfer
Published 2019-04-25
URL http://arxiv.org/abs/1904.11617v1
PDF http://arxiv.org/pdf/1904.11617v1.pdf
PWC https://paperswithcode.com/paper/high-resolution-network-for-photorealistic
Repo https://github.com/KushajveerSingh/Photorealistic-Style-Transfer
Framework pytorch
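The key architectural point above is that high- and low-resolution branches run in parallel and repeatedly exchange information, instead of downsampling and then restoring resolution. Below is a hedged, much-simplified two-branch fusion block in that spirit; channel counts and layer choices are illustrative and do not reproduce the authors' generation network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchFusionBlock(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.high = nn.Conv2d(ch, ch, 3, 1, 1)        # full-resolution branch
        self.low = nn.Conv2d(ch, ch, 3, 1, 1)         # half-resolution branch
        self.low_to_high = nn.Conv2d(ch, ch, 1)       # used before upsampling low features
        self.high_to_low = nn.Conv2d(ch, ch, 3, 2, 1) # strided conv to downsample high features

    def forward(self, x_high, x_low):
        h = F.relu(self.high(x_high))
        l = F.relu(self.low(x_low))
        # Multi-scale fusion: each branch receives the other branch's features.
        h_fused = h + F.interpolate(self.low_to_high(l), size=h.shape[2:],
                                    mode="bilinear", align_corners=False)
        l_fused = l + self.high_to_low(h)
        return h_fused, l_fused

block = TwoBranchFusionBlock()
x_high, x_low = torch.randn(1, 32, 256, 256), torch.randn(1, 32, 128, 128)
h, l = block(x_high, x_low)
print(h.shape, l.shape)   # the high-resolution branch keeps 256x256 throughout
```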

Virtual Reality to Study the Gap Between Offline and Real-Time EMG-based Gesture Recognition

Title Virtual Reality to Study the Gap Between Offline and Real-Time EMG-based Gesture Recognition
Authors Ulysse Côté-Allard, Gabriel Gagnon-Turcotte, Angkoon Phinyomark, Kyrre Glette, Erik Scheme, François Laviolette, Benoit Gosselin
Abstract Within sEMG-based gesture recognition, a chasm exists in the literature between the offline accuracy and the real-time usability of a classifier. This gap mainly stems from the four main dynamic factors in sEMG-based gesture recognition: gesture intensity, limb position, electrode shift, and transient changes in the signal. These factors are hard to include within an offline dataset, as each of them exponentially augments the number of segments to be recorded. On the other hand, online datasets are biased towards the sEMG-based algorithms providing feedback to the participants, limiting the usability of such datasets as benchmarks. This paper proposes a virtual reality (VR) environment and a real-time experimental protocol from which the four main dynamic factors can more easily be studied. During the online experiment, the gesture recognition feedback is provided through the Leap Motion camera, enabling the proposed dataset to be re-used to compare future sEMG-based algorithms. Twenty able-bodied persons took part in this study, completing three to four sessions over a period spanning 14 to 21 days. Finally, TADANN, a new transfer learning-based algorithm, is proposed for long-term gesture classification and significantly (p<0.05) outperforms fine-tuning a network.
Tasks Gesture Recognition, Transfer Learning
Published 2019-12-16
URL https://arxiv.org/abs/1912.09380v1
PDF https://arxiv.org/pdf/1912.09380v1.pdf
PWC https://paperswithcode.com/paper/virtual-reality-to-study-the-gap-between
Repo https://github.com/UlysseCoteAllard/LongTermEMG
Framework pytorch
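The abstract compares TADANN against fine-tuning a network on later recording sessions. The sketch below shows only that fine-tuning baseline, not TADANN itself: freeze a (hypothetically pretrained) sEMG feature extractor and retrain the classifier head on a small recalibration batch from a new session. All shapes, channel counts, and class counts are assumptions for illustration.

```python
import torch
import torch.nn as nn

def make_semg_classifier(n_channels=8, n_classes=7):
    # Tiny 1-D CNN over sEMG windows; a stand-in, not the paper's architecture.
    return nn.Sequential(
        nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        nn.Linear(32, n_classes),
    )

model = make_semg_classifier()
# ... assume the model was pretrained on the first recording session ...
for p in model[:4].parameters():      # freeze the convolutional feature extractor
    p.requires_grad = False
optimizer = torch.optim.Adam(model[4].parameters(), lr=1e-3)   # train only the head

x = torch.randn(16, 8, 200)                       # recalibration batch: 16 windows, 8 channels
y = torch.randint(0, 7, (16,))                    # gesture labels for the new session
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
print(loss.item())
```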

Switchable Whitening for Deep Representation Learning

Title Switchable Whitening for Deep Representation Learning
Authors Xingang Pan, Xiaohang Zhan, Jianping Shi, Xiaoou Tang, Ping Luo
Abstract Normalization methods are essential components in convolutional neural networks (CNNs). They either standardize or whiten data using statistics estimated in predefined sets of pixels. Unlike existing works that design normalization techniques for specific tasks, we propose Switchable Whitening (SW), which provides a general form unifying different whitening methods as well as standardization methods. SW learns to switch among these operations in an end-to-end manner. It has several advantages. First, SW adaptively selects appropriate whitening or standardization statistics for different tasks (see Fig.1), making it well suited for a wide range of tasks without manual design. Second, by integrating benefits of different normalizers, SW shows consistent improvements over its counterparts in various challenging benchmarks. Third, SW serves as a useful tool for understanding the characteristics of whitening and standardization techniques. We show that SW outperforms other alternatives on image classification (CIFAR-10/100, ImageNet), semantic segmentation (ADE20K, Cityscapes), domain adaptation (GTA5, Cityscapes), and image style transfer (COCO). For example, without bells and whistles, we achieve state-of-the-art performance with 45.33% mIoU on the ADE20K dataset. Code is available at https://github.com/XingangPan/Switchable-Whitening.
Tasks Domain Adaptation, Image Classification, Representation Learning, Semantic Segmentation, Style Transfer
Published 2019-04-22
URL https://arxiv.org/abs/1904.09739v4
PDF https://arxiv.org/pdf/1904.09739v4.pdf
PWC https://paperswithcode.com/paper/switchable-whitening-for-deep-representation
Repo https://github.com/XingangPan/Switchable-Whitening
Framework pytorch
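The core idea above is learning, end-to-end, which normalization statistics to use. Below is a heavily simplified sketch of that switching mechanism: softmax weights over the statistics of different normalizers, combined before standardizing. For brevity it only switches between instance-wise and batch-wise standardization statistics; the actual SW layer also includes full whitening (see the authors' repository).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableStandardization(nn.Module):
    def __init__(self, num_channels, eps=1e-5):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(2))                  # logits over {IN, BN} statistics
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1)) # affine scale
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1)) # affine shift
        self.eps = eps

    def forward(self, x):
        w = F.softmax(self.weights, dim=0)
        mean_in = x.mean(dim=(2, 3), keepdim=True)                   # per-sample, per-channel
        var_in = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        mean_bn = x.mean(dim=(0, 2, 3), keepdim=True)                # per-channel over the batch
        var_bn = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
        mean = w[0] * mean_in + w[1] * mean_bn                       # learned mixture of statistics
        var = w[0] * var_in + w[1] * var_bn
        return self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta

sw = SwitchableStandardization(64)
print(sw(torch.randn(4, 64, 32, 32)).shape)   # torch.Size([4, 64, 32, 32])
```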

Unlexicalized Transition-based Discontinuous Constituency Parsing

Title Unlexicalized Transition-based Discontinuous Constituency Parsing
Authors Maximin Coavoux, Benoît Crabbé, Shay B. Cohen
Abstract Lexicalized parsing models are based on the assumptions that (i) constituents are organized around a lexical head and (ii) bilexical statistics are crucial to solve ambiguities. In this paper, we introduce an unlexicalized transition-based parser for discontinuous constituency structures, based on a structure-label transition system and a bi-LSTM scoring system. We compare it to lexicalized parsing models in order to address the question of lexicalization in the context of discontinuous constituency parsing. Our experiments show that unlexicalized models systematically achieve higher results than lexicalized models, and provide additional empirical evidence that lexicalization is not necessary to achieve strong parsing results. Our best unlexicalized model sets a new state of the art on English and German discontinuous constituency treebanks. We further provide a per-phenomenon analysis of its errors on discontinuous constituents.
Tasks Constituency Parsing
Published 2019-02-24
URL http://arxiv.org/abs/1902.08912v1
PDF http://arxiv.org/pdf/1902.08912v1.pdf
PWC https://paperswithcode.com/paper/unlexicalized-transition-based-discontinuous
Repo https://github.com/mcoavoux/mtg_TACL
Framework none
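To make "transition-based constituency parsing" concrete, here is a toy skeleton: a stack and buffer manipulated by SHIFT and REDUCE-X actions, with the next action chosen by a scoring function. It handles only continuous constituents and uses a trivial scorer; the paper's structure-label system, bi-LSTM scorer, and support for discontinuity (via gap-style transitions) are not reproduced here.

```python
def parse(tokens, score_action):
    """Greedy transition-based parsing with a pluggable action scorer."""
    stack, buffer = [], list(tokens)
    while buffer or len(stack) > 1:
        actions = []
        if buffer:
            actions.append(("SHIFT", None))
        if len(stack) >= 2:
            actions.extend(("REDUCE", label) for label in ("NP", "VP", "S"))
        act, label = max(actions, key=lambda a: score_action(stack, buffer, a))
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        else:                                    # REDUCE: merge the top two stack items
            right, left = stack.pop(), stack.pop()
            stack.append((label, left, right))
    return stack[0]

# Toy scorer: prefer shifting while the buffer is non-empty, then reduce with label "S".
toy = lambda stack, buf, a: (1 if a[0] == "SHIFT" else 0, 1 if a[1] == "S" else 0)
print(parse(["the", "cat", "sleeps"], toy))
```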

On Adversarial Mixup Resynthesis

Title On Adversarial Mixup Resynthesis
Authors Christopher Beckham, Sina Honari, Vikas Verma, Alex Lamb, Farnoosh Ghadiri, R Devon Hjelm, Yoshua Bengio, Christopher Pal
Abstract In this paper, we explore new approaches to combining information encoded within the learned representations of auto-encoders. We explore models that are capable of combining the attributes of multiple inputs such that a resynthesised output is trained to fool an adversarial discriminator for real versus synthesised data. Furthermore, we explore the use of such an architecture in the context of semi-supervised learning, where we learn a mixing function whose objective is to produce interpolations of hidden states, or masked combinations of latent representations that are consistent with a conditioned class label. We show quantitative and qualitative evidence that such a formulation is an interesting avenue of research.
Tasks
Published 2019-03-07
URL https://arxiv.org/abs/1903.02709v4
PDF https://arxiv.org/pdf/1903.02709v4.pdf
PWC https://paperswithcode.com/paper/adversarial-mixup-resynthesizers
Repo https://github.com/christopher-beckham/amr
Framework pytorch
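A minimal sketch of the core idea above: encode two inputs, mix their latent codes (here by convex interpolation; the paper also studies masked combinations), decode the mixture, and train a discriminator to tell real data from decoded mixes while the autoencoder tries to fool it. Network sizes are placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))   # encoder
dec = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784))   # decoder
disc = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1))   # real vs. synthesized
bce = nn.BCEWithLogitsLoss()

x1, x2 = torch.rand(32, 784), torch.rand(32, 784)        # two real batches
alpha = torch.rand(32, 1)                                 # per-example mixing weight
z_mix = alpha * enc(x1) + (1 - alpha) * enc(x2)           # mix in latent space
x_mix = dec(z_mix)                                        # resynthesised output

# Discriminator step: real data vs. resynthesised mixes.
d_loss = bce(disc(x1), torch.ones(32, 1)) + bce(disc(x_mix.detach()), torch.zeros(32, 1))
# Autoencoder step: reconstruct real data and fool the discriminator with mixes.
g_loss = nn.functional.mse_loss(dec(enc(x1)), x1) + bce(disc(x_mix), torch.ones(32, 1))
print(d_loss.item(), g_loss.item())
```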

Real-Time Style Transfer With Strength Control

Title Real-Time Style Transfer With Strength Control
Authors Victor Kitov
Abstract Style transfer is the problem of rendering a content image in the style of a separate style image. A natural and common practical task in applications of style transfer is to adjust the strength of stylization. The algorithm of Gatys et al. (2016) provides this ability by changing the weighting factors of the content and style losses, but is computationally inefficient. Real-time style transfer, introduced by Johnson et al. (2016), enables fast stylization of any image by passing it through a pre-trained transformer network. Although fast, this architecture is not able to continuously adjust style strength. We propose an extension to real-time style transfer that allows direct control of style strength at inference while still requiring only a single transformer network. We conduct qualitative and quantitative experiments that demonstrate that the proposed method is capable of smooth stylization strength control and removes certain stylization artifacts appearing in the original real-time style transfer method. Comparisons with alternative real-time style transfer algorithms capable of adjusting stylization strength show that our method reproduces style with more detail.
Tasks Style Transfer
Published 2019-04-18
URL http://arxiv.org/abs/1904.08643v1
PDF http://arxiv.org/pdf/1904.08643v1.pdf
PWC https://paperswithcode.com/paper/real-time-style-transfer-with-strength
Repo https://github.com/Apogentus/style-transfer-with-strength-control
Framework pytorch
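The paper adds a strength input to a Johnson-style transformer network; the exact conditioning mechanism is in the linked repository. The sketch below only illustrates the generic idea of a strength knob, here by blending the stylized output with the content image at inference time, which is an assumption for illustration and not the authors' architecture.

```python
import torch
import torch.nn as nn

transformer = nn.Sequential(          # stand-in for a pretrained style transformer network
    nn.Conv2d(3, 16, 3, 1, 1), nn.ReLU(), nn.Conv2d(16, 3, 3, 1, 1), nn.Tanh()
)

def stylize(content, strength):
    """strength in [0, 1]: 0 returns the content image, 1 applies full stylization."""
    stylized = transformer(content)
    return (1 - strength) * content + strength * stylized

content = torch.rand(1, 3, 256, 256)
for s in (0.0, 0.5, 1.0):
    print(s, stylize(content, s).shape)
```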

Neural Painters: A learned differentiable constraint for generating brushstroke paintings

Title Neural Painters: A learned differentiable constraint for generating brushstroke paintings
Authors Reiichiro Nakano
Abstract We explore neural painters, a generative model for brushstrokes learned from a real non-differentiable and non-deterministic painting program. We show that when training an agent to “paint” images using brushstrokes, using a differentiable neural painter leads to much faster convergence. We propose a method for encouraging this agent to follow human-like strokes when reconstructing digits. We also explore the use of a neural painter as a differentiable image parameterization. By directly optimizing brushstrokes to activate neurons in a pre-trained convolutional network, we can directly visualize ImageNet categories and generate “ideal” paintings of each class. Finally, we present a new concept called intrinsic style transfer. By minimizing only the content loss from neural style transfer, we allow the artistic medium, in this case, brushstrokes, to naturally dictate the resulting style.
Tasks Style Transfer
Published 2019-04-17
URL http://arxiv.org/abs/1904.08410v2
PDF http://arxiv.org/pdf/1904.08410v2.pdf
PWC https://paperswithcode.com/paper/neural-painters-a-learned-differentiable
Repo https://github.com/libreai/neural-painters-x
Framework tf
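The "intrinsic style transfer" idea above optimizes brushstroke parameters through a differentiable neural painter so that the rendered canvas matches the content image in a pretrained network's feature space. The sketch below substitutes toy stand-ins for both the neural painter and the pretrained feature extractor, so it only demonstrates the optimization loop over stroke parameters, not the authors' models.

```python
import torch
import torch.nn as nn

# Stand-ins: a tiny "painter" mapping stroke parameters to an image, and a frozen CNN
# playing the role of the pretrained feature extractor used for the content loss.
painter = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 3 * 64 * 64))
features = nn.Sequential(nn.Conv2d(3, 16, 3, 1, 1), nn.ReLU(),
                         nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU()).eval()
for p in list(painter.parameters()) + list(features.parameters()):
    p.requires_grad = False

strokes = torch.randn(8, 10, requires_grad=True)        # 8 strokes, 10 parameters each
content = torch.rand(1, 3, 64, 64)
target_feat = features(content)                          # content representation to match

opt = torch.optim.Adam([strokes], lr=0.05)
for step in range(20):                                   # optimize brushstrokes only
    canvas = painter(strokes).view(8, 3, 64, 64).sum(0, keepdim=True).sigmoid()
    loss = nn.functional.mse_loss(features(canvas), target_feat)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())                                       # content loss after optimization
```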

Amalgamating Filtered Knowledge: Learning Task-customized Student from Multi-task Teachers

Title Amalgamating Filtered Knowledge: Learning Task-customized Student from Multi-task Teachers
Authors Jingwen Ye, Xinchao Wang, Yixin Ji, Kairi Ou, Mingli Song
Abstract Many well-trained Convolutional Neural Network (CNN) models have now been released online by developers for the sake of effortless reproduction. In this paper, we treat such pre-trained networks as teachers and explore how to learn a target student network for customized tasks, using multiple teachers that handle different tasks. We assume no human-labelled annotations are available, and each teacher model can be either a single- or a multi-task network, where the former is a degenerate case of the latter. The student model, depending on the customized tasks, learns the related knowledge filtered from the multiple teachers, and eventually masters the complete or a subset of the expertise of all teachers. To this end, we adopt a layer-wise training strategy, which entangles the student’s network block to be learned with the corresponding teachers. As demonstrated on several benchmarks, the learned student network achieves very promising results, even outperforming the teachers on the customized tasks.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.11569v1
PDF https://arxiv.org/pdf/1905.11569v1.pdf
PWC https://paperswithcode.com/paper/amalgamating-filtered-knowledge-learning-task
Repo https://github.com/zju-vipa/KamalEngine
Framework pytorch
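A hedged, miniature sketch of the layer-wise amalgamation idea: a student block is trained, without labels, to reproduce the filtered features of several frozen teacher blocks. The "filter" here is simply a selection of task-relevant channels and the adapter is a 1x1 convolution; the paper's actual entangling and filtering strategy is more involved.

```python
import torch
import torch.nn as nn

teacher_a = nn.Sequential(nn.Conv2d(3, 32, 3, 1, 1), nn.ReLU()).eval()   # frozen teacher blocks
teacher_b = nn.Sequential(nn.Conv2d(3, 32, 3, 1, 1), nn.ReLU()).eval()
for t in (teacher_a, teacher_b):
    for p in t.parameters():
        p.requires_grad = False

student = nn.Sequential(nn.Conv2d(3, 48, 3, 1, 1), nn.ReLU())   # smaller than the teachers combined
adapter = nn.Conv2d(48, 48, 1)                                   # maps student features to teacher space
opt = torch.optim.Adam(list(student.parameters()) + list(adapter.parameters()), lr=1e-3)

x = torch.rand(8, 3, 64, 64)                                     # unlabeled images
with torch.no_grad():
    # "Filtered" knowledge: keep only the channels assumed relevant to the customized task.
    target = torch.cat([teacher_a(x)[:, :24], teacher_b(x)[:, :24]], dim=1)
loss = nn.functional.mse_loss(adapter(student(x)), target)
loss.backward()
opt.step()
print(loss.item())
```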

Dancing to Music

Title Dancing to Music
Authors Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz
Abstract Dancing to music is an instinctive move by humans. Learning to model the music-to-dance generation process is, however, a challenging problem. It requires significant effort to measure the correlation between music and dance, as one needs to simultaneously consider multiple aspects, such as the style and beat of both music and dance. Additionally, dance is inherently multimodal: various follow-up movements of a pose at any moment are equally likely. In this paper, we propose a synthesis-by-analysis learning framework to generate dance from music. In the analysis phase, we decompose a dance into a series of basic dance units, through which the model learns how to move. In the synthesis phase, the model learns how to compose a dance by organizing multiple basic dancing movements seamlessly according to the input music. Experimental qualitative and quantitative results demonstrate that the proposed method can synthesize realistic, diverse, style-consistent, and beat-matching dances from music.
Tasks
Published 2019-11-05
URL https://arxiv.org/abs/1911.02001v1
PDF https://arxiv.org/pdf/1911.02001v1.pdf
PWC https://paperswithcode.com/paper/dancing-to-music
Repo https://github.com/NVlabs/Dance2Music
Framework pytorch
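The abstract describes the framework at a high level; the sketch below is only a generic, hypothetical stand-in for the synthesis phase: a recurrent model maps per-beat music features to a sequence of dance-unit codes, which a decoder turns into pose sequences. Nothing here mirrors the authors' actual decomposition-composition architecture, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

music_to_units = nn.GRU(input_size=32, hidden_size=64, batch_first=True)   # music features -> unit codes
unit_decoder = nn.Linear(64, 17 * 2)                                        # unit code -> 17 2-D joints per frame

music_feats = torch.randn(1, 80, 32)           # 80 beats of (hypothetical) music features
unit_codes, _ = music_to_units(music_feats)    # one dance-unit code per beat
poses = unit_decoder(unit_codes).view(1, 80, 17, 2)
print(poses.shape)                             # torch.Size([1, 80, 17, 2])
```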

Dance with Flow: Two-in-One Stream Action Detection

Title Dance with Flow: Two-in-One Stream Action Detection
Authors Jiaojiao Zhao, Cees G. M. Snoek
Abstract The goal of this paper is to detect the spatio-temporal extent of an action. The two-stream detection network based on RGB and flow provides state-of-the-art accuracy at the expense of a large model-size and heavy computation. We propose to embed RGB and optical-flow into a single two-in-one stream network with new layers. A motion condition layer extracts motion information from flow images, which is leveraged by the motion modulation layer to generate transformation parameters for modulating the low-level RGB features. The method is easily embedded in existing appearance- or two-stream action detection networks, and trained end-to-end. Experiments demonstrate that leveraging the motion condition to modulate RGB features improves detection accuracy. With only half the computation and parameters of the state-of-the-art two-stream methods, our two-in-one stream still achieves impressive results on UCF101-24, UCFSports and J-HMDB.
Tasks Action Detection, Optical Flow Estimation
Published 2019-04-01
URL https://arxiv.org/abs/1904.00696v3
PDF https://arxiv.org/pdf/1904.00696v3.pdf
PWC https://paperswithcode.com/paper/dance-with-flow-two-in-one-stream-action
Repo https://github.com/jiaozizhao/Two-in-One-ActionDetection
Framework pytorch
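A simplified sketch of the motion condition / motion modulation idea described above: features extracted from the flow image produce per-channel scale and shift parameters that modulate low-level RGB features, in a FiLM-like fashion. Layer sizes are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MotionModulation(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.condition = nn.Sequential(               # motion condition layer (acts on flow)
            nn.Conv2d(2, 32, 3, 1, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten()
        )
        self.to_gamma = nn.Linear(32, ch)
        self.to_beta = nn.Linear(32, ch)

    def forward(self, rgb_feat, flow):
        m = self.condition(flow)
        gamma = self.to_gamma(m)[:, :, None, None]    # per-channel scale from motion
        beta = self.to_beta(m)[:, :, None, None]      # per-channel shift from motion
        return gamma * rgb_feat + beta                # modulated low-level RGB features

mod = MotionModulation()
rgb_feat = torch.randn(2, 64, 56, 56)                 # low-level RGB features
flow = torch.randn(2, 2, 224, 224)                    # 2-channel optical-flow image
print(mod(rgb_feat, flow).shape)                      # torch.Size([2, 64, 56, 56])
```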

Evolved Art with Transparent, Overlapping, and Geometric Shapes

Title Evolved Art with Transparent, Overlapping, and Geometric Shapes
Authors Joachim Berg, Nils Gustav Andreas Berggren, Sivert Allergodt Borgeteien, Christian Ruben Alexander Jahren, Arqam Sajid, Stefano Nichele
Abstract In this work, an evolutionary art project is presented where images are approximated by transparent, overlapping and geometric shapes of different types, e.g., polygons, circles, lines. Genotypes representing features and order of the geometric shapes are evolved with a fitness function that has the corresponding pixels of an input image as a target goal. A genotype-to-phenotype mapping is therefore applied to render images, as the chosen genetic representation is indirect, i.e., genotypes do not include pixels but a combination of shapes with their properties. Different combinations of shapes, quantity of shapes, mutation types and populations are tested. The goal of the work herein is twofold: (1) to approximate images as precisely as possible with evolved indirect encodings, (2) to produce visually appealing results and novel artistic styles.
Tasks
Published 2019-04-12
URL https://arxiv.org/abs/1904.06110v3
PDF https://arxiv.org/pdf/1904.06110v3.pdf
PWC https://paperswithcode.com/paper/evolved-art-with-transparent-overlapping-and
Repo https://github.com/joacber/Evolved-art-with-transparent-overlapping-and-geometric-shapes
Framework none
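A toy sketch of the evolutionary setup described above: a genotype is an ordered list of transparent circles, the phenotype is the rendered image, and fitness is the pixel-wise distance to a target image. Simple (1+1) hill climbing with a single mutation operator stands in for the richer set of shapes, mutation types, and population schemes explored in the paper.

```python
import numpy as np

H = W = 64
rng = np.random.default_rng(0)
target = rng.random((H, W, 3))                         # placeholder target image

def render(genome):
    """Genotype-to-phenotype mapping: alpha-blend each circle onto a white canvas, in order."""
    img = np.ones((H, W, 3))
    yy, xx = np.mgrid[0:H, 0:W]
    for cx, cy, r, red, green, blue, alpha in genome:
        mask = (xx - cx * W) ** 2 + (yy - cy * H) ** 2 <= (r * W / 4) ** 2
        img[mask] = (1 - alpha) * img[mask] + alpha * np.array([red, green, blue])
    return img

def fitness(genome):
    return -np.mean((render(genome) - target) ** 2)    # higher is better

genome = rng.random((50, 7))                           # 50 circles, 7 genes each (pos, radius, RGBA)
for _ in range(200):                                   # simple (1+1) hill climbing
    child = genome.copy()
    child[rng.integers(len(child))] = rng.random(7)    # mutate one random shape
    if fitness(child) > fitness(genome):
        genome = child
print(fitness(genome))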

Mask-RCNN and U-net Ensembled for Nuclei Segmentation

Title Mask-RCNN and U-net Ensembled for Nuclei Segmentation
Authors Aarno Oskar Vuola, Saad Ullah Akram, Juho Kannala
Abstract Nuclei segmentation is both an important and in some ways ideal task for modern computer vision methods, e.g. convolutional neural networks. While recent developments in theory and open-source software have made these tools easier to implement, expert knowledge is still required to choose the right model architecture and training setup. We compare two popular segmentation frameworks, U-Net and Mask-RCNN in the nuclei segmentation task and find that they have different strengths and failures. To get the best of both worlds, we develop an ensemble model to combine their predictions that can outperform both models by a significant margin and should be considered when aiming for best nuclei segmentation performance.
Tasks
Published 2019-01-29
URL http://arxiv.org/abs/1901.10170v1
PDF http://arxiv.org/pdf/1901.10170v1.pdf
PWC https://paperswithcode.com/paper/mask-rcnn-and-u-net-ensembled-for-nuclei
Repo https://github.com/abhinavsagar/Kaggle-Solutions
Framework none
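A hedged sketch of one simple way to ensemble the two models' outputs: convert the Mask-RCNN instance masks into a foreground probability map, average it with the U-Net probability map, and threshold. The paper's actual ensembling strategy may differ; this only illustrates the idea of combining the two predictions.

```python
import numpy as np

def ensemble(unet_prob, maskrcnn_instances, threshold=0.5):
    """unet_prob: (H, W) probabilities; maskrcnn_instances: list of (H, W) binary masks."""
    rcnn_fg = (np.clip(np.sum(maskrcnn_instances, axis=0), 0, 1)
               if maskrcnn_instances else np.zeros_like(unet_prob))
    combined = 0.5 * unet_prob + 0.5 * rcnn_fg         # average the two foreground estimates
    return combined > threshold

rng = np.random.default_rng(0)
unet_prob = rng.random((256, 256))                     # placeholder U-Net probability map
instances = [np.zeros((256, 256)), np.zeros((256, 256))]
instances[0][50:80, 50:80] = 1                         # two fake nucleus instance masks
instances[1][120:150, 60:90] = 1
print(ensemble(unet_prob, instances).sum())
```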

Rethinking RGB-D Salient Object Detection: Models, Datasets, and Large-Scale Benchmarks

Title Rethinking RGB-D Salient Object Detection: Models, Datasets, and Large-Scale Benchmarks
Authors Deng-Ping Fan, Zheng Lin, Jia-Xing Zhao, Yun Liu, Zhao Zhang, Qibin Hou, Menglong Zhu, Ming-Ming Cheng
Abstract The use of RGB-D information for salient object detection has been explored in recent years. However, relatively few efforts have been spent on modeling salient object detection in real-world human activity scenes with RGB-D. In this work, we fill the gap by making the following contributions to RGB-D salient object detection. First, we carefully collect a new salient person (SIP) dataset, which consists of 1K high-resolution images that cover diverse real-world scenes with various viewpoints, poses, occlusions, illuminations, and backgrounds. Second, we conduct a large-scale and, so far, the most comprehensive benchmark comparing contemporary methods, which has long been missing in the area and can serve as a baseline for future research. We systematically summarize 31 popular models and evaluate 17 state-of-the-art methods over seven datasets with a total of about 91K images. Third, we propose a simple baseline architecture, called Deep Depth-Depurator Network (D3Net). It consists of a depth depurator unit and a feature learning module, performing initial low-quality depth-map filtering and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D3Net exceeds the performance of all prior contenders across the five metrics considered, and thus serves as a strong baseline to advance the research frontier. We also demonstrate that D3Net can efficiently extract salient person masks from real scenes, enabling an effective background-changing book-cover application at 20 fps on a single GPU. All the saliency maps, our new SIP dataset, the baseline model, and the evaluation tools are made publicly available at https://github.com/DengPingFan/D3NetBenchmark.
Tasks Object Detection, Salient Object Detection
Published 2019-07-15
URL https://arxiv.org/abs/1907.06781v1
PDF https://arxiv.org/pdf/1907.06781v1.pdf
PWC https://paperswithcode.com/paper/rethinking-rgb-d-salient-object-detection
Repo https://github.com/jiwei0921/RGBD-SOD-datasets
Framework none
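A hedged, highly simplified sketch of the depth-depurator idea described above: a small network scores the quality of the depth map, the score gates the depth stream, and the gated depth features are fused with RGB features before predicting saliency. Shapes and layer choices are illustrative, not the authors' D3Net.

```python
import torch
import torch.nn as nn

class TinyDepthGatedFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_net = nn.Sequential(nn.Conv2d(3, 32, 3, 1, 1), nn.ReLU())
        self.depth_net = nn.Sequential(nn.Conv2d(1, 32, 3, 1, 1), nn.ReLU())
        self.depurator = nn.Sequential(                      # depth-quality gate in [0, 1]
            nn.Conv2d(1, 8, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1), nn.Sigmoid()
        )
        self.head = nn.Conv2d(64, 1, 1)                      # saliency prediction

    def forward(self, rgb, depth):
        gate = self.depurator(depth)[:, :, None, None]       # low-quality depth -> small gate
        fused = torch.cat([self.rgb_net(rgb), gate * self.depth_net(depth)], dim=1)
        return torch.sigmoid(self.head(fused))               # saliency map

net = TinyDepthGatedFusion()
rgb, depth = torch.rand(1, 3, 224, 224), torch.rand(1, 1, 224, 224)
print(net(rgb, depth).shape)                                 # torch.Size([1, 1, 224, 224])
```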