January 31, 2020

Paper Group AWR 380

ProductNet: a Collection of High-Quality Datasets for Product Representation Learning

Title ProductNet: a Collection of High-Quality Datasets for Product Representation Learning
Authors Chu Wang, Lei Tang, Yang Lu, Shujun Bian, Hirohisa Fujita, Da Zhang, Zuohua Zhang, Yongning Wu
Abstract ProductNet is a collection of high-quality product datasets for better product understanding. Motivated by ImageNet, ProductNet aims at supporting product representation learning by curating product datasets of high quality with a properly chosen taxonomy. In this paper, the two goals of building high-quality product datasets and learning product representations support each other in an iterative fashion: the product embedding is obtained via a multi-modal deep neural network (master model) designed to leverage product image and catalog information; in return, the embedding is utilized via active learning (local model) to vastly accelerate the annotation process. For the labeled data, the proposed master model yields high categorization accuracy (94.7% top-1 accuracy for 1240 classes), which can be used as search indices, partition keys, and input features for machine learning models. The product embedding, as well as the fine-tuned master model for a specific business task, can also be used for various transfer learning tasks.
Tasks Active Learning, Representation Learning, Transfer Learning
Published 2019-04-18
URL http://arxiv.org/abs/1904.09037v1
PDF http://arxiv.org/pdf/1904.09037v1.pdf
PWC https://paperswithcode.com/paper/productnet-a-collection-of-high-quality
Repo https://github.com/kartiknan/ProductNet
Framework none
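The master model in the ProductNet entry above is described as a multi-modal network that fuses product image and catalog information into an embedding, which then also drives active-learning annotation. A minimal PyTorch sketch of such a fusion model; the class name, feature dimensions, and fusion layers are assumptions for illustration, not the paper's architecture:

```python
import torch
import torch.nn as nn

class MultiModalProductEncoder(nn.Module):
    """Fuse a product-image feature vector with a catalog-text feature vector and
    classify into a product taxonomy; the embedding before the classifier is the
    reusable product representation. Dimensions are illustrative only."""

    def __init__(self, img_dim=2048, txt_dim=768, emb_dim=512, num_classes=1240):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)   # e.g. pooled CNN features
        self.txt_proj = nn.Linear(txt_dim, emb_dim)   # e.g. pooled text-encoder features
        self.fuse = nn.Sequential(nn.ReLU(), nn.Linear(2 * emb_dim, emb_dim))
        self.classifier = nn.Linear(emb_dim, num_classes)

    def forward(self, img_feat, txt_feat):
        z = torch.cat([self.img_proj(img_feat), self.txt_proj(txt_feat)], dim=-1)
        emb = self.fuse(z)                    # product embedding for transfer learning
        return emb, self.classifier(emb)      # logits over the taxonomy

model = MultiModalProductEncoder()
emb, logits = model(torch.randn(4, 2048), torch.randn(4, 768))
```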

Single-Stage Multi-Person Pose Machines

Title Single-Stage Multi-Person Pose Machines
Authors Xuecheng Nie, Jianfeng Zhang, Shuicheng Yan, Jiashi Feng
Abstract Multi-person pose estimation is a challenging problem. Existing methods are mostly two-stage: one stage for proposal generation and the other for allocating poses to the corresponding persons. However, such two-stage methods generally suffer from low efficiency. In this work, we present the first single-stage model, the Single-stage multi-person Pose Machine (SPM), to simplify the pipeline and improve the efficiency of multi-person pose estimation. To achieve this, we propose a novel Structured Pose Representation (SPR) that unifies person instance and body joint position representations. Based on SPR, we develop the SPM model that can directly predict structured poses for multiple persons in a single stage, and thus offer a more compact pipeline and an attractive efficiency advantage over two-stage methods. In particular, SPR introduces root joints to indicate different person instances, and body joint positions are encoded as their displacements w.r.t. the roots. To better predict long-range displacements for some joints, SPR is further extended to hierarchical representations. Based on SPR, SPM can efficiently perform multi-person pose estimation by simultaneously predicting root joints (locations of instances) and body joint displacements via CNNs. Moreover, to demonstrate the generality of SPM, we also apply it to multi-person 3D pose estimation. Comprehensive experiments on the MPII, extended PASCAL-Person-Part, MSCOCO and CMU Panoptic benchmarks clearly demonstrate the state-of-the-art efficiency of SPM for multi-person 2D/3D pose estimation, together with outstanding accuracy.
Tasks 3D Pose Estimation, Multi-Person Pose Estimation, Pose Estimation
Published 2019-08-24
URL https://arxiv.org/abs/1908.09220v1
PDF https://arxiv.org/pdf/1908.09220v1.pdf
PWC https://paperswithcode.com/paper/single-stage-multi-person-pose-machines
Repo https://github.com/murdockhou/Single-Stage-Multi-person-Pose-Machines
Framework tf
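The SPR described in the SPM entry above anchors each person at a root joint and stores the remaining joints as displacements from that root, so a single-stage network can regress both at once. A minimal NumPy sketch of decoding such a (non-hierarchical) representation back into absolute joint coordinates; the array shapes are assumptions for illustration:

```python
import numpy as np

def decode_spr(root_xy, displacements):
    """Recover absolute joint positions from a Structured Pose Representation:
    each person is anchored by a root joint, and every body joint is stored as a
    displacement relative to that root (simplified, non-hierarchical sketch).

    root_xy:        (num_people, 2) root-joint coordinates
    displacements:  (num_people, num_joints, 2) per-joint offsets w.r.t. the root
    """
    return root_xy[:, None, :] + displacements  # (num_people, num_joints, 2)

# usage: two people, three joints each
roots = np.array([[100.0, 80.0], [220.0, 95.0]])
disps = np.random.randn(2, 3, 2) * 10.0
poses = decode_spr(roots, disps)
print(poses.shape)  # (2, 3, 2)
```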

STAGE: Spatio-Temporal Attention on Graph Entities for Video Action Detection

Title STAGE: Spatio-Temporal Attention on Graph Entities for Video Action Detection
Authors Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, Rita Cucchiara
Abstract Spatio-temporal action localization is a challenging yet fascinating task that aims to detect and classify human actions in video clips. In this paper, we develop a high-level video understanding module which can encode interactions between actors and objects both in space and time. In our formulation, spatio-temporal relationships are learned by performing self-attention operations on a graph structure connecting entities from consecutive clips. Notably, the use of graph learning is unprecedented for this task. From a computational point of view, the proposed module is backbone independent by design and does not need end-to-end training. When tested on the AVA dataset, it demonstrates a 10-16% relative mAP improvement over the baseline. Further, it can outperform or match state-of-the-art models which require heavy end-to-end and synchronized training on multiple GPUs. Code is publicly available at: https://github.com/aimagelab/STAGE_action_detection.
Tasks Action Detection, Action Localization, Spatio-Temporal Action Localization, Temporal Action Localization, Video Understanding
Published 2019-12-09
URL https://arxiv.org/abs/1912.04316v1
PDF https://arxiv.org/pdf/1912.04316v1.pdf
PWC https://paperswithcode.com/paper/stage-spatio-temporal-attention-on-graph
Repo https://github.com/aimagelab/STAGE_action_detection
Framework pytorch
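The STAGE entry above describes self-attention over a graph whose nodes are actor and object descriptors from consecutive clips. A bare-bones sketch of single-head self-attention restricted to the edges of such a graph; the real module stacks several such operations and is temporally aware, so the shapes and masking here are illustrative assumptions:

```python
import torch

def graph_self_attention(features, adjacency):
    """features:  (N, d) entity features (actor/object descriptors)
    adjacency: (N, N) boolean matrix; True where two entities are connected."""
    d = features.size(-1)
    scores = features @ features.t() / d ** 0.5               # pairwise affinities
    scores = scores.masked_fill(~adjacency, float('-inf'))    # keep graph edges only
    attn = torch.softmax(scores, dim=-1)
    return attn @ features                                    # context-enriched features

# usage: random graph with self-loops so every node attends to at least itself
feats = torch.randn(5, 64)
adj = torch.eye(5, dtype=torch.bool) | (torch.rand(5, 5) > 0.5)
out = graph_self_attention(feats, adj)
```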

Opponent Aware Reinforcement Learning

Title Opponent Aware Reinforcement Learning
Authors Victor Gallego, Roi Naveiro, David Rios Insua, David Gomez-Ullate Oteiza
Abstract We introduce Threatened Markov Decision Processes (TMDPs) as an extension of the classical Markov Decision Process framework for Reinforcement Learning (RL). TMDPs allow supporting a decision maker against potential opponents in an RL context. We also propose a level-k thinking scheme resulting in a novel learning approach to deal with TMDPs. After introducing our framework and deriving theoretical results, relevant empirical evidence is given via extensive experiments, showing the benefits of accounting for adversaries in RL while the agent learns.
Tasks
Published 2019-08-22
URL https://arxiv.org/abs/1908.08773v2
PDF https://arxiv.org/pdf/1908.08773v2.pdf
PWC https://paperswithcode.com/paper/opponent-aware-reinforcement-learning
Repo https://github.com/vicgalle/ARAMARL
Framework none
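The level-k idea in the entry above is that a level-1 agent maintains an explicit model of a (level-0) opponent and best-responds to it while learning. A toy, repeated-matrix-game flavour of that loop; this is not the paper's TMDP algorithm, and the action counts and tabular update are assumptions for illustration:

```python
import numpy as np

num_actions = 3
q = np.zeros((num_actions, num_actions))   # Q(my_action, opponent_action)
opp_counts = np.ones(num_actions)          # level-1 agent's model of a level-0 opponent

def act():
    opp_probs = opp_counts / opp_counts.sum()  # predicted opponent policy
    expected_q = q @ opp_probs                 # expected value of each of my actions
    return int(np.argmax(expected_q))          # best response to the opponent model

def update(my_a, opp_a, reward, lr=0.1):
    opp_counts[opp_a] += 1                     # refine the opponent model
    q[my_a, opp_a] += lr * (reward - q[my_a, opp_a])

# usage: one fictitious round
a = act()
update(a, opp_a=1, reward=1.0)
```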

Identifying and Analyzing Cryptocurrency Manipulations in Social Media

Title Identifying and Analyzing Cryptocurrency Manipulations in Social Media
Authors Mehrnoosh Mirtaheri, Sami Abu-El-Haija, Fred Morstatter, Greg Ver Steeg, Aram Galstyan
Abstract Interest surrounding cryptocurrencies, digital or virtual currencies that are used as a medium for financial transactions, has grown tremendously in recent years. The anonymity surrounding these currencies makes investors particularly susceptible to fraud—such as “pump and dump” scams—where the goal is to artificially inflate the perceived worth of a currency, luring victims into investing before the fraudsters can sell their holdings. Because of the speed and relative anonymity offered by social platforms such as Twitter and Telegram, social media has become a preferred platform for scammers who wish to spread false hype about the cryptocurrency they are trying to pump. In this work we propose and evaluate a computational approach that can automatically identify pump and dump scams as they unfold by combining information across social media platforms. We also develop a multi-modal approach for predicting whether a particular pump attempt will succeed or not. Finally, we analyze the prevalence of bots in cryptocurrency related tweets, and observe a significant increase in bot activity during the pump attempts.
Tasks
Published 2019-02-04
URL https://arxiv.org/abs/1902.03110v2
PDF https://arxiv.org/pdf/1902.03110v2.pdf
PWC https://paperswithcode.com/paper/identifying-and-analyzing-cryptocurrency
Repo https://github.com/Mehrnoom/Cryptocurrency-Pump-Dump
Framework none

GAC-GAN: A General Method for Appearance-Controllable Human Video Motion Transfer

Title GAC-GAN: A General Method for Appearance-Controllable Human Video Motion Transfer
Authors Dongxu Wei, Xiaowei Xu, Haibin Shen, Kejie Huang
Abstract Human video motion transfer has a wide range of applications in multimedia, computer vision and graphics. Recently, due to the rapid development of Generative Adversarial Networks (GANs), there has been significant progress in the field. However, almost all existing GAN-based works address only the mapping from human motions to video scenes, with scene appearances encoded individually in the trained models. Therefore, each trained model can only generate videos with a specific scene appearance, and new models have to be trained to generate new appearances. Besides, existing works lack the capability of appearance control. For example, users have to provide video recordings of themselves wearing new clothes or performing in new backgrounds to enable clothes or background changes in their synthetic videos, which greatly limits application flexibility. In this paper, we propose GAC-GAN, a general method for appearance-controllable human video motion transfer. To enable general-purpose appearance synthesis, we propose to include appearance information in the conditioning inputs. Thus, once trained, our model can generate new appearances by altering the input appearance information. To achieve appearance control, we first obtain the appearance-controllable conditioning inputs and then utilize a two-stage GAC-GAN to generate the corresponding appearance-controllable outputs, where we utilize an ACGAN loss and a shadow extraction module for output foreground and background appearance control, respectively. We further build a solo dance dataset containing a large number of dance videos for training and evaluation. Experimental results show that our proposed GAC-GAN can not only support appearance-controllable human video motion transfer but also achieve higher video quality than state-of-the-art methods.
Tasks
Published 2019-11-25
URL https://arxiv.org/abs/1911.10672v2
PDF https://arxiv.org/pdf/1911.10672v2.pdf
PWC https://paperswithcode.com/paper/appearance-composing-gan-a-general-method-for
Repo https://github.com/wswdx/Appearance-Composing-GAN
Framework none
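The key point in the GAC-GAN entry above is that appearance enters the generator as a conditioning input rather than being baked into the weights, so new clothes or backgrounds do not require retraining. A minimal conditional-generator sketch of that idea; the channel counts and conv stack are assumptions, and this is not GAC-GAN's two-stage architecture, ACGAN loss, or shadow-extraction module:

```python
import torch
import torch.nn as nn

class AppearanceConditionedGenerator(nn.Module):
    """Toy generator that concatenates a pose map and an appearance map as input,
    so the synthesized appearance can be changed by swapping the conditioning."""
    def __init__(self, pose_ch=3, app_ch=3, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(pose_ch + app_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_ch, 3, padding=1), nn.Tanh(),
        )

    def forward(self, pose_map, appearance_map):
        # appearance is a runtime input, not a fixed, per-model weight
        return self.net(torch.cat([pose_map, appearance_map], dim=1))

gen = AppearanceConditionedGenerator()
frame = gen(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```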

ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia

Title ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia
Authors Aaron Halfaker, R. Stuart Geiger
Abstract Algorithmic systems – from rule-based bots to machine learning classifiers – have a long history of supporting the essential work of content moderation and other curation work in peer production projects. From counter-vandalism to task routing, basic machine prediction has allowed open knowledge projects like Wikipedia to scale to the largest encyclopedia in the world, while maintaining quality and consistency. However, conversations about how quality control should work and what role algorithms should play have generally been led by the expert engineers who have the skills and resources to develop and modify these complex algorithmic systems. In this paper, we describe ORES: an algorithmic scoring service that supports real-time scoring of wiki edits using multiple independent classifiers trained on different datasets. ORES decouples several activities that have typically all been performed by engineers: choosing or curating training data, building models to serve predictions, auditing predictions, and developing interfaces or automated agents that act on those predictions. This meta-algorithmic system was designed to open up socio-technical conversations about algorithmic systems in Wikipedia to a broader set of participants. In this paper, we discuss the theoretical mechanisms of social change ORES enables and detail case studies in participatory machine learning around ORES from the 4 years since its deployment.
Tasks
Published 2019-09-11
URL https://arxiv.org/abs/1909.05189v2
PDF https://arxiv.org/pdf/1909.05189v2.pdf
PWC https://paperswithcode.com/paper/ores-lowering-barriers-with-participatory
Repo https://github.com/halfak/ores-demos
Framework none
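ORES, described in the entry above, is exposed as a web service that scores individual wiki revisions with the requested models. A minimal client-side sketch of querying it for a "damaging" score; the endpoint layout follows the public v3 API, while the revision id, model name, and response handling are placeholders that should be checked against the service documentation:

```python
import requests

# Query the ORES scoring service for one revision and one model.
# The revision id below is a placeholder; swap in a real enwiki revision id.
resp = requests.get(
    "https://ores.wikimedia.org/v3/scores/enwiki/",
    params={"revids": "123456789", "models": "damaging"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # nested JSON keyed by wiki, revision id, and model
```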

Bidirectional Context-Aware Hierarchical Attention Network for Document Understanding

Title Bidirectional Context-Aware Hierarchical Attention Network for Document Understanding
Authors Jean-Baptiste Remy, Antoine Jean-Pierre Tixier, Michalis Vazirgiannis
Abstract The Hierarchical Attention Network (HAN) has made great strides, but it suffers a major limitation: at level 1, each sentence is encoded in complete isolation. In this work, we propose and compare several modifications of HAN in which the sentence encoder is able to make context-aware attentional decisions (CAHAN). Furthermore, we propose a bidirectional document encoder that processes the document forwards and backwards, using the preceding and following sentences as context. Experiments on three large-scale sentiment and topic classification datasets show that the bidirectional version of CAHAN outperforms HAN everywhere, with only a modest increase in computation time. While results are promising, we expect the superiority of CAHAN to be even more evident on tasks requiring a deeper understanding of the input documents, such as abstractive summarization. Code is publicly available.
Tasks Abstractive Text Summarization
Published 2019-08-16
URL https://arxiv.org/abs/1908.06006v1
PDF https://arxiv.org/pdf/1908.06006v1.pdf
PWC https://paperswithcode.com/paper/bidirectional-context-aware-hierarchical
Repo https://github.com/JbRemy/Cahan
Framework tf
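In the CAHAN entry above, the word-level attention of the sentence encoder is conditioned on a summary of the surrounding sentences instead of being computed in isolation. A stripped-down sketch of that conditioning with a single dot-product scoring step; the paper's actual scoring functions and bidirectional document encoder are not reproduced here:

```python
import torch

def context_aware_attention(word_states, context_vec):
    """word_states: (seq_len, d) hidden states of one sentence
    context_vec:  (d,) summary of the preceding/following sentences
    Returns a (d,) sentence vector whose attention weights depend on the context."""
    scores = word_states @ context_vec        # (seq_len,) context-dependent scores
    weights = torch.softmax(scores, dim=0)
    return weights @ word_states

sent_vec = context_aware_attention(torch.randn(12, 64), torch.randn(64))
```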

MEx: Multi-modal Exercises Dataset for Human Activity Recognition

Title MEx: Multi-modal Exercises Dataset for Human Activity Recognition
Authors Anjana Wijekoon, Nirmalie Wiratunga, Kay Cooper
Abstract MEx: Multi-modal Exercises Dataset is a multi-sensor, multi-modal dataset created to benchmark Human Activity Recognition (HAR) and multi-modal fusion algorithms. Collection of this dataset was inspired by the need to recognise and evaluate the quality of exercise performance in order to support patients with Musculoskeletal Disorders (MSD). We selected 7 exercises regularly recommended for MSD patients by physiotherapists and collected data with four sensors: a pressure mat, a depth camera and two accelerometers. The dataset contains three data modalities (numerical time-series data, video data and pressure sensor data), posing interesting research challenges when reasoning for HAR and Exercise Quality Assessment. This paper presents our evaluation of the dataset on a number of standard classification algorithms for the HAR task, comparing different feature representation algorithms for each sensor. These results set a reference performance for each individual sensor and expose their strengths and weaknesses for future tasks. In addition, we visualise the pressure mat data to explore the potential of the sensor to capture exercise performance quality. With the recent advancement in multi-modal fusion, we also believe MEx is a suitable dataset to benchmark not only HAR algorithms but also fusion algorithms for heterogeneous data types in multiple application domains.
Tasks Activity Recognition, Human Activity Recognition, Time Series
Published 2019-08-13
URL https://arxiv.org/abs/1908.08992v1
PDF https://arxiv.org/pdf/1908.08992v1.pdf
PWC https://paperswithcode.com/paper/mex-multi-modal-exercises-dataset-for-human
Repo https://github.com/anjanaw/MEx
Framework tf
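Benchmarking classifiers on a dataset like MEx typically starts by segmenting each sensor stream into fixed-length windows before feature extraction. A generic NumPy sketch of that preprocessing step; the window and stride values are arbitrary examples, not the paper's protocol:

```python
import numpy as np

def sliding_windows(signal, window, stride):
    """Cut a (timesteps, channels) sensor stream into fixed-length windows,
    a common preprocessing step for HAR benchmarks."""
    starts = range(0, len(signal) - window + 1, stride)
    return np.stack([signal[s:s + window] for s in starts])

acc = np.random.randn(1000, 3)            # e.g. one tri-axial accelerometer stream
windows = sliding_windows(acc, window=100, stride=50)
print(windows.shape)                      # (19, 100, 3)
```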

Network Deconvolution

Title Network Deconvolution
Authors Chengxi Ye, Matthew Evanusa, Hua He, Anton Mitrokhin, Tom Goldstein, James A. Yorke, Cornelia Fermüller, Yiannis Aloimonos
Abstract Convolution is a central operation in Convolutional Neural Networks (CNNs), which applies a kernel to overlapping regions shifted across the image. However, because of the strong correlations in real-world image data, convolutional kernels are in effect re-learning redundant data. In this work, we show that this redundancy has made neural network training challenging, and propose network deconvolution, a procedure which optimally removes pixel-wise and channel-wise correlations before the data is fed into each layer. Network deconvolution can be efficiently calculated at a fraction of the computational cost of a convolution layer. We also show that the deconvolution filters in the first layer of the network resemble the center-surround structure found in biological neurons in the visual regions of the brain. Filtering with such kernels results in a sparse representation, a desired property that has been missing in the training of neural networks. Learning from the sparse representation promotes faster convergence and superior results without the use of batch normalization. We apply our network deconvolution operation to 10 modern neural network models by replacing batch normalization within each. Extensive experiments show that the network deconvolution operation is able to deliver performance improvement in all cases on the CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, Cityscapes, and ImageNet datasets.
Tasks Image Classification
Published 2019-05-28
URL https://arxiv.org/abs/1905.11926v4
PDF https://arxiv.org/pdf/1905.11926v4.pdf
PWC https://paperswithcode.com/paper/190511926
Repo https://github.com/deconvolutionpaper/deconvolution
Framework pytorch
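Network deconvolution, as described in the entry above, removes pixel-wise and channel-wise correlations from the data fed into each layer, which is conceptually close to whitening. A small NumPy sketch of ZCA-style whitening to illustrate the decorrelation idea; the paper computes this efficiently per layer inside the network, so this standalone version is only illustrative:

```python
import numpy as np

def whiten(x, eps=1e-5):
    """ZCA-style whitening of (n_samples, n_features) data: removes pairwise
    correlations so the covariance of the output is approximately the identity."""
    x = x - x.mean(axis=0)
    cov = x.T @ x / (len(x) - 1)
    u, s, _ = np.linalg.svd(cov)
    w = u @ np.diag(1.0 / np.sqrt(s + eps)) @ u.T   # inverse square root of covariance
    return x @ w

data = np.random.randn(256, 32) @ np.random.randn(32, 32)       # correlated features
print(np.round(np.cov(whiten(data), rowvar=False)[:3, :3], 2))  # ~ identity block
```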

LaFIn: Generative Landmark Guided Face Inpainting

Title LaFIn: Generative Landmark Guided Face Inpainting
Authors Yang Yang, Xiaojie Guo, Jiayi Ma, Lin Ma, Haibin Ling
Abstract It is challenging to inpaint face images in the wild, due to the large variation of appearance, such as different poses, expressions and occlusions. A good inpainting algorithm should guarantee the realism of the output, including the topological structure among eyes, nose and mouth, as well as attribute consistency in pose, gender, ethnicity, expression, etc. This paper studies an effective deep learning based strategy to deal with these issues, which comprises a facial landmark prediction subnet and an image inpainting subnet. Concretely, given a partial observation, the landmark predictor aims to provide the structural information (e.g. topological relationship and expression) of incomplete faces, while the inpaintor generates plausible appearance (e.g. gender and ethnicity) conditioned on the predicted landmarks. Experiments on the CelebA-HQ and CelebA datasets are conducted to reveal the efficacy of our design and to demonstrate its superiority over state-of-the-art alternatives both qualitatively and quantitatively. In addition, we assume that high-quality completed faces together with their landmarks can be utilized as augmented data to further improve the performance of (any) landmark predictor, which is corroborated by experimental results on the 300W and WFLW datasets.
Tasks Facial Inpainting, Image Inpainting
Published 2019-11-26
URL https://arxiv.org/abs/1911.11394v1
PDF https://arxiv.org/pdf/1911.11394v1.pdf
PWC https://paperswithcode.com/paper/lafin-generative-landmark-guided-face
Repo https://github.com/YaN9-Y/lafin
Framework pytorch
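The LaFIn entry above describes a two-subnet pipeline: a landmark predictor provides structure from the partial face, and the inpaintor fills the holes conditioned on those landmarks. A schematic sketch of how the two stages could be chained; the network objects, channel layout, and compositing step are assumptions, not the paper's interfaces:

```python
import torch

def inpaint_face(masked_img, mask, landmark_net, inpaint_net):
    """masked_img: (B, 3, H, W) face with holes; mask: (B, 1, H, W) with 1 = hole.
    landmark_net and inpaint_net are placeholder callables for the two subnets."""
    heatmaps = landmark_net(masked_img)                        # structural guidance
    completed = inpaint_net(torch.cat([masked_img, mask, heatmaps], dim=1))
    # keep the observed pixels and use generated content only inside the holes
    return masked_img * (1 - mask) + completed * mask
```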

Interlaced Sparse Self-Attention for Semantic Segmentation

Title Interlaced Sparse Self-Attention for Semantic Segmentation
Authors Lang Huang, Yuhui Yuan, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang
Abstract In this paper, we present a so-called interlaced sparse self-attention approach to improve the efficiency of the self-attention mechanism for semantic segmentation. The main idea is that we factorize the dense affinity matrix as the product of two sparse affinity matrices. There are two successive attention modules each estimating a sparse affinity matrix. The first attention module is used to estimate the affinities within a subset of positions that have long spatial interval distances and the second attention module is used to estimate the affinities within a subset of positions that have short spatial interval distances. These two attention modules are designed so that each position is able to receive the information from all the other positions. In contrast to the original self-attention module, our approach decreases the computation and memory complexity substantially especially when processing high-resolution feature maps. We empirically verify the effectiveness of our approach on six challenging semantic segmentation benchmarks.
Tasks Semantic Segmentation
Published 2019-07-29
URL https://arxiv.org/abs/1907.12273v2
PDF https://arxiv.org/pdf/1907.12273v2.pdf
PWC https://paperswithcode.com/paper/interlaced-sparse-self-attention-for-semantic
Repo https://github.com/PkuRainBow/OCNet
Framework pytorch
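The factorisation in the entry above works by attending twice: first within groups of positions that are far apart, then within groups of adjacent positions, so every position can reach every other one through the two steps. A 1-D toy version in PyTorch that makes the permutation explicit; the real interlaced sparse self-attention operates on 2-D feature maps, and the group size and scaling here are illustrative:

```python
import torch

def interlaced_attention_1d(x, p):
    """x: (N, d) features with N divisible by p; p = number of long-range groups."""
    n, d = x.shape
    q = n // p

    def dense_attn(t):  # plain self-attention among the positions in each group
        a = torch.softmax(t @ t.transpose(-2, -1) / d ** 0.5, dim=-1)
        return a @ t

    # long-range step: group positions {k, k+q, k+2q, ...}, which are far apart
    x = dense_attn(x.reshape(p, q, d).transpose(0, 1)).transpose(0, 1).reshape(n, d)
    # short-range step: group q consecutive positions
    x = dense_attn(x.reshape(p, q, d)).reshape(n, d)
    return x

out = interlaced_attention_1d(torch.randn(32, 16), p=4)
```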

Unsupervised Learning of Eye Gaze Representation from the Web

Title Unsupervised Learning of Eye Gaze Representation from the Web
Authors Neeru Dubey, Shreya Ghosh, Abhinav Dhall
Abstract Automatic eye gaze estimation has interested researchers for a while now. In this paper, we propose an unsupervised learning based method for estimating the eye gaze region. To train the proposed network “Ize-Net” in a self-supervised manner, we collect a large ‘in the wild’ dataset containing 154,251 images from the web. For the images in the database, we divide the gaze into three regions using an automatic technique based on pupil-center localization and then use a feature-based technique to determine the gaze region. The performance is evaluated on the Tablet Gaze and CAVE datasets by fine-tuning Ize-Net for the task of eye gaze estimation. The feature representation learned is also used to train traditional machine learning algorithms for eye gaze estimation. The results demonstrate that the proposed method learns a rich data representation, which can be efficiently fine-tuned for any eye gaze estimation dataset.
Tasks Gaze Estimation
Published 2019-04-04
URL http://arxiv.org/abs/1904.02459v1
PDF http://arxiv.org/pdf/1904.02459v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-of-eye-gaze
Repo https://github.com/Neerudubey/Unsupervised-Eye-gaze-estimation
Framework none

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection

Title NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
Authors Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, Quoc V. Le
Abstract Current state-of-the-art convolutional architectures for object detection are manually designed. Here we aim to learn a better architecture of feature pyramid network for object detection. We adopt Neural Architecture Search and discover a new feature pyramid architecture in a novel scalable search space covering all cross-scale connections. The discovered architecture, named NAS-FPN, consists of a combination of top-down and bottom-up connections to fuse features across scales. NAS-FPN, combined with various backbone models in the RetinaNet framework, achieves better accuracy and latency tradeoff compared to state-of-the-art object detection models. NAS-FPN improves mobile detection accuracy by 2 AP compared to state-of-the-art SSDLite with MobileNetV2 model in [32] and achieves 48.3 AP which surpasses Mask R-CNN [10] detection accuracy with less computation time.
Tasks Neural Architecture Search, Object Detection, Real-Time Object Detection
Published 2019-04-16
URL http://arxiv.org/abs/1904.07392v1
PDF http://arxiv.org/pdf/1904.07392v1.pdf
PWC https://paperswithcode.com/paper/nas-fpn-learning-scalable-feature-pyramid
Repo https://github.com/tensorflow/tpu/tree/master/models/official/detection
Framework tf

BAE-NET: Branched Autoencoder for Shape Co-Segmentation

Title BAE-NET: Branched Autoencoder for Shape Co-Segmentation
Authors Zhiqin Chen, Kangxue Yin, Matthew Fisher, Siddhartha Chaudhuri, Hao Zhang
Abstract We treat shape co-segmentation as a representation learning problem and introduce BAE-NET, a branched autoencoder network, for the task. The unsupervised BAE-NET is trained with a collection of un-segmented shapes, using a shape reconstruction loss, without any ground-truth labels. Specifically, the network takes an input shape and encodes it using a convolutional neural network, whereas the decoder concatenates the resulting feature code with a point coordinate and outputs a value indicating whether the point is inside/outside the shape. Importantly, the decoder is branched: each branch learns a compact representation for one commonly recurring part of the shape collection, e.g., airplane wings. By complementing the shape reconstruction loss with a label loss, BAE-NET is easily tuned for one-shot learning. We show unsupervised, weakly supervised, and one-shot learning results by BAE-NET, demonstrating that using only a couple of exemplars, our network can generally outperform state-of-the-art supervised methods trained on hundreds of segmented shapes. Code is available at https://github.com/czq142857/BAE-NET.
Tasks One-Shot Learning, Representation Learning
Published 2019-03-27
URL https://arxiv.org/abs/1903.11228v2
PDF https://arxiv.org/pdf/1903.11228v2.pdf
PWC https://paperswithcode.com/paper/bae-net-branched-autoencoder-for-shape-co
Repo https://github.com/czq142857/BAE-NET
Framework tf
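The branched decoder in the BAE-NET entry above takes a shape code plus a 3-D point and emits one occupancy value per branch; the max over branches reconstructs the shape, while the argmax acts as an unsupervised part label. A compact PyTorch sketch of that decoder interface; the layer sizes and branch count are illustrative, not the paper's exact network:

```python
import torch
import torch.nn as nn

class BranchedImplicitDecoder(nn.Module):
    """Map (shape code, point coordinate) pairs to per-branch occupancies;
    max over branches = reconstruction, argmax = part assignment."""
    def __init__(self, code_dim=128, num_branches=8, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(code_dim + 3, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, num_branches), nn.Sigmoid(),
        )

    def forward(self, code, points):
        # code: (B, code_dim), points: (B, N, 3)
        code = code[:, None, :].expand(-1, points.size(1), -1)
        branch_occ = self.mlp(torch.cat([code, points], dim=-1))  # (B, N, branches)
        occupancy, part = branch_occ.max(dim=-1)                  # shape, part label
        return occupancy, part

dec = BranchedImplicitDecoder()
occ, part = dec(torch.randn(2, 128), torch.rand(2, 1024, 3))
```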