April 3, 2020

# Paper Group AWR 15

Graph Structured Network for Image-Text Matching. High-Performance Long-Term Tracking with Meta-Updater. Pose-guided Visible Part Matching for Occluded Person ReID. Ontology-based Interpretable Machine Learning for Textual Data. MetaPoison: Practical General-purpose Clean-label Data Poisoning. Semantic Drift Compensation for Class-Incremental Learn …

#### Graph Structured Network for Image-Text Matching

Title Graph Structured Network for Image-Text Matching
Authors Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang
Abstract Image-text matching has received growing interest since it bridges vision and language. The key challenge lies in how to learn correspondence between image and text. Existing works learn coarse correspondence based on object co-occurrence statistics, while failing to learn fine-grained phrase correspondence. In this paper, we present a novel Graph Structured Matching Network (GSMN) to learn fine-grained correspondence. The GSMN explicitly models object, relation and attribute as a structured phrase, which not only allows the correspondence of object, relation and attribute to be learned separately, but also helps learn fine-grained correspondence of the structured phrase. This is achieved by node-level matching and structure-level matching. The node-level matching associates each node with its relevant nodes from another modality, where the node can be object, relation or attribute. The associated nodes then jointly infer fine-grained correspondence by fusing neighborhood associations at structure-level matching. Comprehensive experiments show that GSMN outperforms state-of-the-art methods on benchmarks, with relative Recall@1 improvements of nearly 7% and 2% on Flickr30K and MSCOCO, respectively. Code will be released at: https://github.com/CrossmodalGroup/GSMN.
Published 2020-04-01
URL https://arxiv.org/abs/2004.00277v1
PDF https://arxiv.org/pdf/2004.00277v1.pdf
PWC https://paperswithcode.com/paper/graph-structured-network-for-image-text
Repo https://github.com/CrossmodalGroup/GSMN
Framework none

#### High-Performance Long-Term Tracking with Meta-Updater

Title High-Performance Long-Term Tracking with Meta-Updater
Authors Kenan Dai, Yunhua Zhang, Dong Wang, Jianhua Li, Huchuan Lu, Xiaoyun Yang
Abstract Long-term visual tracking has drawn increasing attention because it is much closer to practical applications than short-term tracking. Most top-ranked long-term trackers adopt offline-trained Siamese architectures and thus cannot benefit from the great progress of short-term trackers with online update. However, it is quite risky to straightforwardly introduce online-update-based trackers to solve the long-term problem, because long-term observations are uncertain and noisy. In this work, we propose a novel offline-trained Meta-Updater to address an important but unsolved problem: Is the tracker ready for updating in the current frame? The proposed meta-updater can effectively integrate geometric, discriminative, and appearance cues in a sequential manner, and then mine the sequential information with a designed cascaded LSTM module. Our meta-updater learns a binary output to guide the tracker’s update and can be easily embedded into different trackers. This work also introduces a long-term tracking framework consisting of an online local tracker, an online verifier, a SiamRPN-based re-detector, and our meta-updater. Numerous experimental results on the VOT2018LT, VOT2019LT, OxUvALT, TLP, and LaSOT benchmarks show that our tracker performs remarkably better than other competing algorithms. Our project is available on the website: https://github.com/Daikenan/LTMU.
Published 2020-04-01
URL https://arxiv.org/abs/2004.00305v1
PDF https://arxiv.org/pdf/2004.00305v1.pdf
PWC https://paperswithcode.com/paper/high-performance-long-term-tracking-with-meta
Repo https://github.com/Daikenan/LTMU
Framework none

#### Pose-guided Visible Part Matching for Occluded Person ReID

Title Pose-guided Visible Part Matching for Occluded Person ReID
Authors Shang Gao, Jingya Wang, Huchuan Lu, Zimo Liu
Abstract Occluded person re-identification is a challenging task as the appearance varies substantially with various obstacles, especially in the crowd scenario. To address this issue, we propose a Pose-guided Visible Part Matching (PVPM) method that jointly learns the discriminative features with pose-guided attention and self-mines the part visibility in an end-to-end framework. Specifically, the proposed PVPM includes two key components: 1) a pose-guided attention (PGA) method for part feature pooling that exploits more discriminative local features; 2) a pose-guided visibility predictor (PVP) that estimates whether a part is occluded. As there are no ground truth training annotations for the occluded part, we exploit the characteristic of part correspondence in positive pairs and self-mine the correspondence scores via graph matching. The generated correspondence scores are then used as pseudo-labels for the visibility predictor (PVP). Experimental results on three reported occluded benchmarks show that the proposed method achieves performance competitive with state-of-the-art methods. The source codes are available at https://github.com/hh23333/PVPM.
Published 2020-04-01
URL https://arxiv.org/abs/2004.00230v1
PDF https://arxiv.org/pdf/2004.00230v1.pdf
PWC https://paperswithcode.com/paper/pose-guided-visible-part-matching-for
Repo https://github.com/hh23333/PVPM
Framework none

#### Ontology-based Interpretable Machine Learning for Textual Data

Title Ontology-based Interpretable Machine Learning for Textual Data
Authors Phung Lai, NhatHai Phan, Han Hu, Anuja Badeti, David Newman, Dejing Dou
Abstract In this paper, we introduce a novel interpreting framework that learns an interpretable model based on an ontology-based sampling technique to explain agnostic prediction models. Different from existing approaches, our algorithm considers contextual correlation among words, described in domain knowledge ontologies, to generate semantic explanations. To narrow down the search space for explanations, which is a major problem of long and complicated text data, we design a learnable anchor algorithm, to better extract explanations locally. A set of regulations is further introduced, regarding combining learned interpretable representations with anchors to generate comprehensible semantic explanations. An extensive experiment conducted on two real-world datasets shows that our approach generates more precise and insightful explanations compared with baseline approaches.
Published 2020-04-01
URL https://arxiv.org/abs/2004.00204v1
PDF https://arxiv.org/pdf/2004.00204v1.pdf
PWC https://paperswithcode.com/paper/ontology-based-interpretable-machine-learning
Repo https://github.com/PhungLai728/OnML
Framework none

#### MetaPoison: Practical General-purpose Clean-label Data Poisoning

Title MetaPoison: Practical General-purpose Clean-label Data Poisoning
Authors W. Ronny Huang, Jonas Geiping, Liam Fowl, Gavin Taylor, Tom Goldstein
Abstract Data poisoning–the process by which an attacker takes control of a model by making imperceptible changes to a subset of the training data–is an emerging threat in the context of neural networks. Existing attacks for data poisoning have relied on hand-crafted heuristics. Instead, we pose crafting poisons more generally as a bi-level optimization problem, where the inner level corresponds to training a network on a poisoned dataset and the outer level corresponds to updating those poisons to achieve a desired behavior on the trained model. We then propose MetaPoison, a first-order method to solve this optimization quickly. MetaPoison is effective: it outperforms previous clean-label poisoning methods by a large margin under the same setting. MetaPoison is robust: its poisons transfer to a variety of victims with unknown hyperparameters and architectures. MetaPoison is also general-purpose, working not only in fine-tuning scenarios, but also for end-to-end training from scratch with remarkable success, e.g. causing a target image to be misclassified 90% of the time via manipulating just 1% of the dataset. Additionally, MetaPoison can achieve arbitrary adversary goals not previously possible–like using poisons of one class to make a target image don the label of another arbitrarily chosen class. Finally, MetaPoison works in the real world. We demonstrate successful data poisoning of models trained on Google Cloud AutoML Vision. Code and premade poisons are provided at https://github.com/wronnyhuang/metapoison.
Published 2020-04-01
URL https://arxiv.org/abs/2004.00225v1
PDF https://arxiv.org/pdf/2004.00225v1.pdf
PWC https://paperswithcode.com/paper/metapoison-practical-general-purpose-clean
Repo https://github.com/wronnyhuang/metapoison
Framework none
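
The bi-level formulation described in the MetaPoison abstract can be written compactly as below. The notation is an assumption introduced here for illustration and is not taken from the paper: $X_p$ denotes the poisoned images, $X_c$ the clean training images, $Y$ the training labels, $x_t$ the target image, $y_{\mathrm{adv}}$ the label the attacker wants it to receive, and $\theta$ the network weights.

$$
X_p^{*} = \arg\min_{X_p} \; \mathcal{L}_{\mathrm{adv}}\bigl(x_t, y_{\mathrm{adv}};\, \theta^{*}(X_p)\bigr)
\quad \text{subject to} \quad
\theta^{*}(X_p) = \arg\min_{\theta} \; \mathcal{L}_{\mathrm{train}}\bigl(X_c \cup X_p, Y;\, \theta\bigr)
$$

The inner problem is ordinary training on the poisoned dataset; the outer problem adjusts the poisons so that the trained model behaves as the attacker intends on the target.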

#### Semantic Drift Compensation for Class-Incremental Learning

Title Semantic Drift Compensation for Class-Incremental Learning
Authors Lu Yu, Bartłomiej Twardowski, Xialei Liu, Luis Herranz, Kai Wang, Yongmei Cheng, Shangling Jui, Joost van de Weijer
Published 2020-04-01
URL https://arxiv.org/abs/2004.00440v1
PDF https://arxiv.org/pdf/2004.00440v1.pdf
PWC https://paperswithcode.com/paper/semantic-drift-compensation-for-class
Repo https://github.com/yulu0724/SDC-IL
Framework none

#### CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection

Title CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection
Authors Zhiwei Dong, Guoxuan Li, Yue Liao, Fei Wang, Pengju Ren, Chen Qian
Abstract Keypoint-based detectors have achieved quite good performance. However, incorrect keypoint matching is still widespread and greatly affects the performance of the detector. In this paper, we propose CentripetalNet which uses centripetal shift to pair corner keypoints from the same instance. CentripetalNet predicts the position and the centripetal shift of the corner points and matches corners whose shifted results are aligned. Combining position information, our approach matches corner points more accurately than the conventional embedding approaches do. Corner pooling extracts information inside the bounding boxes onto the border. To make the corners more aware of this information, we design a cross-star deformable convolution network to conduct feature adaptation. Furthermore, we explore instance segmentation on anchor-free detectors by equipping our CentripetalNet with a mask prediction module. On MS-COCO test-dev, our CentripetalNet not only outperforms all existing anchor-free detectors with an AP of 48.0% but also achieves comparable performance to the state-of-the-art instance segmentation approaches with a 40.2% MaskAP. Code will be available at https://github.com/KiveeDong/CentripetalNet.
Tasks Instance Segmentation, Object Detection, Semantic Segmentation
Published 2020-03-20
URL https://arxiv.org/abs/2003.09119v1
PDF https://arxiv.org/pdf/2003.09119v1.pdf
PWC https://paperswithcode.com/paper/centripetalnet-pursuing-high-quality-keypoint
Repo https://github.com/KiveeDong/CentripetalNet
Framework pytorch
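
The corner-pairing idea in the CentripetalNet abstract (match corners whose shifted results are aligned) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the function name `match_corners`, the plain Euclidean tolerance `tol`, and the assumption that each shift points from a corner toward its box center are all hypothetical choices made here.

```python
import numpy as np

def match_corners(tl_pos, tl_shift, br_pos, br_shift, tol=1.0):
    """Pair top-left and bottom-right corners whose shifted positions agree.

    tl_pos, br_pos:     (N, 2) and (M, 2) arrays of (x, y) corner positions.
    tl_shift, br_shift: predicted shifts, assumed here to point from each
                        corner toward the center of its box.
    tol:                hypothetical distance threshold for "aligned".
    Returns a list of (top_left_index, bottom_right_index) pairs.
    """
    tl_centers = tl_pos + tl_shift
    br_centers = br_pos + br_shift
    pairs = []
    for i in range(len(tl_pos)):
        for j in range(len(br_pos)):
            # A valid box needs the top-left corner above and left of the bottom-right one.
            if not (tl_pos[i, 0] < br_pos[j, 0] and tl_pos[i, 1] < br_pos[j, 1]):
                continue
            # Corners of the same instance should be shifted to nearly the same point.
            if np.linalg.norm(tl_centers[i] - br_centers[j]) <= tol:
                pairs.append((i, j))
    return pairs

# Two toy boxes: (0, 0)-(10, 10) and (20, 20)-(40, 30).
tl_pos = np.array([[0.0, 0.0], [20.0, 20.0]])
tl_shift = np.array([[5.0, 5.0], [10.0, 5.0]])
br_pos = np.array([[10.0, 10.0], [40.0, 30.0]])
br_shift = np.array([[-5.0, -5.0], [-10.0, -5.0]])
pairs = match_corners(tl_pos, tl_shift, br_pos, br_shift)
```

In this toy setup each top-left corner is paired only with the bottom-right corner of its own box, because the cross pairings either violate the geometric check or land on different shifted points.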

#### Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation

Title Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
Authors Matteo Fabbri, Fabio Lanzi, Simone Calderara, Stefano Alletto, Rita Cucchiara
Abstract In this paper we present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images. We propose to use high resolution volumetric heatmaps to model joint locations, devising a simple and effective compression method to drastically reduce the size of this representation. At the core of the proposed method lies our Volumetric Heatmap Autoencoder, a fully-convolutional network tasked with the compression of ground-truth heatmaps into a dense intermediate representation. A second model, the Code Predictor, is then trained to predict these codes, which can be decompressed at test time to re-obtain the original representation. Our experimental evaluation shows that our method performs favorably when compared to state of the art on both multi-person and single-person 3D human pose estimation datasets and, thanks to our novel compression strategy, can process full-HD images at the constant runtime of 8 fps regardless of the number of subjects in the scene. Code and models available at https://github.com/fabbrimatteo/LoCO.
Tasks 3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation
Published 2020-04-01
URL https://arxiv.org/abs/2004.00329v1
PDF https://arxiv.org/pdf/2004.00329v1.pdf
PWC https://paperswithcode.com/paper/compressed-volumetric-heatmaps-for-multi
Repo https://github.com/fabbrimatteo/LoCO
Framework none

#### Symmetry and Group in Attribute-Object Compositions

Title Symmetry and Group in Attribute-Object Compositions
Authors Yong-Lu Li, Yue Xu, Xiaohan Mao, Cewu Lu
Abstract Attributes and objects can form diverse compositions. To model the compositional nature of these general concepts, it is a good choice to learn them through transformations, such as coupling and decoupling. However, complex transformations need to satisfy specific principles to guarantee rationality. In this paper, we first propose a previously ignored principle of attribute-object transformation: Symmetry. For example, coupling peeled-apple with attribute peeled should result in peeled-apple, and decoupling peeled from apple should still output apple. Incorporating the symmetry principle, a transformation framework inspired by group theory is built, i.e. SymNet. SymNet consists of two modules, Coupling Network and Decoupling Network. With the group axioms and symmetry property as objectives, we adopt Deep Neural Networks to implement SymNet and train it in an end-to-end paradigm. Moreover, we propose a Relative Moving Distance (RMD) based recognition method to utilize the attribute change instead of the attribute pattern itself to classify attributes. Our symmetry learning can be utilized for the Compositional Zero-Shot Learning task and outperforms the state-of-the-art on widely-used benchmarks. Code is available at https://github.com/DirtyHarryLYL/SymNet.
Published 2020-04-01
URL https://arxiv.org/abs/2004.00587v1
PDF https://arxiv.org/pdf/2004.00587v1.pdf
PWC https://paperswithcode.com/paper/symmetry-and-group-in-attribute-object
Repo https://github.com/DirtyHarryLYL/SymNet
Framework none

#### Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis

Title Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis
Authors Hao Tang, Xiaojuan Qi, Dan Xu, Philip H. S. Torr, Nicu Sebe
Abstract We propose a novel Edge guided Generative Adversarial Network (EdgeGAN) for photo-realistic image synthesis from semantic layouts. Although considerable improvement has been achieved, the quality of synthesized images is far from satisfactory due to two largely unresolved challenges. First, the semantic labels do not provide detailed structural information, making it difficult to synthesize local details and structures. Second, the widely adopted CNN operations such as convolution, down-sampling and normalization usually cause spatial resolution loss and thus are unable to fully preserve the original semantic information, leading to semantically inconsistent results (e.g., missing small objects). To tackle the first challenge, we propose to use the edge as an intermediate representation which is further adopted to guide image generation via a proposed attention guided edge transfer module. Edge information is produced by a convolutional generator and introduces detailed structure information. Further, to preserve the semantic information, we design an effective module to selectively highlight class-dependent feature maps according to the original semantic layout. Extensive experiments on two challenging datasets show that the proposed EdgeGAN can generate significantly better results than state-of-the-art methods. The source code and trained models are available at https://github.com/Ha0Tang/EdgeGAN.
Published 2020-03-31
URL https://arxiv.org/abs/2003.13898v1
PDF https://arxiv.org/pdf/2003.13898v1.pdf
PWC https://paperswithcode.com/paper/edge-guided-gans-with-semantic-preserving-for
Repo https://github.com/Ha0Tang/EdgeGAN
Framework none

#### AandP: Utilizing Prolog for converting between active sentence and passive sentence with three-steps conversion

Title AandP: Utilizing Prolog for converting between active sentence and passive sentence with three-steps conversion
Authors Trung Q. Tran
Abstract I introduce a simple but efficient method to solve one of the critical aspects of English grammar which is the relationship between active sentence and passive sentence. In fact, an active sentence and its corresponding passive sentence express the same meaning, but their structure is different. I utilized Prolog [4] along with Definite Clause Grammars (DCG) [5] for doing the conversion between active sentence and passive sentence. Some advanced techniques were also used such as Extra Arguments, Extra Goals, Lexicon, etc. I tried to solve a variety of cases of active and passive sentences such as 12 English tenses, modal verbs, negative form, etc. More details and my contributions will be presented in the following sections. The source code is available at https://github.com/tqtrunghnvn/ActiveAndPassive.
Published 2020-01-16
URL https://arxiv.org/abs/2001.05672v1
PDF https://arxiv.org/pdf/2001.05672v1.pdf
PWC https://paperswithcode.com/paper/aandp-utilizing-prolog-for-converting-between
Repo https://github.com/tqtrunghnvn/ActiveAndPassive
Framework none

#### Deep Snake for Real-Time Instance Segmentation

Title Deep Snake for Real-Time Instance Segmentation
Authors Sida Peng, Wen Jiang, Huaijin Pi, Xiuli Li, Hujun Bao, Xiaowei Zhou
Abstract This paper introduces a novel contour-based approach named deep snake for real-time instance segmentation. Unlike some recent methods that directly regress the coordinates of the object boundary points from an image, deep snake uses a neural network to iteratively deform an initial contour to match the object boundary, which implements the classic idea of snake algorithms with a learning-based approach. For structured feature learning on the contour, we propose to use circular convolution in deep snake, which better exploits the cycle-graph structure of a contour compared against generic graph convolution. Based on deep snake, we develop a two-stage pipeline for instance segmentation: initial contour proposal and contour deformation, which can handle errors in object localization. Experiments show that the proposed approach achieves competitive performances on the Cityscapes, KINS, SBD and COCO datasets while being efficient for real-time applications with a speed of 32.3 fps for 512$\times$512 images on a 1080Ti GPU. The code is available at https://github.com/zju3dv/snake/.
Tasks Instance Segmentation, Object Localization, Real-time Instance Segmentation, Semantic Segmentation
Published 2020-01-06
URL https://arxiv.org/abs/2001.01629v3
PDF https://arxiv.org/pdf/2001.01629v3.pdf
PWC https://paperswithcode.com/paper/deep-snake-for-real-time-instance
Repo https://github.com/ShanghaiTechCVDL/Weekly_Group_Meeting_Paper_List
Framework none
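
The circular convolution mentioned in the Deep Snake abstract treats the contour as a closed loop, so the filter window wraps around from the last vertex to the first. Below is a minimal NumPy sketch of that idea, not the authors' implementation; the function name `circular_conv` and the array shapes are assumptions made for illustration, and the operation is written as a cross-correlation, as is usual in deep learning.

```python
import numpy as np

def circular_conv(features, kernel):
    """1D convolution over contour vertices with wrap-around padding.

    features: (N, C_in) array, one feature vector per contour vertex,
              stored in cyclic order along the contour.
    kernel:   (K, C_in, C_out) array with odd K.
    Returns an (N, C_out) array.
    """
    n = features.shape[0]
    k = kernel.shape[0]
    half = k // 2
    # 'wrap' padding makes the first and last vertices neighbors.
    padded = np.pad(features, ((half, half), (0, 0)), mode="wrap")
    out = np.zeros((n, kernel.shape[2]))
    for i in range(n):
        window = padded[i:i + k]  # the K vertices centered on vertex i
        out[i] = np.einsum("kc,kco->o", window, kernel)
    return out

# Four vertices with one channel each, and a kernel that sums 3 neighbors.
feats = np.array([[1.0], [2.0], [3.0], [4.0]])
sum_kernel = np.ones((3, 1, 1))
result = circular_conv(feats, sum_kernel)
```

Because the padding wraps, rotating the starting vertex of the contour simply rotates the output by the same amount, which is the property that makes this operation a natural fit for a cycle-graph structure.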

#### Fixing the train-test resolution discrepancy: FixEfficientNet

Title Fixing the train-test resolution discrepancy: FixEfficientNet
Authors Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou
Abstract This note complements the paper “Fixing the train-test resolution discrepancy” that introduced the FixRes method. First, we show that this strategy is advantageously combined with recent training recipes from the literature. Most importantly, we provide new results for the EfficientNet architecture. The resulting network, called FixEfficientNet, significantly outperforms the initial architecture with the same number of parameters. For instance, our FixEfficientNet-B0 trained without additional training data achieves 79.3% top-1 accuracy on ImageNet with 5.3M parameters. This is a +0.5% absolute improvement over the Noisy student EfficientNet-B0 trained with 300M unlabeled images and +1.7% compared to the EfficientNet-B0 trained with adversarial examples. An EfficientNet-L2 pre-trained with weak supervision on 300M unlabeled images and further optimized with FixRes achieves 88.5% top-1 accuracy (top-5: 98.7%), which establishes the new state of the art for ImageNet with a single crop.
Published 2020-03-18
URL https://arxiv.org/abs/2003.08237v3
PDF https://arxiv.org/pdf/2003.08237v3.pdf
PWC https://paperswithcode.com/paper/fixing-the-train-test-resolution-discrepancy-2
Framework pytorch
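
The two baseline accuracies that the FixEfficientNet abstract compares against are not stated directly, but they follow from the quoted gains. The short calculation below makes them explicit; the variable names are illustrative, and the derived numbers are implied by the abstract rather than quoted from it.

```python
# Figures quoted in the abstract (top-1 accuracy on ImageNet, in percent).
fix_b0_top1 = 79.3
gain_over_noisy_student = 0.5   # absolute improvement over Noisy Student EfficientNet-B0
gain_over_adversarial = 1.7     # absolute improvement over adversarially trained EfficientNet-B0

# Baseline accuracies implied by those gains.
noisy_student_b0_top1 = round(fix_b0_top1 - gain_over_noisy_student, 1)
adversarial_b0_top1 = round(fix_b0_top1 - gain_over_adversarial, 1)

print(noisy_student_b0_top1, adversarial_b0_top1)
```

Under these figures the implied baselines are 78.8% for the Noisy Student EfficientNet-B0 and 77.6% for the adversarially trained EfficientNet-B0.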

#### Extreme Algorithm Selection With Dyadic Feature Representation

Title Extreme Algorithm Selection With Dyadic Feature Representation
Authors Alexander Tornede, Marcel Wever, Eyke Hüllermeier
Abstract Algorithm selection (AS) deals with selecting an algorithm from a fixed set of candidate algorithms most suitable for a specific instance of an algorithmic problem, e.g., choosing solvers for SAT problems. Benchmark suites for AS usually comprise candidate sets consisting of at most tens of algorithms, whereas in combined algorithm selection and hyperparameter optimization problems the number of candidates becomes intractable, impeding the learning of effective meta-models and thus requiring costly online performance evaluations. Therefore, here we propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms, facilitating meta learning. We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation in which both problem instances and algorithms are described. We find the latter to improve significantly over the current state of the art in various metrics.
Published 2020-01-29
URL https://arxiv.org/abs/2001.10741v1
PDF https://arxiv.org/pdf/2001.10741v1.pdf
Repo https://github.com/alexandertornede/extreme_algorithm_selection
Framework none

#### Elastic Coupled Co-clustering for Single-Cell Genomic Data

Title Elastic Coupled Co-clustering for Single-Cell Genomic Data
Authors Pengcheng Zeng, Zhixiang Lin
Abstract The recent advances in single-cell technologies have enabled us to profile genomic features at unprecedented resolution, and data sets from multiple domains are available, including data sets that profile different types of genomic features and data sets that profile the same type of genomic features across different species. These data sets typically have different powers in identifying the unknown cell types through clustering, and data integration can potentially lead to a better performance of clustering algorithms. In this work, we formulate the problem in an unsupervised transfer learning framework, which utilizes knowledge learned from an auxiliary data set to improve the clustering performance of the target data set. The degree of shared information among the target and auxiliary data sets can vary, and their distributions can also be different. To address these challenges, we propose an elastic coupled co-clustering based transfer learning algorithm, by elastically propagating clustering knowledge obtained from the auxiliary data set to the target data set. Implementation on single-cell genomic data sets shows that our algorithm greatly improves clustering performance over the traditional learning algorithms. The source code and data sets are available at https://github.com/cuhklinlab/elasticC3.