April 3, 2020

Paper Group AWR 15

Graph Structured Network for Image-Text Matching


Title	Graph Structured Network for Image-Text Matching
Authors	Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang
Abstract	Image-text matching has received growing interest since it bridges vision and language. The key challenge lies in how to learn correspondence between image and text. Existing works learn coarse correspondence based on object co-occurrence statistics, while failing to learn fine-grained phrase correspondence. In this paper, we present a novel Graph Structured Matching Network (GSMN) to learn fine-grained correspondence. The GSMN explicitly models object, relation and attribute as a structured phrase, which not only allows to learn correspondence of object, relation and attribute separately, but also benefits to learn fine-grained correspondence of structured phrase. This is achieved by node-level matching and structure-level matching. The node-level matching associates each node with its relevant nodes from another modality, where the node can be object, relation or attribute. The associated nodes then jointly infer fine-grained correspondence by fusing neighborhood associations at structure-level matching. Comprehensive experiments show that GSMN outperforms state-of-the-art methods on benchmarks, with relative Recall@1 improvements of nearly 7% and 2% on Flickr30K and MSCOCO, respectively. Code will be released at: https://github.com/CrossmodalGroup/GSMN.
Tasks	Text Matching
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00277v1
PDF	https://arxiv.org/pdf/2004.00277v1.pdf
PWC	https://paperswithcode.com/paper/graph-structured-network-for-image-text
Repo	https://github.com/CrossmodalGroup/GSMN
Framework	none
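
The abstract reports results as relative Recall@1 improvements. As a minimal sketch of how that metric is usually computed, the Python below assumes a square similarity matrix in which image i's ground-truth caption is text i. That one-caption-per-image layout and the names `recall_at_k` and `relative_improvement` are illustrative assumptions, not taken from the GSMN code.

```python
def recall_at_k(sim, k=1):
    """Fraction of queries whose ground-truth item is ranked in the top k.

    sim[i][j] is the similarity between image i and text j.
    Assumption for this sketch: the correct text for image i has index i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Indices of the k most similar texts for image i.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(sim)


def relative_improvement(new_score, old_score):
    """Relative gain of new_score over old_score, e.g. 0.07 means 7%."""
    return (new_score - old_score) / old_score
```

A relative improvement divides the gain by the baseline score, so it is larger than the absolute difference in Recall@1 whenever the baseline is below 100%.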

High-Performance Long-Term Tracking with Meta-Updater


Title	High-Performance Long-Term Tracking with Meta-Updater
Authors	Kenan Dai, Yunhua Zhang, Dong Wang, Jianhua Li, Huchuan Lu, Xiaoyun Yang
Abstract	Long-term visual tracking has drawn increasing attention because it is much closer to practical applications than short-term tracking. Most top-ranked long-term trackers adopt the offline-trained Siamese architectures, thus, they cannot benefit from great progress of short-term trackers with online update. However, it is quite risky to straightforwardly introduce online-update-based trackers to solve the long-term problem, due to long-term uncertain and noisy observations. In this work, we propose a novel offline-trained Meta-Updater to address an important but unsolved problem: Is the tracker ready for updating in the current frame? The proposed meta-updater can effectively integrate geometric, discriminative, and appearance cues in a sequential manner, and then mine the sequential information with a designed cascaded LSTM module. Our meta-updater learns a binary output to guide the tracker’s update and can be easily embedded into different trackers. This work also introduces a long-term tracking framework consisting of an online local tracker, an online verifier, a SiamRPN-based re-detector, and our meta-updater. Numerous experimental results on the VOT2018LT, VOT2019LT, OxUvALT, TLP, and LaSOT benchmarks show that our tracker performs remarkably better than other competing algorithms. Our project is available on the website: https://github.com/Daikenan/LTMU.
Tasks	Visual Tracking
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00305v1
PDF	https://arxiv.org/pdf/2004.00305v1.pdf
PWC	https://paperswithcode.com/paper/high-performance-long-term-tracking-with-meta
Repo	https://github.com/Daikenan/LTMU
Framework	none

Pose-guided Visible Part Matching for Occluded Person ReID


Title	Pose-guided Visible Part Matching for Occluded Person ReID
Authors	Shang Gao, Jingya Wang, Huchuan Lu, Zimo Liu
Abstract	Occluded person re-identification is a challenging task as the appearance varies substantially with various obstacles, especially in the crowd scenario. To address this issue, we propose a Pose-guided Visible Part Matching (PVPM) method that jointly learns the discriminative features with pose-guided attention and self-mines the part visibility in an end-to-end framework. Specifically, the proposed PVPM includes two key components: 1) a pose-guided attention (PGA) method for part feature pooling that exploits more discriminative local features; 2) a pose-guided visibility predictor (PVP) that estimates whether a part is occluded or not. As there are no ground-truth training annotations for the occluded parts, we exploit the part correspondence in positive pairs and self-mine the correspondence scores via graph matching. The generated correspondence scores are then used as pseudo-labels for the visibility predictor (PVP). Experimental results on three reported occluded benchmarks show that the proposed method achieves performance competitive with state-of-the-art methods. The source code is available at https://github.com/hh23333/PVPM.
Tasks	Graph Matching, Person Re-Identification
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00230v1
PDF	https://arxiv.org/pdf/2004.00230v1.pdf
PWC	https://paperswithcode.com/paper/pose-guided-visible-part-matching-for
Repo	https://github.com/hh23333/PVPM
Framework	none

Ontology-based Interpretable Machine Learning for Textual Data


Title	Ontology-based Interpretable Machine Learning for Textual Data
Authors	Phung Lai, NhatHai Phan, Han Hu, Anuja Badeti, David Newman, Dejing Dou
Abstract	In this paper, we introduce a novel interpreting framework that learns an interpretable model based on an ontology-based sampling technique to explain agnostic prediction models. Different from existing approaches, our algorithm considers contextual correlation among words, described in domain knowledge ontologies, to generate semantic explanations. To narrow down the search space for explanations, which is a major problem of long and complicated text data, we design a learnable anchor algorithm, to better extract explanations locally. A set of regulations is further introduced, regarding combining learned interpretable representations with anchors to generate comprehensible semantic explanations. An extensive experiment conducted on two real-world datasets shows that our approach generates more precise and insightful explanations compared with baseline approaches.
Tasks	Interpretable Machine Learning
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00204v1
PDF	https://arxiv.org/pdf/2004.00204v1.pdf
PWC	https://paperswithcode.com/paper/ontology-based-interpretable-machine-learning
Repo	https://github.com/PhungLai728/OnML
Framework	none

MetaPoison: Practical General-purpose Clean-label Data Poisoning


Title	MetaPoison: Practical General-purpose Clean-label Data Poisoning
Authors	W. Ronny Huang, Jonas Geiping, Liam Fowl, Gavin Taylor, Tom Goldstein
Abstract	Data poisoning, the process by which an attacker takes control of a model by making imperceptible changes to a subset of the training data, is an emerging threat in the context of neural networks. Existing attacks for data poisoning have relied on hand-crafted heuristics. Instead, we pose crafting poisons more generally as a bi-level optimization problem, where the inner level corresponds to training a network on a poisoned dataset and the outer level corresponds to updating those poisons to achieve a desired behavior on the trained model. We then propose MetaPoison, a first-order method to solve this optimization quickly. MetaPoison is effective: it outperforms previous clean-label poisoning methods by a large margin under the same setting. MetaPoison is robust: its poisons transfer to a variety of victims with unknown hyperparameters and architectures. MetaPoison is also general-purpose, working not only in fine-tuning scenarios, but also for end-to-end training from scratch with remarkable success, e.g. causing a target image to be misclassified 90% of the time via manipulating just 1% of the dataset. Additionally, MetaPoison can achieve arbitrary adversary goals not previously possible, such as using poisons of one class to make a target image don the label of another arbitrarily chosen class. Finally, MetaPoison works in the real world. We demonstrate successful data poisoning of models trained on Google Cloud AutoML Vision. Code and premade poisons are provided at https://github.com/wronnyhuang/metapoison
Tasks	AutoML, Data Poisoning
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00225v1
PDF	https://arxiv.org/pdf/2004.00225v1.pdf
PWC	https://paperswithcode.com/paper/metapoison-practical-general-purpose-clean
Repo	https://github.com/wronnyhuang/metapoison
Framework	none

Semantic Drift Compensation for Class-Incremental Learning


Title	Semantic Drift Compensation for Class-Incremental Learning
Authors	Lu Yu, Bartłomiej Twardowski, Xialei Liu, Luis Herranz, Kai Wang, Yongmei Cheng, Shangling Jui, Joost van de Weijer
Abstract	Class-incremental learning of deep networks sequentially increases the number of classes to be classified. During training, the network has only access to data of one task at a time, where each task contains several classes. In this setting, networks suffer from catastrophic forgetting which refers to the drastic drop in performance on previous tasks. The vast majority of methods have studied this scenario for classification networks, where for each new task the classification layer of the network must be augmented with additional weights to make room for the newly added classes. Embedding networks have the advantage that new classes can be naturally included into the network without adding new weights. Therefore, we study incremental learning for embedding networks. In addition, we propose a new method to estimate the drift, called semantic drift, of features and compensate for it without the need of any exemplars. We approximate the drift of previous tasks based on the drift that is experienced by current task data. We perform experiments on fine-grained datasets, CIFAR100 and ImageNet-Subset. We demonstrate that embedding networks suffer significantly less from catastrophic forgetting. We outperform existing methods which do not require exemplars and obtain competitive results compared to methods which store exemplars. Furthermore, we show that our proposed SDC when combined with existing methods to prevent forgetting consistently improves results.
Tasks
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00440v1
PDF	https://arxiv.org/pdf/2004.00440v1.pdf
PWC	https://paperswithcode.com/paper/semantic-drift-compensation-for-class
Repo	https://github.com/yulu0724/SDC-IL
Framework	none
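
The abstract says the drift of previous-task features is approximated from the drift observed on current-task data. One way to picture this is the sketch below, which shifts each stored class prototype by a weighted average of the displacement vectors of nearby current-task features. The Gaussian distance weighting, the `sigma` parameter, and the function name are assumptions made for illustration and should not be read as the authors' implementation.

```python
import math


def compensate_prototypes(prototypes, feats_before, feats_after, sigma=1.0):
    """Shift old-class prototypes by an estimate of the embedding drift.

    prototypes:   class-mean embeddings of previous tasks (list of vectors)
    feats_before: current-task embeddings before training on the new task
    feats_after:  embeddings of the same samples after training
    sigma:        width of the Gaussian weighting (hypothetical parameter)
    """
    compensated = []
    for p in prototypes:
        # Current-task samples that were close to the prototype get more weight.
        weights = [
            math.exp(-sum((pi - fi) ** 2 for pi, fi in zip(p, f)) / (2 * sigma ** 2))
            for f in feats_before
        ]
        total = sum(weights) or 1.0  # no usable neighbours: leave prototype as is
        drift = [
            sum(w * (fa[d] - fb[d])
                for w, fb, fa in zip(weights, feats_before, feats_after)) / total
            for d in range(len(p))
        ]
        compensated.append([pi + di for pi, di in zip(p, drift)])
    return compensated
```

No exemplars of old classes are needed here: only the prototypes and the current-task features before and after training enter the estimate.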

CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection


Title	CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection
Authors	Zhiwei Dong, Guoxuan Li, Yue Liao, Fei Wang, Pengju Ren, Chen Qian
Abstract	Keypoint-based detectors have achieved good performance. However, incorrect keypoint matching is still widespread and greatly affects the performance of the detector. In this paper, we propose CentripetalNet, which uses centripetal shift to pair corner keypoints from the same instance. CentripetalNet predicts the position and the centripetal shift of the corner points and matches corners whose shifted results are aligned. Combining position information, our approach matches corner points more accurately than the conventional embedding approaches do. Corner pooling extracts information inside the bounding boxes onto the border. To make the corners more aware of this information, we design a cross-star deformable convolution network to conduct feature adaption. Furthermore, we explore instance segmentation on anchor-free detectors by equipping our CentripetalNet with a mask prediction module. On MS-COCO test-dev, our CentripetalNet not only outperforms all existing anchor-free detectors with an AP of 48.0% but also achieves comparable performance to the state-of-the-art instance segmentation approaches with a 40.2% MaskAP. Code will be available at https://github.com/KiveeDong/CentripetalNet.
Tasks	Instance Segmentation, Object Detection, Semantic Segmentation
Published	2020-03-20
URL	https://arxiv.org/abs/2003.09119v1
PDF	https://arxiv.org/pdf/2003.09119v1.pdf
PWC	https://paperswithcode.com/paper/centripetalnet-pursuing-high-quality-keypoint
Repo	https://github.com/KiveeDong/CentripetalNet
Framework	pytorch
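
The matching rule in the abstract (pair corners whose shifted positions are aligned) can be sketched roughly as follows. Each corner carries a predicted shift toward the object center, and a top-left/bottom-right pair is accepted when the two shifted points land close together. The Euclidean tolerance `tol` and the function name are simplifying assumptions for this sketch; the paper's actual alignment criterion may differ.

```python
import math


def match_corners(top_left, tl_shift, bottom_right, br_shift, tol=2.0):
    """Pair top-left and bottom-right corners by their centripetal shifts.

    top_left, bottom_right: lists of (x, y) corner positions
    tl_shift, br_shift:     predicted (dx, dy) shifts toward the box center
    tol:                    maximum distance between the two shifted points
                            (hypothetical threshold for this sketch)
    Returns a list of (top_left_index, bottom_right_index) pairs.
    """
    pairs = []
    for i, ((x1, y1), (dx1, dy1)) in enumerate(zip(top_left, tl_shift)):
        for j, ((x2, y2), (dx2, dy2)) in enumerate(zip(bottom_right, br_shift)):
            # A valid box needs the top-left corner above and left of the other.
            if x1 >= x2 or y1 >= y2:
                continue
            gap = math.hypot((x1 + dx1) - (x2 + dx2), (y1 + dy1) - (y2 + dy2))
            if gap <= tol:
                pairs.append((i, j))
    return pairs
```

Because the test is purely geometric, two corners from different instances are rejected even when their appearance is similar, which is the failure case the abstract attributes to embedding-based matching.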

Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation


Title	Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
Authors	Matteo Fabbri, Fabio Lanzi, Simone Calderara, Stefano Alletto, Rita Cucchiara
Abstract	In this paper we present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images. We propose to use high resolution volumetric heatmaps to model joint locations, devising a simple and effective compression method to drastically reduce the size of this representation. At the core of the proposed method lies our Volumetric Heatmap Autoencoder, a fully-convolutional network tasked with the compression of ground-truth heatmaps into a dense intermediate representation. A second model, the Code Predictor, is then trained to predict these codes, which can be decompressed at test time to re-obtain the original representation. Our experimental evaluation shows that our method performs favorably when compared to state of the art on both multi-person and single-person 3D human pose estimation datasets and, thanks to our novel compression strategy, can process full-HD images at the constant runtime of 8 fps regardless of the number of subjects in the scene. Code and models available at https://github.com/fabbrimatteo/LoCO.
Tasks	3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00329v1
PDF	https://arxiv.org/pdf/2004.00329v1.pdf
PWC	https://paperswithcode.com/paper/compressed-volumetric-heatmaps-for-multi
Repo	https://github.com/fabbrimatteo/LoCO
Framework	none

Symmetry and Group in Attribute-Object Compositions


Title	Symmetry and Group in Attribute-Object Compositions
Authors	Yong-Lu Li, Yue Xu, Xiaohan Mao, Cewu Lu
Abstract	Attributes and objects can compose diverse compositions. To model the compositional nature of these general concepts, it is a good choice to learn them through transformations, such as coupling and decoupling. However, complex transformations need to satisfy specific principles to guarantee the rationality. In this paper, we first propose a previously ignored principle of attribute-object transformation: Symmetry. For example, coupling peeled-apple with attribute peeled should result in peeled-apple, and decoupling peeled from apple should still output apple. Incorporating the symmetry principle, a transformation framework inspired by group theory is built, i.e. SymNet. SymNet consists of two modules, Coupling Network and Decoupling Network. With the group axioms and symmetry property as objectives, we adopt Deep Neural Networks to implement SymNet and train it in an end-to-end paradigm. Moreover, we propose a Relative Moving Distance (RMD) based recognition method to utilize the attribute change instead of the attribute pattern itself to classify attributes. Our symmetry learning can be utilized for the Compositional Zero-Shot Learning task and outperforms the state-of-the-art on widely-used benchmarks. Code is available at https://github.com/DirtyHarryLYL/SymNet.
Tasks	Zero-Shot Learning
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00587v1
PDF	https://arxiv.org/pdf/2004.00587v1.pdf
PWC	https://paperswithcode.com/paper/symmetry-and-group-in-attribute-object
Repo	https://github.com/DirtyHarryLYL/SymNet
Framework	none

Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis


Title	Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis
Authors	Hao Tang, Xiaojuan Qi, Dan Xu, Philip H. S. Torr, Nicu Sebe
Abstract	We propose a novel Edge guided Generative Adversarial Network (EdgeGAN) for photo-realistic image synthesis from semantic layouts. Although considerable improvement has been achieved, the quality of synthesized images is far from satisfactory due to two largely unresolved challenges. First, the semantic labels do not provide detailed structural information, making it difficult to synthesize local details and structures. Second, the widely adopted CNN operations such as convolution, down-sampling and normalization usually cause spatial resolution loss and thus are unable to fully preserve the original semantic information, leading to semantically inconsistent results (e.g., missing small objects). To tackle the first challenge, we propose to use the edge as an intermediate representation which is further adopted to guide image generation via a proposed attention guided edge transfer module. Edge information is produced by a convolutional generator and introduces detailed structure information. Further, to preserve the semantic information, we design an effective module to selectively highlight class-dependent feature maps according to the original semantic layout. Extensive experiments on two challenging datasets show that the proposed EdgeGAN can generate significantly better results than state-of-the-art methods. The source code and trained models are available at https://github.com/Ha0Tang/EdgeGAN.
Tasks	Image Generation
Published	2020-03-31
URL	https://arxiv.org/abs/2003.13898v1
PDF	https://arxiv.org/pdf/2003.13898v1.pdf
PWC	https://paperswithcode.com/paper/edge-guided-gans-with-semantic-preserving-for
Repo	https://github.com/Ha0Tang/EdgeGAN
Framework	none

AandP: Utilizing Prolog for converting between active sentence and passive sentence with three-steps conversion


Title	AandP: Utilizing Prolog for converting between active sentence and passive sentence with three-steps conversion
Authors	Trung Q. Tran
Abstract	I introduce a simple but efficient method to solve one of the critical aspects of English grammar which is the relationship between active sentence and passive sentence. In fact, an active sentence and its corresponding passive sentence express the same meaning, but their structure is different. I utilized Prolog [4] along with Definite Clause Grammars (DCG) [5] for doing the conversion between active sentence and passive sentence. Some advanced techniques were also used such as Extra Arguments, Extra Goals, Lexicon, etc. I tried to solve a variety of cases of active and passive sentences such as 12 English tenses, modal verbs, negative form, etc. More details and my contributions will be presented in the following sections. The source code is available at https://github.com/tqtrunghnvn/ActiveAndPassive.
Tasks
Published	2020-01-16
URL	https://arxiv.org/abs/2001.05672v1
PDF	https://arxiv.org/pdf/2001.05672v1.pdf
PWC	https://paperswithcode.com/paper/aandp-utilizing-prolog-for-converting-between
Repo	https://github.com/tqtrunghnvn/ActiveAndPassive
Framework	none

Deep Snake for Real-Time Instance Segmentation


Title	Deep Snake for Real-Time Instance Segmentation
Authors	Sida Peng, Wen Jiang, Huaijin Pi, Xiuli Li, Hujun Bao, Xiaowei Zhou
Abstract	This paper introduces a novel contour-based approach named deep snake for real-time instance segmentation. Unlike some recent methods that directly regress the coordinates of the object boundary points from an image, deep snake uses a neural network to iteratively deform an initial contour to match the object boundary, which implements the classic idea of snake algorithms with a learning-based approach. For structured feature learning on the contour, we propose to use circular convolution in deep snake, which better exploits the cycle-graph structure of a contour compared against generic graph convolution. Based on deep snake, we develop a two-stage pipeline for instance segmentation: initial contour proposal and contour deformation, which can handle errors in object localization. Experiments show that the proposed approach achieves competitive performances on the Cityscapes, KINS, SBD and COCO datasets while being efficient for real-time applications with a speed of 32.3 fps for 512×512 images on a 1080Ti GPU. The code is available at https://github.com/zju3dv/snake/.
Tasks	Instance Segmentation, Object Localization, Real-time Instance Segmentation, Semantic Segmentation
Published	2020-01-06
URL	https://arxiv.org/abs/2001.01629v3
PDF	https://arxiv.org/pdf/2001.01629v3.pdf
PWC	https://paperswithcode.com/paper/deep-snake-for-real-time-instance
Repo	https://github.com/ShanghaiTechCVDL/Weekly_Group_Meeting_Paper_List
Framework	none
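
Circular convolution treats the contour as a closed loop, so the kernel wraps around from the last vertex to the first. The sketch below shows the idea on a single scalar feature per vertex with an odd-length kernel. Both simplifications, and the function name, are assumptions for illustration; the network described in the abstract operates on learned multi-channel features.

```python
def circular_conv(values, kernel):
    """1D convolution over a closed contour (indices wrap around).

    values: one scalar feature per contour vertex, in contour order
    kernel: odd-length list of weights centred on the current vertex
    """
    n = len(values)
    r = len(kernel) // 2
    return [
        # (i + t) % n wraps past the ends, so vertex 0 sees vertex n - 1.
        sum(kernel[t + r] * values[(i + t) % n] for t in range(-r, r + 1))
        for i in range(n)
    ]
```

Rotating the starting vertex of the contour rotates the output by the same amount, so the result does not depend on where the contour is cut open.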

Fixing the train-test resolution discrepancy: FixEfficientNet


Title	Fixing the train-test resolution discrepancy: FixEfficientNet
Authors	Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou
Abstract	This note complements the paper “Fixing the train-test resolution discrepancy” that introduced the FixRes method. First, we show that this strategy is advantageously combined with recent training recipes from the literature. Most importantly, we provide new results for the EfficientNet architecture. The resulting network, called FixEfficientNet, significantly outperforms the initial architecture with the same number of parameters. For instance, our FixEfficientNet-B0 trained without additional training data achieves 79.3% top-1 accuracy on ImageNet with 5.3M parameters. This is a +0.5% absolute improvement over the Noisy student EfficientNet-B0 trained with 300M unlabeled images and +1.7% compared to the EfficientNet-B0 trained with adversarial examples. An EfficientNet-L2 pre-trained with weak supervision on 300M unlabeled images and further optimized with FixRes achieves 88.5% top-1 accuracy (top-5: 98.7%), which establishes the new state of the art for ImageNet with a single crop.
Tasks	Data Augmentation, Image Classification
Published	2020-03-18
URL	https://arxiv.org/abs/2003.08237v3
PDF	https://arxiv.org/pdf/2003.08237v3.pdf
PWC	https://paperswithcode.com/paper/fixing-the-train-test-resolution-discrepancy-2
Repo	https://github.com/facebookresearch/FixRes
Framework	pytorch

Extreme Algorithm Selection With Dyadic Feature Representation


Title	Extreme Algorithm Selection With Dyadic Feature Representation
Authors	Alexander Tornede, Marcel Wever, Eyke Hüllermeier
Abstract	Algorithm selection (AS) deals with selecting an algorithm from a fixed set of candidate algorithms most suitable for a specific instance of an algorithmic problem, e.g., choosing solvers for SAT problems. Benchmark suites for AS usually comprise candidate sets consisting of at most tens of algorithms, whereas in combined algorithm selection and hyperparameter optimization problems the number of candidates becomes intractable, impeding to learn effective meta-models and thus requiring costly online performance evaluations. Therefore, here we propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms, facilitating meta learning. We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation in which both problem instances and algorithms are described. We find the latter to improve significantly over the current state of the art in various metrics.
Tasks	Hyperparameter Optimization, Meta-Learning
Published	2020-01-29
URL	https://arxiv.org/abs/2001.10741v1
PDF	https://arxiv.org/pdf/2001.10741v1.pdf
PWC	https://paperswithcode.com/paper/extreme-algorithm-selection-with-dyadic
Repo	https://github.com/alexandertornede/extreme_algorithm_selection
Framework	none

Elastic Coupled Co-clustering for Single-Cell Genomic Data


Title	Elastic Coupled Co-clustering for Single-Cell Genomic Data
Authors	Pengcheng Zeng, Zhixiang Lin
Abstract	The recent advances in single-cell technologies have enabled us to profile genomic features at unprecedented resolution and data sets from multiple domains are available, including data sets that profile different types of genomic features and data sets that profile the same type of genomic features across different species. These data sets typically have different powers in identifying the unknown cell types through clustering, and data integration can potentially lead to a better performance of clustering algorithms. In this work, we formulate the problem in an unsupervised transfer learning framework, which utilizes knowledge learned from auxiliary data set to improve the clustering performance of target data set. The degree of shared information among the target and auxiliary data sets can vary, and their distributions can also be different. To address these challenges, we propose an elastic coupled co-clustering based transfer learning algorithm, by elastically propagating clustering knowledge obtained from the auxiliary data set to the target data set. Implementation on single-cell genomic data sets shows that our algorithm greatly improves clustering performance over the traditional learning algorithms. The source code and data sets are available at https://github.com/cuhklinlab/elasticC3.
Tasks	Transfer Learning
Published	2020-03-29
URL	https://arxiv.org/abs/2003.12970v1
PDF	https://arxiv.org/pdf/2003.12970v1.pdf
PWC	https://paperswithcode.com/paper/elastic-coupled-co-clustering-for-single-cell
Repo	https://github.com/cuhklinlab/elasticC3
Framework	none