Paper Group AWR 15
Graph Structured Network for Image-Text Matching. High-Performance Long-Term Tracking with Meta-Updater. Pose-guided Visible Part Matching for Occluded Person ReID. Ontology-based Interpretable Machine Learning for Textual Data. MetaPoison: Practical General-purpose Clean-label Data Poisoning. Semantic Drift Compensation for Class-Incremental Learn …
Graph Structured Network for Image-Text Matching
Title | Graph Structured Network for Image-Text Matching |
Authors | Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang |
Abstract | Image-text matching has received growing interest since it bridges vision and language. The key challenge lies in how to learn correspondence between image and text. Existing works learn coarse correspondence based on object co-occurrence statistics, while failing to learn fine-grained phrase correspondence. In this paper, we present a novel Graph Structured Matching Network (GSMN) to learn fine-grained correspondence. The GSMN explicitly models object, relation and attribute as a structured phrase, which not only allows the correspondence of object, relation and attribute to be learned separately, but also benefits learning fine-grained correspondence for the structured phrase. This is achieved by node-level matching and structure-level matching. The node-level matching associates each node with its relevant nodes from the other modality, where a node can be an object, relation or attribute. The associated nodes then jointly infer fine-grained correspondence by fusing neighborhood associations at structure-level matching. Comprehensive experiments show that GSMN outperforms state-of-the-art methods on benchmarks, with relative Recall@1 improvements of nearly 7% and 2% on Flickr30K and MSCOCO, respectively. Code will be released at: https://github.com/CrossmodalGroup/GSMN. |
Tasks | Text Matching |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00277v1 |
https://arxiv.org/pdf/2004.00277v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-structured-network-for-image-text |
Repo | https://github.com/CrossmodalGroup/GSMN |
Framework | none |
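
A minimal sketch (not the authors' implementation) of the node-level and structure-level matching idea described in the abstract above: textual graph nodes attend over visual region features, then neighborhood associations are fused over the textual graph. Feature dimensions, the attention temperature, and the mean-aggregation fusion are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def node_level_matching(text_nodes, image_nodes):
    """Associate each textual node (object/relation/attribute) with visual nodes.

    text_nodes:  (n_t, d) features of the parsed textual graph nodes
    image_nodes: (n_v, d) features of detected visual regions
    returns:     (n_t, d) attended visual context per textual node
    """
    sim = F.normalize(text_nodes, dim=-1) @ F.normalize(image_nodes, dim=-1).t()  # (n_t, n_v)
    attn = F.softmax(sim * 10.0, dim=-1)          # temperature is an assumed choice
    return attn @ image_nodes                     # per-node matched visual context

def structure_level_matching(match_vec, adjacency):
    """Fuse neighborhood associations so a phrase's nodes jointly infer correspondence.

    match_vec: (n_t, d) node-level matching vectors
    adjacency: (n_t, n_t) 0/1 adjacency of the textual scene graph
    """
    deg = adjacency.sum(dim=-1, keepdim=True).clamp(min=1.0)
    neighbor_avg = (adjacency @ match_vec) / deg  # simple mean aggregation (assumption)
    return match_vec + neighbor_avg               # fused representation used for scoring

# toy usage
t = torch.randn(5, 256)   # 5 textual nodes
v = torch.randn(36, 256)  # 36 region features
adj = torch.eye(5)
fused = structure_level_matching(node_level_matching(t, v), adj)
print(fused.shape)        # torch.Size([5, 256])
```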
High-Performance Long-Term Tracking with Meta-Updater
Title | High-Performance Long-Term Tracking with Meta-Updater |
Authors | Kenan Dai, Yunhua Zhang, Dong Wang, Jianhua Li, Huchuan Lu, Xiaoyun Yang |
Abstract | Long-term visual tracking has drawn increasing attention because it is much closer to practical applications than short-term tracking. Most top-ranked long-term trackers adopt offline-trained Siamese architectures, and thus cannot benefit from the great progress of short-term trackers with online update. However, it is quite risky to straightforwardly introduce online-update-based trackers to solve the long-term problem, due to the uncertain and noisy observations encountered over long sequences. In this work, we propose a novel offline-trained Meta-Updater to address an important but unsolved problem: Is the tracker ready for updating in the current frame? The proposed meta-updater can effectively integrate geometric, discriminative, and appearance cues in a sequential manner, and then mine the sequential information with a designed cascaded LSTM module. Our meta-updater learns a binary output to guide the tracker’s update and can be easily embedded into different trackers. This work also introduces a long-term tracking framework consisting of an online local tracker, an online verifier, a SiamRPN-based re-detector, and our meta-updater. Numerous experimental results on the VOT2018LT, VOT2019LT, OxUvALT, TLP, and LaSOT benchmarks show that our tracker performs remarkably better than other competing algorithms. Our project is available on the website: https://github.com/Daikenan/LTMU. |
Tasks | Visual Tracking |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00305v1 |
https://arxiv.org/pdf/2004.00305v1.pdf | |
PWC | https://paperswithcode.com/paper/high-performance-long-term-tracking-with-meta |
Repo | https://github.com/Daikenan/LTMU |
Framework | none |
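
A minimal sketch of the meta-updater idea from the abstract above: a recurrent model reads a short history of per-frame cues and emits a binary "update the tracker or not" decision. The cue dimension and the two stacked LSTM layers stand in for the paper's cascaded LSTM module and are assumptions.

```python
import torch
import torch.nn as nn

class MetaUpdaterSketch(nn.Module):
    def __init__(self, cue_dim=12, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(cue_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2)      # logits for {skip update, do update}

    def forward(self, cues):
        # cues: (batch, time, cue_dim) sequence of geometric/discriminative/appearance cues
        out, _ = self.lstm(cues)
        return self.head(out[:, -1])          # decision based on the latest frame's state

model = MetaUpdaterSketch()
decision_logits = model(torch.randn(1, 20, 12))   # 20-frame cue history
do_update = decision_logits.argmax(dim=-1).item() == 1
print(do_update)
```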
Pose-guided Visible Part Matching for Occluded Person ReID
Title | Pose-guided Visible Part Matching for Occluded Person ReID |
Authors | Shang Gao, Jingya Wang, Huchuan Lu, Zimo Liu |
Abstract | Occluded person re-identification is a challenging task as the appearance varies substantially with various obstacles, especially in crowded scenarios. To address this issue, we propose a Pose-guided Visible Part Matching (PVPM) method that jointly learns discriminative features with pose-guided attention and self-mines part visibility in an end-to-end framework. Specifically, the proposed PVPM includes two key components: 1) a pose-guided attention (PGA) method for part feature pooling that exploits more discriminative local features; 2) a pose-guided visibility predictor (PVP) that estimates whether or not a part is occluded. As there are no ground-truth training annotations for occluded parts, we exploit the characteristic of part correspondence in positive pairs and self-mine the correspondence scores via graph matching. The generated correspondence scores are then utilized as pseudo-labels for the visibility predictor (PVP). Experimental results on three occluded benchmarks show that the proposed method achieves performance competitive with state-of-the-art methods. The source codes are available at https://github.com/hh23333/PVPM |
Tasks | Graph Matching, Person Re-Identification |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00230v1 |
https://arxiv.org/pdf/2004.00230v1.pdf | |
PWC | https://paperswithcode.com/paper/pose-guided-visible-part-matching-for |
Repo | https://github.com/hh23333/PVPM |
Framework | none |
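
A sketch of how pose-guided part features and predicted visibility scores could be combined at matching time: per-part distances are down-weighted when a part is occluded in either image. This illustrates the visibility-weighted matching rule, not the authors' code; shapes and the cosine distance are assumptions.

```python
import torch
import torch.nn.functional as F

def part_distance(feats_a, vis_a, feats_b, vis_b):
    """feats_*: (P, d) part features, vis_*: (P,) visibility scores in [0, 1]."""
    d = 1.0 - F.cosine_similarity(feats_a, feats_b, dim=-1)   # (P,) per-part distance
    w = vis_a * vis_b                                          # a part counts only if visible in both
    return (w * d).sum() / w.sum().clamp(min=1e-6)

fa, fb = torch.randn(6, 256), torch.randn(6, 256)              # 6 body parts per person
va, vb = torch.rand(6), torch.rand(6)                          # predicted visibilities
print(part_distance(fa, va, fb, vb).item())
```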
Ontology-based Interpretable Machine Learning for Textual Data
Title | Ontology-based Interpretable Machine Learning for Textual Data |
Authors | Phung Lai, NhatHai Phan, Han Hu, Anuja Badeti, David Newman, Dejing Dou |
Abstract | In this paper, we introduce a novel interpreting framework that learns an interpretable model based on an ontology-based sampling technique to explain agnostic prediction models. Different from existing approaches, our algorithm considers the contextual correlation among words, described in domain-knowledge ontologies, to generate semantic explanations. To narrow down the search space for explanations, which is a major problem with long and complicated text data, we design a learnable anchor algorithm to better extract explanations locally. A set of rules for combining the learned interpretable representations with anchors is further introduced to generate comprehensible semantic explanations. Extensive experiments conducted on two real-world datasets show that our approach generates more precise and insightful explanations than baseline approaches. |
Tasks | Interpretable Machine Learning |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00204v1 |
https://arxiv.org/pdf/2004.00204v1.pdf | |
PWC | https://paperswithcode.com/paper/ontology-based-interpretable-machine-learning |
Repo | https://github.com/PhungLai728/OnML |
Framework | none |
MetaPoison: Practical General-purpose Clean-label Data Poisoning
Title | MetaPoison: Practical General-purpose Clean-label Data Poisoning |
Authors | W. Ronny Huang, Jonas Geiping, Liam Fowl, Gavin Taylor, Tom Goldstein |
Abstract | Data poisoning, the process by which an attacker takes control of a model by making imperceptible changes to a subset of the training data, is an emerging threat in the context of neural networks. Existing attacks for data poisoning have relied on hand-crafted heuristics. Instead, we pose crafting poisons more generally as a bi-level optimization problem, where the inner level corresponds to training a network on a poisoned dataset and the outer level corresponds to updating those poisons to achieve a desired behavior on the trained model. We then propose MetaPoison, a first-order method to solve this optimization quickly. MetaPoison is effective: it outperforms previous clean-label poisoning methods by a large margin under the same setting. MetaPoison is robust: its poisons transfer to a variety of victims with unknown hyperparameters and architectures. MetaPoison is also general-purpose, working not only in fine-tuning scenarios but also for end-to-end training from scratch with remarkable success, e.g. causing a target image to be misclassified 90% of the time by manipulating just 1% of the dataset. Additionally, MetaPoison can achieve arbitrary adversary goals that were not previously possible, such as using poisons of one class to make a target image adopt the label of another arbitrarily chosen class. Finally, MetaPoison works in the real world. We demonstrate successful data poisoning of models trained on Google Cloud AutoML Vision. Code and premade poisons are provided at https://github.com/wronnyhuang/metapoison |
Tasks | AutoML, data poisoning |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00225v1 |
https://arxiv.org/pdf/2004.00225v1.pdf | |
PWC | https://paperswithcode.com/paper/metapoison-practical-general-purpose-clean |
Repo | https://github.com/wronnyhuang/metapoison |
Framework | none |
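
A minimal illustration of the bi-level poisoning objective described above: perturb poison images so that, after a short unrolled training run on the poisoned data, a clean target image is misclassified as an adversarially chosen label. A tiny linear "victim" keeps the unrolling differentiable; all shapes, step counts, learning rates, and the epsilon bound are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def victim_logits(W, x):                        # linear victim model, parameters W: (C, D)
    return x @ W.t()

def craft_poisons(x_poison, y_poison, x_target, y_adv, steps=20, inner_steps=3,
                  lr_inner=0.1, lr_poison=0.01, eps=0.03):
    delta = torch.zeros_like(x_poison, requires_grad=True)
    C, D = 10, x_poison.shape[1]
    for _ in range(steps):
        W = torch.zeros(C, D, requires_grad=True)
        # inner level: a few differentiable SGD steps on the poisoned batch
        for _ in range(inner_steps):
            loss = F.cross_entropy(victim_logits(W, x_poison + delta), y_poison)
            (grad,) = torch.autograd.grad(loss, W, create_graph=True)
            W = W - lr_inner * grad
        # outer level: make the trained victim put the clean target into the adversarial class
        adv_loss = F.cross_entropy(victim_logits(W, x_target), y_adv)
        (g_delta,) = torch.autograd.grad(adv_loss, delta)
        with torch.no_grad():
            delta -= lr_poison * g_delta.sign()
            delta.clamp_(-eps, eps)             # keep perturbations imperceptible (clean-label)
    return (x_poison + delta).detach()

poisons = craft_poisons(torch.randn(32, 64), torch.randint(0, 10, (32,)),
                        torch.randn(1, 64), torch.tensor([3]))
print(poisons.shape)
```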
Semantic Drift Compensation for Class-Incremental Learning
Title | Semantic Drift Compensation for Class-Incremental Learning |
Authors | Lu Yu, Bartłomiej Twardowski, Xialei Liu, Luis Herranz, Kai Wang, Yongmei Cheng, Shangling Jui, Joost van de Weijer |
Abstract | Class-incremental learning of deep networks sequentially increases the number of classes to be classified. During training, the network has access only to the data of one task at a time, where each task contains several classes. In this setting, networks suffer from catastrophic forgetting, which refers to the drastic drop in performance on previous tasks. The vast majority of methods have studied this scenario for classification networks, where for each new task the classification layer of the network must be augmented with additional weights to make room for the newly added classes. Embedding networks have the advantage that new classes can be naturally included into the network without adding new weights. Therefore, we study incremental learning for embedding networks. In addition, we propose a new method to estimate the drift of features, called semantic drift, and compensate for it without the need for any exemplars. We approximate the drift of previous tasks based on the drift that is experienced by current-task data. We perform experiments on fine-grained datasets, CIFAR100 and ImageNet-Subset. We demonstrate that embedding networks suffer significantly less from catastrophic forgetting. We outperform existing methods which do not require exemplars and obtain competitive results compared to methods which store exemplars. Furthermore, we show that our proposed SDC, when combined with existing methods to prevent forgetting, consistently improves results. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00440v1 |
https://arxiv.org/pdf/2004.00440v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-drift-compensation-for-class |
Repo | https://github.com/yulu0724/SDC-IL |
Framework | none |
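
A sketch of the semantic drift compensation idea from the abstract above: estimate how embeddings of the current task's data drift during training, then shift the stored prototypes of old classes by a distance-weighted average of those drift vectors, with no exemplars of old classes required. The Gaussian weighting bandwidth and shapes are illustrative assumptions.

```python
import numpy as np

def compensate_prototypes(old_prototypes, feats_before, feats_after, sigma=0.3):
    """old_prototypes: (C, d) class means stored at the end of the previous task
       feats_before / feats_after: (N, d) current-task features extracted with the
       model before and after training on the current task."""
    drift = feats_after - feats_before                      # (N, d) per-sample drift
    compensated = np.empty_like(old_prototypes)
    for c, proto in enumerate(old_prototypes):
        dist2 = ((feats_before - proto) ** 2).sum(axis=1)   # proximity in the old feature space
        w = np.exp(-dist2 / (2 * sigma ** 2))
        w = w / max(w.sum(), 1e-12)
        compensated[c] = proto + w @ drift                   # apply the estimated drift
    return compensated

protos = np.random.randn(5, 128)
before, after = np.random.randn(200, 128), np.random.randn(200, 128)
print(compensate_prototypes(protos, before, after).shape)    # (5, 128)
```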
CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection
Title | CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection |
Authors | Zhiwei Dong, Guoxuan Li, Yue Liao, Fei Wang, Pengju Ren, Chen Qian |
Abstract | Keypoint-based detectors have achieved fairly good performance. However, incorrect keypoint matching is still widespread and greatly affects the performance of the detector. In this paper, we propose CentripetalNet, which uses a centripetal shift to pair corner keypoints from the same instance. CentripetalNet predicts the position and the centripetal shift of the corner points and matches corners whose shifted results are aligned. By combining position information, our approach matches corner points more accurately than the conventional embedding approaches do. Corner pooling extracts information inside the bounding boxes onto the border. To make this information more accessible at the corners, we design a cross-star deformable convolution network to conduct feature adaption. Furthermore, we explore instance segmentation on anchor-free detectors by equipping our CentripetalNet with a mask prediction module. On MS-COCO test-dev, our CentripetalNet not only outperforms all existing anchor-free detectors with an AP of 48.0% but also achieves comparable performance to the state-of-the-art instance segmentation approaches with a 40.2% mask AP. Code will be available at https://github.com/KiveeDong/CentripetalNet. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.09119v1 |
https://arxiv.org/pdf/2003.09119v1.pdf | |
PWC | https://paperswithcode.com/paper/centripetalnet-pursuing-high-quality-keypoint |
Repo | https://github.com/KiveeDong/CentripetalNet |
Framework | pytorch |
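
A sketch of the corner-pairing rule described above: a top-left and a bottom-right corner are matched when the centers implied by their predicted centripetal shifts land close together. The alignment tolerance and the lack of any scoring/NMS are simplifications, not the paper's exact decoding.

```python
import numpy as np

def pair_corners(tl_pts, tl_shifts, br_pts, br_shifts, tol=0.3):
    """tl_pts/br_pts: (N, 2) and (M, 2) corner locations (x, y);
       *_shifts: predicted centripetal shifts pointing toward the object center."""
    boxes = []
    for (tlx, tly), (sx1, sy1) in zip(tl_pts, tl_shifts):
        c_tl = np.array([tlx + sx1, tly + sy1])              # center implied by the top-left corner
        for (brx, bry), (sx2, sy2) in zip(br_pts, br_shifts):
            if brx <= tlx or bry <= tly:
                continue                                      # geometrically invalid pair
            c_br = np.array([brx - sx2, bry - sy2])           # center implied by the bottom-right corner
            diag = np.hypot(brx - tlx, bry - tly)
            if np.linalg.norm(c_tl - c_br) < tol * diag:      # shifted results are aligned
                boxes.append((tlx, tly, brx, bry))
    return boxes

print(pair_corners(np.array([[10, 10]]), np.array([[20, 15]]),
                   np.array([[60, 50]]), np.array([[22, 17]])))   # -> [(10, 10, 60, 50)]
```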
Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
Title | Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation |
Authors | Matteo Fabbri, Fabio Lanzi, Simone Calderara, Stefano Alletto, Rita Cucchiara |
Abstract | In this paper we present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images. We propose to use high-resolution volumetric heatmaps to model joint locations, devising a simple and effective compression method to drastically reduce the size of this representation. At the core of the proposed method lies our Volumetric Heatmap Autoencoder, a fully-convolutional network tasked with the compression of ground-truth heatmaps into a dense intermediate representation. A second model, the Code Predictor, is then trained to predict these codes, which can be decompressed at test time to re-obtain the original representation. Our experimental evaluation shows that our method performs favorably when compared to the state of the art on both multi-person and single-person 3D human pose estimation datasets and, thanks to our novel compression strategy, can process full-HD images at a constant runtime of 8 fps regardless of the number of subjects in the scene. Code and models are available at https://github.com/fabbrimatteo/LoCO . |
Tasks | 3D Human Pose Estimation, 3D Pose Estimation, Pose Estimation |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00329v1 |
https://arxiv.org/pdf/2004.00329v1.pdf | |
PWC | https://paperswithcode.com/paper/compressed-volumetric-heatmaps-for-multi |
Repo | https://github.com/fabbrimatteo/LoCO |
Framework | none |
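
A minimal sketch of compressing volumetric heatmaps with a fully-convolutional autoencoder, as described in the abstract above: the encoder shrinks the (joints, depth, H, W) volume into a dense code and the decoder reconstructs it; a separate predictor (not shown) would regress the code from the image. Channel counts and the volume size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HeatmapAutoencoderSketch(nn.Module):
    def __init__(self, n_joints=14, code_ch=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(n_joints, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, code_ch, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(code_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, n_joints, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, volume):                  # volume: (B, n_joints, D, H, W)
        code = self.encoder(volume)             # compact intermediate representation
        return self.decoder(code), code

vol = torch.rand(1, 14, 16, 64, 64)
recon, code = HeatmapAutoencoderSketch()(vol)
print(code.shape, recon.shape)                  # compressed code vs. reconstructed volume
```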
Symmetry and Group in Attribute-Object Compositions
Title | Symmetry and Group in Attribute-Object Compositions |
Authors | Yong-Lu Li, Yue Xu, Xiaohan Mao, Cewu Lu |
Abstract | Attributes and objects can compose diverse compositions. To model the compositional nature of these general concepts, it is a good choice to learn them through transformations, such as coupling and decoupling. However, complex transformations need to satisfy specific principles to guarantee rationality. In this paper, we first propose a previously ignored principle of attribute-object transformation: Symmetry. For example, coupling peeled-apple with the attribute peeled should still result in peeled-apple, and decoupling peeled from apple should still output apple. Incorporating the symmetry principle, we build a transformation framework inspired by group theory, i.e. SymNet. SymNet consists of two modules, a Coupling Network and a Decoupling Network. With the group axioms and the symmetry property as objectives, we adopt deep neural networks to implement SymNet and train it in an end-to-end paradigm. Moreover, we propose a Relative Moving Distance (RMD) based recognition method that utilizes the attribute change, instead of the attribute pattern itself, to classify attributes. Our symmetry learning can be utilized for the Compositional Zero-Shot Learning task and outperforms the state of the art on widely used benchmarks. Code is available at https://github.com/DirtyHarryLYL/SymNet. |
Tasks | Zero-Shot Learning |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00587v1 |
https://arxiv.org/pdf/2004.00587v1.pdf | |
PWC | https://paperswithcode.com/paper/symmetry-and-group-in-attribute-object |
Repo | https://github.com/DirtyHarryLYL/SymNet |
Framework | none |
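
A sketch of the symmetry principle as training objectives: coupling an attribute onto an object that already has it should leave the representation unchanged, and decoupling an attribute from an object that lacks it should also leave it unchanged. The two small MLPs stand in for the Coupling/Decoupling Networks; the dimensions, MSE losses, and the RMD-style score are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 256
couple = nn.Sequential(nn.Linear(dim * 2, dim), nn.ReLU(), nn.Linear(dim, dim))
decouple = nn.Sequential(nn.Linear(dim * 2, dim), nn.ReLU(), nn.Linear(dim, dim))

def symmetry_loss(obj_with_attr, obj_without_attr, attr):
    # coupling "peeled" onto an already peeled apple should keep it a peeled apple
    loss_c = F.mse_loss(couple(torch.cat([obj_with_attr, attr], -1)), obj_with_attr)
    # decoupling "peeled" from a plain apple should still output the plain apple
    loss_d = F.mse_loss(decouple(torch.cat([obj_without_attr, attr], -1)), obj_without_attr)
    return loss_c + loss_d

def relative_moving_distance(img_feat, attr):
    """RMD-style recognition sketch: compare how far the feature moves when the attribute
    is coupled versus decoupled; a small coupling move suggests the attribute is present."""
    d_add = (couple(torch.cat([img_feat, attr], -1)) - img_feat).norm(dim=-1)
    d_remove = (decouple(torch.cat([img_feat, attr], -1)) - img_feat).norm(dim=-1)
    return d_remove - d_add          # higher score = attribute more likely present

x_with, x_without, a = torch.randn(4, dim), torch.randn(4, dim), torch.randn(4, dim)
print(symmetry_loss(x_with, x_without, a).item(), relative_moving_distance(x_with, a).shape)
```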
Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis
Title | Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis |
Authors | Hao Tang, Xiaojuan Qi, Dan Xu, Philip H. S. Torr, Nicu Sebe |
Abstract | We propose a novel Edge guided Generative Adversarial Network (EdgeGAN) for photo-realistic image synthesis from semantic layouts. Although considerable improvement has been achieved, the quality of synthesized images is far from satisfactory due to two largely unresolved challenges. First, the semantic labels do not provide detailed structural information, making it difficult to synthesize local details and structures. Second, the widely adopted CNN operations such as convolution, down-sampling and normalization usually cause spatial resolution loss and thus are unable to fully preserve the original semantic information, leading to semantically inconsistent results (e.g., missing small objects). To tackle the first challenge, we propose to use the edge as an intermediate representation which is further adopted to guide image generation via a proposed attention guided edge transfer module. Edge information is produced by a convolutional generator and introduces detailed structure information. Further, to preserve the semantic information, we design an effective module to selectively highlight class-dependent feature maps according to the original semantic layout. Extensive experiments on two challenging datasets show that the proposed EdgeGAN can generate significantly better results than state-of-the-art methods. The source code and trained models are available at https://github.com/Ha0Tang/EdgeGAN. |
Tasks | Image Generation |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2003.13898v1 |
https://arxiv.org/pdf/2003.13898v1.pdf | |
PWC | https://paperswithcode.com/paper/edge-guided-gans-with-semantic-preserving-for |
Repo | https://github.com/Ha0Tang/EdgeGAN |
Framework | none |
AandP: Utilizing Prolog for converting between active sentence and passive sentence with three-steps conversion
Title | AandP: Utilizing Prolog for converting between active sentence and passive sentence with three-steps conversion |
Authors | Trung Q. Tran |
Abstract | I introduce a simple but efficient method to handle one of the critical aspects of English grammar: the relationship between an active sentence and its passive counterpart. In fact, an active sentence and its corresponding passive sentence express the same meaning, but their structure is different. I utilized Prolog [4] along with Definite Clause Grammars (DCG) [5] to perform the conversion between active and passive sentences. Some advanced techniques were also used, such as Extra Arguments, Extra Goals, Lexicon, etc. I tried to cover a variety of cases of active and passive sentences, such as the 12 English tenses, modal verbs, the negative form, etc. More details and my contributions will be presented in the following sections. The source code is available at https://github.com/tqtrunghnvn/ActiveAndPassive. |
Tasks | |
Published | 2020-01-16 |
URL | https://arxiv.org/abs/2001.05672v1 |
https://arxiv.org/pdf/2001.05672v1.pdf | |
PWC | https://paperswithcode.com/paper/aandp-utilizing-prolog-for-converting-between |
Repo | https://github.com/tqtrunghnvn/ActiveAndPassive |
Framework | none |
Deep Snake for Real-Time Instance Segmentation
Title | Deep Snake for Real-Time Instance Segmentation |
Authors | Sida Peng, Wen Jiang, Huaijin Pi, Xiuli Li, Hujun Bao, Xiaowei Zhou |
Abstract | This paper introduces a novel contour-based approach named deep snake for real-time instance segmentation. Unlike some recent methods that directly regress the coordinates of the object boundary points from an image, deep snake uses a neural network to iteratively deform an initial contour to match the object boundary, which implements the classic idea of snake algorithms with a learning-based approach. For structured feature learning on the contour, we propose to use circular convolution in deep snake, which better exploits the cycle-graph structure of a contour compared against generic graph convolution. Based on deep snake, we develop a two-stage pipeline for instance segmentation: initial contour proposal and contour deformation, which can handle errors in object localization. Experiments show that the proposed approach achieves competitive performances on the Cityscapes, KINS, SBD and COCO datasets while being efficient for real-time applications with a speed of 32.3 fps for 512$\times$512 images on a 1080Ti GPU. The code is available at https://github.com/zju3dv/snake/. |
Tasks | Instance Segmentation, Object Localization, Real-time Instance Segmentation, Semantic Segmentation |
Published | 2020-01-06 |
URL | https://arxiv.org/abs/2001.01629v3 |
https://arxiv.org/pdf/2001.01629v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-snake-for-real-time-instance |
Repo | https://github.com/ShanghaiTechCVDL/Weekly_Group_Meeting_Paper_List |
Framework | none |
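
A sketch of the circular convolution used for contour features, as described in the abstract above: because a contour is a cycle, the vertex feature sequence is padded in wrap-around fashion before a standard 1D convolution, so every vertex sees its neighbours on both sides. Channel sizes and the kernel width are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CircularConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=9):
        super().__init__()
        self.pad = kernel_size // 2
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size)

    def forward(self, x):                                      # x: (B, C, N) features of N contour vertices
        x = F.pad(x, (self.pad, self.pad), mode="circular")    # wrap around the cycle
        return self.conv(x)

feats = torch.randn(2, 64, 128)                                # 128 vertices on an initial contour
out = CircularConv(64, 64)(feats)
print(out.shape)                                               # (2, 64, 128): one refined feature per vertex
```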
Fixing the train-test resolution discrepancy: FixEfficientNet
Title | Fixing the train-test resolution discrepancy: FixEfficientNet |
Authors | Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou |
Abstract | This note complements the paper “Fixing the train-test resolution discrepancy” that introduced the FixRes method. First, we show that this strategy is advantageously combined with recent training recipes from the literature. Most importantly, we provide new results for the EfficientNet architecture. The resulting network, called FixEfficientNet, significantly outperforms the initial architecture with the same number of parameters. For instance, our FixEfficientNet-B0 trained without additional training data achieves 79.3% top-1 accuracy on ImageNet with 5.3M parameters. This is a +0.5% absolute improvement over the Noisy student EfficientNet-B0 trained with 300M unlabeled images and +1.7% compared to the EfficientNet-B0 trained with adversarial examples. An EfficientNet-L2 pre-trained with weak supervision on 300M unlabeled images and further optimized with FixRes achieves 88.5% top-1 accuracy (top-5: 98.7%), which establishes the new state of the art for ImageNet with a single crop. |
Tasks | Data Augmentation, Image Classification |
Published | 2020-03-18 |
URL | https://arxiv.org/abs/2003.08237v3 |
https://arxiv.org/pdf/2003.08237v3.pdf | |
PWC | https://paperswithcode.com/paper/fixing-the-train-test-resolution-discrepancy-2 |
Repo | https://github.com/facebookresearch/FixRes |
Framework | pytorch |
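
A sketch of the FixRes recipe that this note builds on: train with standard random-resized crops at a lower resolution, then evaluate (and optionally fine-tune the final layers) at a larger test resolution so the apparent object size matches between train and test. The specific resolutions below are assumed example values; the paper tunes them per model.

```python
from torchvision import transforms

train_res, test_res = 224, 320   # assumed example values, not the paper's tuned settings

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(train_res),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# at test time, resize larger and center-crop at the higher resolution
test_tf = transforms.Compose([
    transforms.Resize(int(test_res * 1.15)),   # keep the usual resize/crop ratio
    transforms.CenterCrop(test_res),
    transforms.ToTensor(),
])
```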
Extreme Algorithm Selection With Dyadic Feature Representation
Title | Extreme Algorithm Selection With Dyadic Feature Representation |
Authors | Alexander Tornede, Marcel Wever, Eyke Hüllermeier |
Abstract | Algorithm selection (AS) deals with selecting the algorithm from a fixed set of candidates that is most suitable for a specific instance of an algorithmic problem, e.g., choosing solvers for SAT problems. Benchmark suites for AS usually comprise candidate sets consisting of at most tens of algorithms, whereas in combined algorithm selection and hyperparameter optimization problems the number of candidates becomes intractable, impeding the learning of effective meta-models and thus requiring costly online performance evaluations. Therefore, we propose the setting of extreme algorithm selection (XAS), where we consider fixed sets of thousands of candidate algorithms, facilitating meta-learning. We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation in which both problem instances and algorithms are described. We find the latter to improve significantly over the current state of the art on various metrics. |
Tasks | Hyperparameter Optimization, Meta-Learning |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2001.10741v1 |
https://arxiv.org/pdf/2001.10741v1.pdf | |
PWC | https://paperswithcode.com/paper/extreme-algorithm-selection-with-dyadic |
Repo | https://github.com/alexandertornede/extreme_algorithm_selection |
Framework | none |
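
A sketch of the dyadic feature representation described above: each (problem instance, algorithm) pair is described by concatenating instance features with algorithm features, a single surrogate model predicts performance for the pair, and selection is the argmax over the thousands of candidates. The random-forest surrogate and the random synthetic features are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
inst_feats = rng.normal(size=(500, 10))        # training problem instances
algo_feats = rng.normal(size=(2000, 6))        # thousands of candidate algorithms

# training dyads: sampled (instance, algorithm) pairs with observed performance
idx_i, idx_a = rng.integers(0, 500, 5000), rng.integers(0, 2000, 5000)
X = np.hstack([inst_feats[idx_i], algo_feats[idx_a]])
y = rng.normal(size=5000)                      # stand-in for measured performance values

surrogate = RandomForestRegressor(n_estimators=50).fit(X, y)

def select_algorithm(instance):
    dyads = np.hstack([np.tile(instance, (len(algo_feats), 1)), algo_feats])
    return int(np.argmax(surrogate.predict(dyads)))   # index of the best predicted algorithm

print(select_algorithm(inst_feats[0]))
```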
Elastic Coupled Co-clustering for Single-Cell Genomic Data
Title | Elastic Coupled Co-clustering for Single-Cell Genomic Data |
Authors | Pengcheng Zeng, Zhixiang Lin |
Abstract | Recent advances in single-cell technologies have enabled us to profile genomic features at unprecedented resolution, and data sets from multiple domains are available, including data sets that profile different types of genomic features and data sets that profile the same type of genomic features across different species. These data sets typically have different power in identifying unknown cell types through clustering, and data integration can potentially lead to better performance of clustering algorithms. In this work, we formulate the problem in an unsupervised transfer learning framework, which utilizes knowledge learned from the auxiliary data set to improve the clustering performance of the target data set. The degree of shared information among the target and auxiliary data sets can vary, and their distributions can also be different. To address these challenges, we propose an elastic coupled co-clustering based transfer learning algorithm, which elastically propagates clustering knowledge obtained from the auxiliary data set to the target data set. Experiments on single-cell genomic data sets show that our algorithm greatly improves clustering performance over traditional learning algorithms. The source code and data sets are available at https://github.com/cuhklinlab/elasticC3. |
Tasks | Transfer Learning |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.12970v1 |
https://arxiv.org/pdf/2003.12970v1.pdf | |
PWC | https://paperswithcode.com/paper/elastic-coupled-co-clustering-for-single-cell |
Repo | https://github.com/cuhklinlab/elasticC3 |
Framework | none |