Paper Group AWR 216
Indices Matter: Learning to Index for Deep Image Matting
Title | Indices Matter: Learning to Index for Deep Image Matting |
Authors | Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu |
Abstract | We show that existing upsampling operators can be unified with the notion of the index function. This notion is inspired by an observation in the decoding process of deep image matting where indices-guided unpooling can recover boundary details much better than other upsampling operators such as bilinear interpolation. By looking at the indices as a function of the feature map, we introduce the concept of learning to index, and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the pooling and upsampling operators, without the need for supervision. At the core of this framework is a flexible network module, termed IndexNet, which dynamically predicts indices given an input. Due to its flexibility, IndexNet can be used as a plug-in applied to any off-the-shelf convolutional network that has coupled downsampling and upsampling stages. We demonstrate the effectiveness of IndexNet on the task of natural image matting, where the quality of learned indices can be visually observed from predicted alpha mattes. Results on the Composition-1k matting dataset show that our model built on MobileNetv2 exhibits at least a 16.1% improvement over the seminal VGG-16 based deep matting baseline, with less training data and lower model capacity. Code and models have been made available at: https://tinyurl.com/IndexNetV1 |
Tasks | Image Matting |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.00672v1 |
https://arxiv.org/pdf/1908.00672v1.pdf | |
PWC | https://paperswithcode.com/paper/indices-matter-learning-to-index-for-deep |
Repo | https://github.com/poppinace/indexnet_matting |
Framework | pytorch |
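
To make the "learning to index" idea concrete, here is a minimal PyTorch sketch of an IndexNet-style module: a small conv predicts a holistic index map, a softmax over each 2x2 window normalizes it, and the same map guides both pooling and upsampling. The layer shapes and windowing scheme are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftIndexBlock(nn.Module):
    """Sketch of a learned index function: a conv predicts one logit per
    pixel; a softmax over each non-overlapping 2x2 window turns the
    logits into a soft index map shared by pooling and upsampling."""

    def __init__(self, channels):
        super().__init__()
        self.pred = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x):
        logits = self.pred(x)                              # B x 1 x H x W
        b, _, h, w = logits.shape
        win = F.unfold(logits, kernel_size=2, stride=2)    # B x 4 x (H*W/4)
        idx = F.softmax(win, dim=1)                        # normalize per window
        idx = F.fold(idx, output_size=(h, w), kernel_size=2, stride=2)
        pooled = F.avg_pool2d(x * idx, kernel_size=2) * 4  # index-guided pooling
        return pooled, idx

def index_upsample(x, idx):
    """Index-guided upsampling: nearest-neighbor upsample modulated
    by the index map stored at the matching encoder stage."""
    return F.interpolate(x, scale_factor=2, mode="nearest") * idx

pooled, idx = SoftIndexBlock(16)(torch.rand(1, 16, 32, 32))
up = index_upsample(pooled, idx)                           # back to 32 x 32
```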
Collaborative Policy Learning for Open Knowledge Graph Reasoning
Title | Collaborative Policy Learning for Open Knowledge Graph Reasoning |
Authors | Cong Fu, Tong Chen, Meng Qu, Woojeong Jin, Xiang Ren |
Abstract | In recent years, there has been a surge of interest in interpretable graph reasoning methods. However, these models often suffer from limited performance when working on sparse and incomplete graphs, due to the lack of evidential paths that can reach target entities. Here we study open knowledge graph reasoning—a task that aims to reason about missing facts over a graph augmented by a background text corpus. A key challenge of the task is to filter out “irrelevant” facts extracted from the corpus, in order to maintain an effective search space during path inference. We propose a novel reinforcement learning framework to train two collaborative agents jointly, i.e., a multi-hop graph reasoner and a fact extractor. The fact extraction agent generates fact triples from corpora to enrich the graph on the fly, while the reasoning agent provides feedback to the fact extractor and guides it towards promoting facts that are helpful for interpretable reasoning. Experiments on two public datasets demonstrate the effectiveness of the proposed approach. Source code and datasets used in this paper can be downloaded at https://github.com/shanzhenren/CPL |
Tasks | |
Published | 2019-08-31 |
URL | https://arxiv.org/abs/1909.00230v1 |
https://arxiv.org/pdf/1909.00230v1.pdf | |
PWC | https://paperswithcode.com/paper/collaborative-policy-learning-for-open |
Repo | https://github.com/INK-USC/CPL |
Framework | tf |
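
A rough sketch of the two-agent training loop, under heavy simplification: both the fact extractor and the reasoner are toy categorical policies updated with REINFORCE from a shared reward. The state encoding, action spaces, and reward below are placeholders, not the paper's environment.

```python
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, in_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

extractor = Policy(16, 8)   # proposes facts to add to the graph
reasoner = Policy(16, 8)    # picks the next hop along the graph
opt = torch.optim.Adam(
    [*extractor.parameters(), *reasoner.parameters()], lr=1e-3)

for step in range(100):
    state = torch.randn(16)            # placeholder graph/query encoding
    d_ext, d_rea = extractor(state), reasoner(state)
    fact, hop = d_ext.sample(), d_rea.sample()
    # toy reward: the reasoner's success is shared with the extractor,
    # which is the collaborative-feedback idea in miniature
    reward = float(hop == fact)
    loss = -(d_ext.log_prob(fact) + d_rea.log_prob(hop)) * reward
    opt.zero_grad(); loss.backward(); opt.step()
```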
Unsupervised Feature Learning for Point Cloud by Contrasting and Clustering With Graph Convolutional Neural Network
Title | Unsupervised Feature Learning for Point Cloud by Contrasting and Clustering With Graph Convolutional Neural Network |
Authors | Ling Zhang, Zhigang Zhu |
Abstract | To alleviate the cost of collecting and annotating large-scale point cloud datasets, we propose an unsupervised learning approach to learn features from an unlabeled point cloud “3D object” dataset by using part contrasting and object clustering with deep graph neural networks (GNNs). In the contrast learning step, all the samples in the 3D object dataset are cut into two parts and put into a “part” dataset. Then a contrast learning GNN (ContrastNet) is trained to verify whether two randomly sampled parts from the part dataset belong to the same object. In the cluster learning step, the trained ContrastNet is applied to all the samples in the original 3D object dataset to extract features, which are used to group the samples into clusters. Then another GNN for cluster learning (ClusterNet) is trained to predict the cluster ID of all the training samples. The contrast learning step forces the ContrastNet to learn high-level semantic features of objects but probably ignores low-level features, while the ClusterNet improves the quality of the learned features by being trained to discover objects that probably belong to the same semantic categories, through the use of cluster IDs. We have conducted extensive experiments to evaluate the proposed framework on point cloud classification tasks. The proposed unsupervised learning approach obtains performance comparable to state-of-the-art unsupervised learning methods that use much more complicated network structures. The code of this work is publicly available at: https://github.com/lingzhang1/ContrastNet. |
Tasks | |
Published | 2019-04-28 |
URL | https://arxiv.org/abs/1904.12359v3 |
https://arxiv.org/pdf/1904.12359v3.pdf | |
PWC | https://paperswithcode.com/paper/190412359 |
Repo | https://github.com/lingzhang1/ContrastNet |
Framework | tf |
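
The part-contrasting pretext task is easy to sketch: cut each cloud in two with a random plane and train a siamese encoder to judge whether two parts come from the same object. The PointNet-style encoder below is a stand-in for the paper's graph CNN, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in point encoder: per-point MLP + global max-pool."""
    def __init__(self, feat=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, feat))

    def forward(self, pts):                       # pts: B x N x 3
        return self.mlp(pts).max(dim=1).values    # B x feat

def split_parts(cloud):
    """Cut one cloud (N x 3) into two parts with a random plane."""
    normal = torch.randn(3)
    side = cloud @ normal > 0
    return cloud[side], cloud[~side]

enc = Encoder()
head = nn.Linear(256, 2)          # same-object vs. different-object
a, b = split_parts(torch.randn(1024, 3))
logits = head(torch.cat([enc(a[None]), enc(b[None])], dim=-1))
# train with cross-entropy: label 1 for parts of the same object,
# 0 for parts sampled from two different objects
```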
Cross-X Learning for Fine-Grained Visual Categorization
Title | Cross-X Learning for Fine-Grained Visual Categorization |
Authors | Wei Luo, Xitong Yang, Xianjie Mo, Yuheng Lu, Larry S. Davis, Jun Li, Jian Yang, Ser-Nam Lim |
Abstract | Recognizing objects from subcategories with very subtle differences remains a challenging task due to the large intra-class and small inter-class variation. Recent work tackles this problem in a weakly-supervised manner: object parts are first detected and the corresponding part-specific features are extracted for fine-grained classification. However, these methods typically treat the part-specific features of each image in isolation, neglecting the relationships between different images. In this paper, we propose Cross-X learning, a simple yet effective approach that exploits the relationships between different images and between different network layers for robust multi-scale feature learning. Our approach involves two novel components: (i) a cross-category cross-semantic regularizer that guides the extracted features to represent semantic parts, and (ii) a cross-layer regularizer that improves the robustness of multi-scale features by matching the prediction distribution across multiple layers. Our approach can be easily trained end-to-end and is scalable to large datasets like NABirds. We empirically analyze the contributions of different components of our approach and demonstrate its robustness, effectiveness and state-of-the-art performance on five benchmark datasets. Code is available at https://github.com/cswluo/CrossX. |
Tasks | Fine-Grained Image Classification, Fine-Grained Visual Categorization |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04412v1 |
https://arxiv.org/pdf/1909.04412v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-x-learning-for-fine-grained-visual |
Repo | https://github.com/cswluo/CrossX |
Framework | pytorch |
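
The cross-layer regularizer can be illustrated in a few lines: prediction distributions from heads attached at two network depths are pulled together with a symmetric KL term. The temperature and the symmetric form are assumptions for the sketch, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cross_layer_kl(logits_a, logits_b, T=2.0):
    """Symmetric KL between softened predictions of two heads
    attached at different network depths."""
    pa = F.log_softmax(logits_a / T, dim=-1)
    pb = F.log_softmax(logits_b / T, dim=-1)
    kl_ab = F.kl_div(pa, pb, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(pb, pa, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

# added to the usual classification loss with some weight lambda
reg = cross_layer_kl(torch.randn(8, 200), torch.randn(8, 200))
```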
ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations
Title | ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations |
Authors | Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, Yonggang Wang |
Abstract | The pre-training of text encoders normally processes text as a sequence of tokens corresponding to small text units, such as word pieces in English and characters in Chinese. It omits information carried by larger text granularity, and thus the encoders cannot easily adapt to certain combinations of characters. This leads to a loss of important semantic information, which is especially problematic for Chinese because the language does not have explicit word boundaries. In this paper, we propose ZEN, a BERT-based Chinese (Z) text encoder Enhanced by N-gram representations, where different combinations of characters are considered during training. As a result, potential word or phrase boundaries are explicitly pre-trained and fine-tuned with the character encoder (BERT). Therefore ZEN incorporates the comprehensive information of both the character sequence and the words or phrases it contains. Experimental results illustrate the effectiveness of ZEN on a series of Chinese NLP tasks. We show that ZEN, using fewer resources than other published encoders, can achieve state-of-the-art performance on most tasks. Moreover, it is shown that reasonable performance can be obtained when ZEN is trained on a small corpus, which is important for applying pre-training techniques to scenarios with limited data. The code and pre-trained models of ZEN are available at https://github.com/sinovation/zen. |
Tasks | Chinese Named Entity Recognition, Chinese Word Segmentation, Document Classification, Natural Language Inference, Part-Of-Speech Tagging, Sentence Pair Modeling, Sentiment Analysis |
Published | 2019-11-02 |
URL | https://arxiv.org/abs/1911.00720v1 |
https://arxiv.org/pdf/1911.00720v1.pdf | |
PWC | https://paperswithcode.com/paper/zen-pre-training-chinese-text-encoder |
Repo | https://github.com/sinovation/ZEN |
Framework | pytorch |
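
A minimal sketch of the n-gram enhancement: character positions covered by a matched n-gram also receive that n-gram's embedding before entering the transformer. The toy vocabulary, the match list, and the additive combination are illustrative stand-ins for ZEN's n-gram encoder.

```python
import torch
import torch.nn as nn

char_emb = nn.Embedding(6000, 128)    # character vocabulary (toy size)
ngram_emb = nn.Embedding(1000, 128)   # n-gram lexicon (toy size)

chars = torch.tensor([[5, 17, 42, 7]])   # B x L character ids
matches = [(0, 2, 3)]                    # (start, end, ngram_id): chars 0..1
                                         # are covered by n-gram 3

h = char_emb(chars).clone()              # B x L x 128
for start, end, ngram_id in matches:
    # every covered character position also receives the n-gram vector
    h[:, start:end] += ngram_emb(torch.tensor(ngram_id))
# h then feeds the usual BERT-style transformer stack
```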
RiWalk: Fast Structural Node Embedding via Role Identification
Title | RiWalk: Fast Structural Node Embedding via Role Identification |
Authors | Xuewei Ma, Geng Qin, Zhiyang Qiu, Mingxin Zheng, Zhe Wang |
Abstract | Nodes performing different functions in a network have different roles, and these roles can be gleaned from the structure of the network. Learning latent representations for the roles of nodes helps to understand the network and to transfer knowledge across networks. However, most existing structural embedding approaches suffer from high computation and space cost or rely on heuristic feature engineering. Here we propose RiWalk, a flexible paradigm for learning structural node representations. It decouples the structural embedding problem into a role identification procedure and a network embedding procedure. Through role identification, rooted kernels that preserve structural dependencies are built to better integrate network embedding methods. To demonstrate the effectiveness of RiWalk, we develop two different role identification methods, named RiWalk-SP and RiWalk-WL respectively, and employ random-walk-based network embedding methods. Experiments on within-network classification tasks show that our proposed algorithms achieve performance comparable to other baselines while being an order of magnitude more efficient. We also conduct across-network role classification tasks; the results show the potential of structural embeddings in transfer learning. RiWalk is also scalable, making it capable of capturing structural roles in massive networks. |
Tasks | Feature Engineering, Network Embedding, Structural Node Embedding, Transfer Learning |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06541v1 |
https://arxiv.org/pdf/1910.06541v1.pdf | |
PWC | https://paperswithcode.com/paper/riwalk-fast-structural-node-embedding-via |
Repo | https://github.com/maxuewei2/RiWalk |
Framework | none |
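
The role-identification step can be sketched as follows: within each root's k-hop subgraph, nodes are renamed by a structural signature (here simply degree plus hop distance to the root, a simplification in the spirit of RiWalk-SP), and random walks over the relabeled subgraph become sentences for any skip-gram embedder. Parameters are illustrative.

```python
import random
import networkx as nx

def riwalk_sentences(G, k=2, walk_len=10, walks_per_node=5):
    """Relabel each root's k-hop subgraph by a structural signature
    (degree + hop distance to the root), then emit random walks over
    the relabeled names as skip-gram sentences."""
    sentences = []
    for root in G.nodes():
        dist = nx.single_source_shortest_path_length(G, root, cutoff=k)
        label = {v: f"d{G.degree(v)}_h{h}" for v, h in dist.items()}
        sub = G.subgraph(dist)
        for _ in range(walks_per_node):
            walk, v = [label[root]], root
            for _ in range(walk_len - 1):
                nbrs = list(sub.neighbors(v))
                if not nbrs:
                    break
                v = random.choice(nbrs)
                walk.append(label[v])
            sentences.append(walk)
    return sentences

# sentences can be fed to e.g. gensim's Word2Vec to learn embeddings
sentences = riwalk_sentences(nx.karate_club_graph())
```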
See Better Before Looking Closer: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification
Title | See Better Before Looking Closer: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification |
Authors | Tao Hu, Honggang Qi, Qingming Huang, Yan Lu |
Abstract | Data augmentation is usually adopted to increase the amount of training data, prevent overfitting and improve the performance of deep models. However, in practice, random data augmentation, such as random image cropping, is low-efficiency and might introduce many uncontrolled background noises. In this paper, we propose the Weakly Supervised Data Augmentation Network (WS-DAN) to explore the potential of data augmentation. Specifically, for each training image, we first generate attention maps to represent the object’s discriminative parts by weakly supervised learning. Next, we augment the image guided by these attention maps, via attention cropping and attention dropping. The proposed WS-DAN improves classification accuracy in two ways. In the first stage, images can be seen better since features of more discriminative parts will be extracted. In the second stage, attention regions provide an accurate location of the object, which enables our model to look closer at the object and further improve performance. Comprehensive experiments on common fine-grained visual classification datasets show that our WS-DAN surpasses the state-of-the-art methods, which demonstrates its effectiveness. |
Tasks | Data Augmentation, Fine-Grained Image Classification, Image Cropping |
Published | 2019-01-26 |
URL | http://arxiv.org/abs/1901.09891v2 |
http://arxiv.org/pdf/1901.09891v2.pdf | |
PWC | https://paperswithcode.com/paper/see-better-before-looking-closer-weakly |
Repo | https://github.com/tau-yihouxiang/WS_DAN |
Framework | tf |
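
Attention cropping and attention dropping are straightforward to sketch: normalize one attention map, crop the image to the region above a threshold (and resize it back), or zero out the most-attended region. The thresholds below are illustrative, not the paper's values.

```python
import torch
import torch.nn.functional as F

def attention_crop_drop(image, attn, theta_c=0.5, theta_d=0.5):
    """image: C x H x W; attn: h x w attention map from the backbone."""
    a = F.interpolate(attn[None, None], size=image.shape[1:],
                      mode="bilinear", align_corners=False)[0, 0]
    a = (a - a.min()) / (a.max() - a.min() + 1e-8)   # normalize to [0, 1]
    # attention cropping: bounding box of the high-attention region
    ys, xs = torch.where(a > theta_c)
    crop = image[:, ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    crop = F.interpolate(crop[None], size=image.shape[1:],
                         mode="bilinear", align_corners=False)[0]
    # attention dropping: erase the most-attended region instead
    drop = image * (a <= theta_d).float()
    return crop, drop

crop, drop = attention_crop_drop(torch.rand(3, 224, 224), torch.rand(14, 14))
```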
Scalable Evaluation and Improvement of Document Set Expansion via Neural Positive-Unlabeled Learning
Title | Scalable Evaluation and Improvement of Document Set Expansion via Neural Positive-Unlabeled Learning |
Authors | Alon Jacovi, Gang Niu, Yoav Goldberg, Masashi Sugiyama |
Abstract | We consider the situation in which a user has collected a small set of documents on a cohesive topic, and they want to retrieve additional documents on this topic from a large collection. Information Retrieval (IR) solutions treat the document set as a query, and look for similar documents in the collection. We propose to extend the IR approach by treating the problem as an instance of positive-unlabeled (PU) learning—i.e., learning binary classifiers from only positive and unlabeled data, where the positive data corresponds to the query documents and the unlabeled data consists of the results returned by the IR engine. Utilizing PU learning for text with big neural networks is a largely unexplored field. We discuss various challenges in applying PU learning to this setting, including an unknown class prior, extremely imbalanced data and large-scale accurate evaluation of models, and we propose solutions and empirically validate them. We demonstrate the effectiveness of the method using a series of experiments on retrieving PubMed abstracts adhering to fine-grained topics. We demonstrate improvements over the base IR solution and other baselines. Implementation is available at https://github.com/sayaendo/document-set-expansion-pu. |
Tasks | Information Retrieval |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13339v1 |
https://arxiv.org/pdf/1910.13339v1.pdf | |
PWC | https://paperswithcode.com/paper/191013339 |
Repo | https://github.com/sayaendo/document-set-expansion-pu |
Framework | none |
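
One concrete piece this line of work builds on is the non-negative PU risk estimator: the negative-class risk is estimated from unlabeled data minus a prior-weighted positive term, clamped at zero. A minimal sketch, with an assumed class prior and loss choice:

```python
import torch
import torch.nn.functional as F

def nnpu_loss(scores_p, scores_u, pi=0.1):
    """Non-negative PU risk: scores_p are classifier logits on positive
    (query) documents, scores_u on unlabeled documents; pi is the
    assumed prior probability of the positive class."""
    bce = lambda s, y: F.binary_cross_entropy_with_logits(
        s, torch.full_like(s, y))
    risk_pos = pi * bce(scores_p, 1.0)
    # negative risk, estimated from unlabeled data, clamped at zero
    risk_neg = bce(scores_u, 0.0) - pi * bce(scores_p, 0.0)
    return risk_pos + torch.clamp(risk_neg, min=0.0)

loss = nnpu_loss(torch.randn(32), torch.randn(256))
```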
Does the “Artificial Intelligence Clinician” learn optimal treatment strategies for sepsis in intensive care?
Title | Does the “Artificial Intelligence Clinician” learn optimal treatment strategies for sepsis in intensive care? |
Authors | Russell Jeter, Christopher Josef, Supreeth Shashikumar, Shamim Nemati |
Abstract | From 2017 to 2018 the number of scientific publications found via PubMed search using the keyword “Machine Learning” increased by 46% (4,317 to 6,307). The results of studies involving machine learning, artificial intelligence (AI), and big data have captured the attention of healthcare practitioners, healthcare managers, and the public at a time when Western medicine grapples with unmitigated cost increases and public demands for accountability. The complexity involved in healthcare applications of machine learning and the size of the associated data sets has afforded many researchers an uncontested opportunity to satisfy these demands with relatively little oversight. In a recent Nature Medicine article, “The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care,” Komorowski and his coauthors propose methods to train an artificial intelligence clinician to treat sepsis patients with vasopressors and IV fluids. In this post, we will closely examine the claims laid out in this paper. In particular, we will study the individual treatment profiles suggested by their AI Clinician to gain insight into how their AI Clinician intends to treat patients on an individual level. |
Tasks | |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.03271v1 |
http://arxiv.org/pdf/1902.03271v1.pdf | |
PWC | https://paperswithcode.com/paper/does-the-artificial-intelligence-clinician |
Repo | https://github.com/point85AI/Policy-Iteration-AI-Clinician |
Framework | none |
Learning Across Tasks and Domains
Title | Learning Across Tasks and Domains |
Authors | Pierluigi Zama Ramirez, Alessio Tonioni, Samuele Salti, Luigi Di Stefano |
Abstract | Recent works have proven that many relevant visual tasks are closely related to one another. Yet, this connection is seldom deployed in practice due to the lack of practical methodologies to transfer learned concepts across different training processes. In this work, we introduce a novel adaptation framework that can operate across both tasks and domains. Our framework learns to transfer knowledge across tasks in a fully supervised domain (e.g., synthetic data) and uses this knowledge on a different domain where we have only partial supervision (e.g., real data). Our proposal is complementary to existing domain adaptation techniques and extends them to cross-task scenarios, providing additional performance gains. We prove the effectiveness of our framework across two challenging tasks (i.e., monocular depth estimation and semantic segmentation) and four different domains (Synthia, Carla, Kitti, and Cityscapes). |
Tasks | Depth Estimation, Domain Adaptation, Monocular Depth Estimation, Semantic Segmentation |
Published | 2019-04-09 |
URL | https://arxiv.org/abs/1904.04744v2 |
https://arxiv.org/pdf/1904.04744v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-across-tasks-and-domains |
Repo | https://github.com/CVLAB-Unibo/ATDT |
Framework | tf |
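
The task-transfer idea reduces to a small mapping network G: in the fully supervised domain, G learns to turn task-A features into task-B features; in the partially supervised domain, G(feat_A(x)) stands in for the missing task-B features. Everything below (encoders, shapes) is an illustrative stand-in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_a = nn.Conv2d(3, 64, 3, padding=1)   # stand-in task-A encoder (depth)
feat_b = nn.Conv2d(3, 64, 3, padding=1)   # stand-in task-B encoder (segmentation)
G = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, 64, 3, padding=1))  # task-transfer network

# training in the fully supervised (synthetic) domain:
x_syn = torch.randn(2, 3, 64, 64)
target = feat_b(x_syn).detach()           # task-B features as regression target
loss = F.mse_loss(G(feat_a(x_syn)), target)

# deployment on the partially supervised (real) domain:
x_real = torch.randn(2, 3, 64, 64)
fake_b_features = G(feat_a(x_real))       # feed these to task B's decoder
```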
Central Similarity Quantization for Efficient Image and Video Retrieval
Title | Central Similarity Quantization for Efficient Image and Video Retrieval |
Authors | Li Yuan, Tao Wang, Xiaopeng Zhang, Francis EH Tay, Zequn Jie, Wei Liu, Jiashi Feng |
Abstract | Existing data-dependent hashing methods usually learn hash functions from pairwise or triplet data relationships, which only capture the data similarity locally, and often suffer from low learning efficiency and low collision rate. In this work, we propose a new global similarity metric, termed central similarity, with which the hash codes of similar data pairs are encouraged to approach a common center and those for dissimilar pairs to converge to different centers, to improve hash learning efficiency and retrieval accuracy. We principally formulate the computation of the proposed central similarity metric by introducing a new concept, the hash center, which refers to a set of data points scattered in the Hamming space with sufficient mutual distance between each other. We then provide an efficient method to construct well-separated hash centers by leveraging the Hadamard matrix and Bernoulli distributions. Finally, we propose Central Similarity Quantization (CSQ), which optimizes the central similarity between data points w.r.t. their hash centers instead of optimizing the local similarity. CSQ is generic and applicable to both image and video hashing scenarios. Extensive experiments on large-scale image and video retrieval tasks demonstrate that CSQ can generate cohesive hash codes for similar data pairs and dispersed hash codes for dissimilar pairs, achieving a noticeable boost in retrieval performance, i.e., 3%-20% in mAP over the previous state of the art. The code is at: https://github.com/yuanli2333/Hadamard-Matrix-for-hashing |
Tasks | Quantization, Video Retrieval |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00347v5 |
https://arxiv.org/pdf/1908.00347v5.pdf | |
PWC | https://paperswithcode.com/paper/central-similarity-hashing-via-hadamard |
Repo | https://github.com/yuanli2333/Hadamard-Matrix-for-hashing |
Framework | pytorch |
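
The hash-center construction is simple enough to sketch directly: rows of a Hadamard matrix (and their negations) are mutually far apart in Hamming space, so each class takes one row as its center, and a BCE-style loss pulls codes toward their centers. The bit width and loss form here are illustrative.

```python
import torch
import torch.nn.functional as F
from scipy.linalg import hadamard   # n_bits must be a power of two

def hash_centers(n_classes, n_bits):
    """Rows of the Hadamard matrix (and their negations) are mutually
    distant in Hamming space; each class takes one row as its center."""
    H = torch.from_numpy(hadamard(n_bits)).float()
    centers = torch.cat([H, -H])[:n_classes]
    return (centers + 1) / 2            # map {-1, +1} -> {0, 1}

def central_similarity_loss(codes, labels, centers):
    """codes in (0, 1) (e.g., sigmoid outputs); BCE pulls each code
    toward its class center."""
    return F.binary_cross_entropy(codes, centers[labels])

centers = hash_centers(n_classes=10, n_bits=64)
loss = central_similarity_loss(torch.sigmoid(torch.randn(8, 64)),
                               torch.randint(0, 10, (8,)), centers)
```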
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
Title | DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs |
Authors | Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner |
Abstract | Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these systems, showing that there is much work left to be done. We introduce a new English reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. In this crowdsourced, adversarially-created, 96k-question benchmark, a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than what was necessary for prior datasets. We apply state-of-the-art methods from both the reading comprehension and semantic parsing literature on this dataset and show that the best systems only achieve 32.7% F1 on our generalized accuracy metric, while expert human performance is 96.0%. We additionally present a new model that combines reading comprehension methods with simple numerical reasoning to achieve 47.0% F1. |
Tasks | Reading Comprehension, Semantic Parsing |
Published | 2019-03-01 |
URL | http://arxiv.org/abs/1903.00161v2 |
http://arxiv.org/pdf/1903.00161v2.pdf | |
PWC | https://paperswithcode.com/paper/drop-a-reading-comprehension-benchmark |
Repo | https://github.com/m3yrin/naqanet_notebook |
Framework | none |
Table2answer: Read the database and answer without SQL
Title | Table2answer: Read the database and answer without SQL |
Authors | Tong Guo, Huilin Gao |
Abstract | Semantic parsing is the task of mapping natural language to logic form. In question answering, semantic parsing can be used to map the question to a logic form and execute the logic form to get the answer. One key problem for semantic parsing is the costly labeling of logic forms. We study this problem from another angle: we do not use the logic form at all. Instead, we use only the schema and answer information. We believe the logic-form step can be absorbed into the deep model; removing it seems plausible because humans can do the task without an explicit logic form. We use a BERT-based model and experiment on the WikiSQL dataset, a large natural-language-to-SQL dataset. Our experimental evaluation shows that our model achieves the baseline results on the WikiSQL dataset. |
Tasks | Question Answering, Semantic Parsing |
Published | 2019-02-12 |
URL | https://arxiv.org/abs/1902.04260v8 |
https://arxiv.org/pdf/1902.04260v8.pdf | |
PWC | https://paperswithcode.com/paper/table2answer-read-the-database-and-answer |
Repo | https://github.com/guotong1988/table2answer |
Framework | tf |
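
A rough sketch of the "answer without SQL" setup: the question and a flattened table are encoded jointly by BERT, and a pointer head scores tokens directly as the answer location, with no intermediate logic form. The pointer head and cell flattening below are assumptions for illustration, not the paper's architecture (requires the Hugging Face transformers package).

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
pointer = nn.Linear(768, 1)   # scores each token as the answer cell

question = "how many wins in 2008?"
cells = ["year 2008", "wins 12", "losses 4"]    # flattened toy table
enc = tok(question, " ; ".join(cells), return_tensors="pt")
h = bert(**enc).last_hidden_state               # 1 x T x 768
scores = pointer(h).squeeze(-1)                 # 1 x T
answer_token = scores.argmax(dim=-1)            # position pointed at
```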
On The Power of Curriculum Learning in Training Deep Networks
Title | On The Power of Curriculum Learning in Training Deep Networks |
Authors | Guy Hacohen, Daphna Weinshall |
Abstract | Training neural networks is traditionally done by providing a sequence of random mini-batches sampled uniformly from the entire training data. In this work, we analyze the effect of curriculum learning, which involves the non-uniform sampling of mini-batches, on the training of deep networks, and specifically CNNs trained for image recognition. To employ curriculum learning, the training algorithm must resolve two problems: (i) sort the training examples by difficulty; (ii) compute a series of mini-batches that exhibit an increasing level of difficulty. We address challenge (i) using two methods: transfer learning from some competitive “teacher” network, and bootstrapping. In our empirical evaluation, both methods show similar benefits in terms of increased learning speed and improved final performance on test data. We address challenge (ii) by investigating different pacing functions to guide the sampling. The empirical investigation includes a variety of network architectures, using images from CIFAR-10, CIFAR-100 and subsets of ImageNet. We conclude with a novel theoretical analysis of curriculum learning, where we show how it effectively modifies the optimization landscape. We then define the concept of an ideal curriculum, and show that under mild conditions it does not change the corresponding global minimum of the optimization function. |
Tasks | Transfer Learning |
Published | 2019-04-07 |
URL | https://arxiv.org/abs/1904.03626v3 |
https://arxiv.org/pdf/1904.03626v3.pdf | |
PWC | https://paperswithcode.com/paper/on-the-power-of-curriculum-learning-in |
Repo | https://github.com/GuyHacohen/curriculum_learning |
Framework | none |
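
The two ingredients, difficulty scoring and pacing, fit in a few lines: sort examples by a (teacher-derived) difficulty score, and let a pacing function grow the accessible prefix of the sorted data over training. The exponential pacing and its constants are one illustrative choice among the schedules the paper studies.

```python
import numpy as np

def pacing(step, total, start_frac=0.1, growth=1.9):
    """Exponential pacing: fraction of the (sorted) data available,
    growing in a few discrete jumps over training."""
    return min(1.0, start_frac * growth ** (step / (total / 5)))

def sample_batch(sorted_indices, step, total, batch_size=32):
    cutoff = max(batch_size, int(pacing(step, total) * len(sorted_indices)))
    return np.random.choice(sorted_indices[:cutoff], size=batch_size)

difficulty = np.random.rand(50_000)   # stand-in for teacher confidence scores
order = np.argsort(difficulty)        # easiest examples first
batch = sample_batch(order, step=0, total=10_000)
```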
Deformable Kernel Networks for Joint Image Filtering
Title | Deformable Kernel Networks for Joint Image Filtering |
Authors | Beomjun Kim, Jean Ponce, Bumsub Ham |
Abstract | Joint image filters are used to transfer structural details from a guidance image, used as a prior, to a target image, in tasks such as enhancing spatial resolution and suppressing noise. Previous methods based on convolutional neural networks (CNNs) combine nonlinear activations of spatially-invariant kernels to estimate structural details and regress the filtering result. In this paper, we instead learn explicitly sparse and spatially-variant kernels. We propose a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel. The filtering result is then computed as a weighted average. We also propose a fast version of DKN that runs about seventeen times faster for an image of size 640 x 480. We demonstrate the effectiveness and flexibility of our models on the tasks of depth map upsampling, saliency map upsampling, cross-modality image restoration, texture removal, and semantic segmentation. In particular, we show that the weighted averaging process with sparsely sampled 3 x 3 kernels outperforms the state of the art by a significant margin in all cases. |
Tasks | Image Restoration, Semantic Segmentation |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.08373v1 |
https://arxiv.org/pdf/1910.08373v1.pdf | |
PWC | https://paperswithcode.com/paper/deformable-kernel-networks-for-joint-image |
Repo | https://github.com/jun0kim/dkn |
Framework | pytorch |
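
A minimal sketch of the DKN mechanism: for each output pixel, a small network predicts K sampling offsets and K weights from the guidance image, and the filtered value is the weighted average of the target image sampled at those locations via grid_sample. K, the offset scale, and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDKN(nn.Module):
    """Per-pixel deformable filtering: predict K offsets and K weights
    from the guidance, then average K samples of the target image."""

    def __init__(self, K=9):
        super().__init__()
        self.K = K
        self.offsets = nn.Conv2d(3, 2 * K, 3, padding=1)
        self.weights = nn.Conv2d(3, K, 3, padding=1)

    def forward(self, guide, target):
        b, _, h, w = guide.shape
        off = self.offsets(guide).view(b, self.K, 2, h, w) * 0.1
        wgt = F.softmax(self.weights(guide), dim=1)        # B x K x H x W
        ys = torch.linspace(-1, 1, h, device=guide.device)
        xs = torch.linspace(-1, 1, w, device=guide.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        base = torch.stack([gx, gy])                       # 2 x H x W
        out = 0
        for k in range(self.K):
            grid = (base + off[:, k]).permute(0, 2, 3, 1)  # B x H x W x 2
            samp = F.grid_sample(target, grid, align_corners=True)
            out = out + samp * wgt[:, k:k + 1]             # weighted average
        return out

out = TinyDKN()(torch.rand(1, 3, 32, 32), torch.rand(1, 1, 32, 32))
```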