April 3, 2020

3501 words 17 mins read

Paper Group AWR 39


Title A Dataset of German Legal Documents for Named Entity Recognition
Authors Elena Leitner, Georg Rehm, Julián Moreno-Schneider
Abstract We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The legal documents were, furthermore, automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.
Tasks Named Entity Recognition
Published 2020-03-29
URL https://arxiv.org/abs/2003.13016v1
PDF https://arxiv.org/pdf/2003.13016v1.pdf
PWC https://paperswithcode.com/paper/a-dataset-of-german-legal-documents-for-named
Repo https://github.com/elenanereiss/Legal-Entity-Recognition
Framework none
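
As a quick illustration for the entry above: CoNLL-2002 files store one token (and its tag) per line, with blank lines separating sentences. Below is a minimal sketch of a reader; the file name is a placeholder, not a path from the released dataset.

```python
# Minimal reader for CoNLL-2002-style files: one token and its tag per line,
# blank lines separate sentences. "legal_ner.conll" is a hypothetical path.
def read_conll(path):
    """Yield sentences as lists of (token, tag) pairs."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                 # blank line ends the current sentence
                if sentence:
                    yield sentence
                    sentence = []
            else:
                fields = line.split()
                sentence.append((fields[0], fields[-1]))
    if sentence:                         # file may end without a blank line
        yield sentence

if __name__ == "__main__":
    for sent in read_conll("legal_ner.conll"):
        print(sent[:3])                  # peek at the first (token, tag) pairs
        break
```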

Deep Face Super-Resolution with Iterative Collaboration between Attentive Recovery and Landmark Estimation

Title Deep Face Super-Resolution with Iterative Collaboration between Attentive Recovery and Landmark Estimation
Authors Cheng Ma, Zhenyu Jiang, Yongming Rao, Jiwen Lu, Jie Zhou
Abstract Recent works based on deep learning and facial priors have succeeded in super-resolving severely degraded facial images. However, the prior knowledge is not fully exploited in existing methods, since facial priors such as landmark and component maps are always estimated from low-resolution or coarsely super-resolved images, which may be inaccurate and thus affect the recovery performance. In this paper, we propose a deep face super-resolution (FSR) method with iterative collaboration between two recurrent networks which focus on facial image recovery and landmark estimation, respectively. In each recurrent step, the recovery branch utilizes the prior knowledge of landmarks to yield higher-quality images, which in turn facilitate more accurate landmark estimation. Thus, the iterative information interaction between the two processes progressively boosts the performance of both. Moreover, a new attentive fusion module is designed to strengthen the guidance of landmark maps, where facial components are generated individually and aggregated attentively for better restoration. Quantitative and qualitative experimental results show the proposed method significantly outperforms state-of-the-art FSR methods in recovering high-quality face images.
Tasks Super-Resolution
Published 2020-03-29
URL https://arxiv.org/abs/2003.13063v1
PDF https://arxiv.org/pdf/2003.13063v1.pdf
PWC https://paperswithcode.com/paper/deep-face-super-resolution-with-iterative
Repo https://github.com/Maclory/Deep-Iterative-Collaboration
Framework pytorch
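
To make the iterative collaboration concrete, here is a toy sketch, not the paper's networks (those are deep recurrent models; see the repo above): two tiny modules alternately refine the image and the landmark heatmaps. The channel count (68 landmarks) and the number of steps are assumptions.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the two collaborating branches.
class RecoveryNet(nn.Module):
    def __init__(self):
        super().__init__()
        # input: image (3 ch) + landmark heatmaps (68 ch) -> refined image
        self.body = nn.Conv2d(3 + 68, 3, 3, padding=1)
    def forward(self, img, heatmaps):
        return self.body(torch.cat([img, heatmaps], dim=1))

class LandmarkNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Conv2d(3, 68, 3, padding=1)
    def forward(self, img):
        return torch.sigmoid(self.body(img))

recover, locate = RecoveryNet(), LandmarkNet()
img = torch.rand(1, 3, 128, 128)     # coarsely upsampled LR face
heat = torch.zeros(1, 68, 128, 128)  # initial (empty) landmark prior

for step in range(3):                # iterative collaboration
    img = recover(img, heat)         # better image from the landmark prior
    heat = locate(img)               # better landmarks from the better image
```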

Listwise Learning to Rank by Exploring Unique Ratings

Title Listwise Learning to Rank by Exploring Unique Ratings
Authors Xiaofeng Zhu, Diego Klabjan
Abstract In this paper, we propose new listwise learning-to-rank models that mitigate the shortcomings of existing ones. Existing listwise learning-to-rank models are generally derived from the classical Plackett-Luce model, which has three major limitations. (1) Its permutation probabilities overlook ties, i.e., situations in which more than one document has the same rating with respect to a query. This can lead to imprecise permutation probabilities and inefficient training because documents are selected one by one. (2) It does not favor documents with high relevance. (3) It makes the loose assumption that sampling documents at different steps is independent. To overcome the first two limitations, we model ranking as selecting documents from a candidate set based on unique rating levels in decreasing order. The number of steps in training is determined by the number of unique rating levels. We propose a new loss function and four associated models for the entire sequence of weighted classification tasks, assigning high weights to the selected documents with high ratings in order to optimize Normalized Discounted Cumulative Gain (NDCG). To overcome the final limitation, we further propose a novel and efficient way of refining prediction scores by combining an adapted Vanilla Recurrent Neural Network (RNN) model with pooling given the documents selected at previous steps. We encode all of the documents already selected with an RNN model. In a single step, we rank all of the documents with the same rating using the last cell of the RNN multiple times. We have implemented our models in three settings: neural networks, neural networks with gradient boosting, and regression trees with gradient boosting. We have conducted experiments on four public datasets. The experiments demonstrate that the models notably outperform state-of-the-art learning-to-rank models.
Tasks Learning-To-Rank
Published 2020-01-07
URL https://arxiv.org/abs/2001.01828v3
PDF https://arxiv.org/pdf/2001.01828v3.pdf
PWC https://paperswithcode.com/paper/listwise-learning-to-rank-by-exploring-unique
Repo https://github.com/XiaofengZhu/uRank_uMart
Framework tf
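
The stepwise idea, selecting all documents of the highest remaining rating at each step with NDCG-style weights, can be illustrated on a single toy query. This is a simplified loss in the spirit of the paper, not its exact formulation.

```python
import numpy as np

# Toy scores/ratings for one query; equal ratings form a tie group.
scores = np.array([2.0, 1.2, 0.3, 1.9, 0.1])   # model scores
ratings = np.array([2, 2, 1, 1, 0])            # graded relevance labels

loss = 0.0
remaining = np.arange(len(scores))
for level in sorted(set(ratings), reverse=True):       # one step per unique rating
    if level == 0:
        break                                          # irrelevant docs: never selected
    group = remaining[ratings[remaining] == level]
    logits = scores[remaining]                         # softmax over remaining docs
    log_probs = logits - np.log(np.exp(logits).sum())
    weight = 2 ** level - 1                            # NDCG-style gain as weight
    loss -= weight * log_probs[np.isin(remaining, group)].sum()
    remaining = remaining[ratings[remaining] < level]  # drop the selected group

print(f"listwise loss: {loss:.3f}")
```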

VerSe: A Vertebrae Labelling and Segmentation Benchmark

Title VerSe: A Vertebrae Labelling and Segmentation Benchmark
Authors Anjany Sekuboyina, Amirhossein Bayat, Malek E. Husseini, Maximilian Löffler, Markus Rempfler, Jan Kukačka, Giles Tetteh, Alexander Valentinitsch, Christian Payer, Martin Urschler, Maodong Chen, Dalong Cheng, Nikolas Lessmann, Yujin Hu, Tianfu Wang, Dong Yang, Daguang Xu, Felix Ambellan, Stefan Zachowk, Tao Jiang, Xinjun Ma, Christoph Angerman, Xin Wang, Qingyue Wei, Kevin Brown, Matthias Wolf, Alexandre Kirszenberg, Élodie Puybareauq, Björn H. Menze, Jan S. Kirschke
Abstract In this paper we report the challenge set-up and results of the Large Scale Vertebrae Segmentation Challenge (VerSe) organized in conjunction with MICCAI 2019. The challenge consisted of two tasks: vertebrae labelling and vertebrae segmentation. For this, a cohort of 160 multidetector CT scans closely resembling a clinical setting was prepared and annotated at voxel level by a human-machine hybrid algorithm. We also present the annotation protocol and the algorithm that aided the medical experts in the annotation process. Eleven fully automated algorithms were benchmarked on this data, with the best performing algorithm achieving a vertebrae identification rate of 95% and a Dice coefficient of 90%. VerSe’19 is an open-call challenge: its image data, along with the annotations and evaluation tools, will continue to be publicly accessible through its online portal.
Tasks
Published 2020-01-24
URL https://arxiv.org/abs/2001.09193v1
PDF https://arxiv.org/pdf/2001.09193v1.pdf
PWC https://paperswithcode.com/paper/verse-a-vertebrae-labelling-and-segmentation
Repo https://github.com/christianpayer/MedicalDataAugmentationTool-VerSe
Framework tf
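
The two headline metrics above are the identification rate and the Dice coefficient; the latter is easy to state in code. A minimal NumPy version for binary masks:

```python
import numpy as np

def dice(pred, target):
    """Dice coefficient between two binary masks (1.0 = perfect overlap)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 2.0 * inter / denom if denom else 1.0

# Toy 2-D "segmentations": two offset 4x4 squares.
a = np.zeros((8, 8), int); a[2:6, 2:6] = 1
b = np.zeros((8, 8), int); b[3:7, 3:7] = 1
print(f"Dice: {dice(a, b):.3f}")   # 2*9 / (16+16) = 0.562
```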

Probabilistic Regression for Visual Tracking

Title Probabilistic Regression for Visual Tracking
Authors Martin Danelljan, Luc Van Gool, Radu Timofte
Abstract Visual tracking is fundamentally the problem of regressing the state of the target in each video frame. While significant progress has been achieved, trackers are still prone to failures and inaccuracies. It is therefore crucial to represent the uncertainty in the target estimation. Although current prominent paradigms rely on estimating a state-dependent confidence score, this value lacks a clear probabilistic interpretation, complicating its use. In this work, we therefore propose a probabilistic regression formulation and apply it to tracking. Our network predicts the conditional probability density of the target state given an input image. Crucially, our formulation is capable of modeling label noise stemming from inaccurate annotations and ambiguities in the task. The regression network is trained by minimizing the Kullback-Leibler divergence. When applied for tracking, our formulation not only allows a probabilistic representation of the output, but also substantially improves the performance. Our tracker sets a new state-of-the-art on six datasets, achieving 59.8% AUC on LaSOT and 75.8% Success on TrackingNet. The code and models are available at https://github.com/visionml/pytracking.
Tasks Visual Tracking
Published 2020-03-27
URL https://arxiv.org/abs/2003.12565v1
PDF https://arxiv.org/pdf/2003.12565v1.pdf
PWC https://paperswithcode.com/paper/probabilistic-regression-for-visual-tracking
Repo https://github.com/visionml/pytracking
Framework pytorch
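
The training objective can be sketched on a toy 1-D grid: the network's scores define a density via softmax, the annotation defines a Gaussian label density, and the loss is the KL divergence between them. The grid, the Gaussian label shape, and the sigma value are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

# Candidate target positions and stand-in network scores over them.
grid = torch.linspace(-1, 1, 101)
scores = torch.randn(101, requires_grad=True)

center, sigma = 0.2, 0.05                       # annotation and assumed label noise
label = torch.exp(-0.5 * ((grid - center) / sigma) ** 2)
label = label / label.sum()                     # normalized label density

log_model = F.log_softmax(scores, dim=0)        # model density on the grid
kl = (label * (label.clamp_min(1e-12).log() - log_model)).sum()
kl.backward()                                   # gradients flow into the scores
print(f"KL(label || model) = {kl.item():.3f}")
```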

Instance Credibility Inference for Few-Shot Learning

Title Instance Credibility Inference for Few-Shot Learning
Authors Yikai Wang, Chengming Xu, Chen Liu, Li Zhang, Yanwei Fu
Abstract Few-shot learning (FSL) aims to recognize new objects with extremely limited training data for each category. Previous efforts alleviate this extremely data-scarce problem either by leveraging the meta-learning paradigm or through novel principles in data augmentation. In contrast, this paper presents a simple statistical approach, dubbed Instance Credibility Inference (ICI), to exploit the distribution support of unlabeled instances for few-shot learning. Specifically, we first train a linear classifier with the labeled few-shot examples and use it to infer pseudo-labels for the unlabeled data. To measure the credibility of each pseudo-labeled instance, we then propose to solve another linear regression hypothesis by increasing the sparsity of the incidental parameters and rank the pseudo-labeled instances by their sparsity degree. We select the most trustworthy pseudo-labeled instances alongside the labeled examples to re-train the linear classifier. This process is iterated until all the unlabeled samples are included in the expanded training set, i.e., until the pseudo-labels for the unlabeled data pool have converged. Extensive experiments under two few-shot settings show that our simple approach establishes new state-of-the-art results on four widely used few-shot learning benchmark datasets: miniImageNet, tieredImageNet, CIFAR-FS, and CUB. Our code is available at: https://github.com/Yikai-Wang/ICI-FSL
Tasks Data Augmentation, Few-Shot Learning, Meta-Learning
Published 2020-03-26
URL https://arxiv.org/abs/2003.11853v1
PDF https://arxiv.org/pdf/2003.11853v1.pdf
PWC https://paperswithcode.com/paper/instance-credibility-inference-for-few-shot
Repo https://github.com/Yikai-Wang/ICI-FSL
Framework none
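
The overall loop, pseudo-label, rank by credibility, expand the training set, retrain, can be sketched on a toy 2-way task. Note that the credibility measure below is plain classifier confidence, a simpler stand-in for ICI's sparsity-based ranking of incidental parameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy 2-way few-shot task: 5 labeled shots per class, 40 unlabeled points.
X_lab = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(3, 1, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unl = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])

selected_X, selected_y = X_lab, y_lab
for _ in range(4):
    clf = LogisticRegression().fit(selected_X, selected_y)
    proba = clf.predict_proba(X_unl)
    credibility = proba.max(axis=1)                     # confidence as proxy
    keep = np.argsort(-credibility)[: len(selected_X)]  # grow the trusted set
    selected_X = np.vstack([X_lab, X_unl[keep]])
    selected_y = np.concatenate([y_lab, proba[keep].argmax(axis=1)])

y_true = np.array([0] * 20 + [1] * 20)
print("accuracy on unlabeled pool:", (clf.predict(X_unl) == y_true).mean())
```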

BachGAN: High-Resolution Image Synthesis from Salient Object Layout

Title BachGAN: High-Resolution Image Synthesis from Salient Object Layout
Authors Yandong Li, Yu Cheng, Zhe Gan, Licheng Yu, Liqiang Wang, Jingjing Liu
Abstract We propose a new task towards more practical application of image generation: high-quality image synthesis from salient object layout. This new setting allows users to provide the layout of salient objects only (i.e., foreground bounding boxes and categories) and lets the model complete the drawing with an invented background and a matching foreground. Two main challenges spring from this new task: (i) how to generate fine-grained details and realistic textures without segmentation map input; and (ii) how to create a background and weave it seamlessly into standalone objects. To tackle this, we propose the Background Hallucination Generative Adversarial Network (BachGAN), which first selects a set of segmentation maps from a large candidate pool via a background retrieval module, then encodes these candidate layouts via a background fusion module to hallucinate a suitable background for the given objects. By generating the hallucinated background representation dynamically, our model can synthesize high-resolution images with both a photo-realistic foreground and an integral background. Experiments on the Cityscapes and ADE20K datasets demonstrate the advantage of BachGAN over existing methods, measured both on visual fidelity of generated images and on visual alignment between output images and input layouts.
Tasks Image Generation
Published 2020-03-26
URL https://arxiv.org/abs/2003.11690v2
PDF https://arxiv.org/pdf/2003.11690v2.pdf
PWC https://paperswithcode.com/paper/bachgan-high-resolution-image-synthesis-from
Repo https://github.com/Cold-Winter/BachGAN
Framework pytorch
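
The background retrieval step can be caricatured as a nearest-neighbour search over object layouts: pick the candidate segmentation map whose boxes best overlap the query's. Everything below (box format, pool entries, IoU matching) is a hypothetical sketch, not BachGAN's actual retrieval module.

```python
import numpy as np

# Boxes are (x1, y1, x2, y2) in [0, 1]; the candidate pool is made up.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def layout_score(query_boxes, candidate_boxes):
    # average over query boxes, each matched to its best candidate box
    return np.mean([max(iou(q, c) for c in candidate_boxes) for q in query_boxes])

query = [(0.1, 0.5, 0.4, 0.9), (0.6, 0.4, 0.9, 0.95)]
pool = {
    "street_scene_01": [(0.1, 0.45, 0.45, 0.9), (0.55, 0.4, 0.9, 0.9)],
    "park_scene_07":   [(0.3, 0.1, 0.7, 0.4)],
}
best = max(pool, key=lambda k: layout_score(query, pool[k]))
print("retrieved background:", best)   # street_scene_01
```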

An LSTM-Based Autonomous Driving Model Using Waymo Open Dataset

Title An LSTM-Based Autonomous Driving Model Using Waymo Open Dataset
Authors Zhicheng Gu, Zhihao Li, Xuan Di, Rongye Shi
Abstract The Waymo Open Dataset has been released recently, providing a platform to crowdsource some fundamental challenges for automated vehicles (AVs), such as 3D detection and tracking. While the dataset provides a large amount of high-quality and multi-source driving information, people in academia are more interested in the underlying driving policy programmed in Waymo self-driving cars, which is inaccessible due to AV manufacturers’ proprietary protection. Accordingly, academic researchers have to make various assumptions to implement AV components in their models or simulations, which may not represent the realistic interactions in real-world traffic. Thus, this paper introduces an approach to learn a long short-term memory (LSTM)-based model for imitating the behavior of Waymo’s self-driving model. The proposed model has been evaluated based on Mean Absolute Error (MAE). The experimental results show that our model outperforms several baseline models in driving action prediction. In addition, a visualization tool is presented for verifying the performance of the model.
Tasks Autonomous Driving, Self-Driving Cars
Published 2020-02-14
URL https://arxiv.org/abs/2002.05878v2
PDF https://arxiv.org/pdf/2002.05878v2.pdf
PWC https://paperswithcode.com/paper/an-lstm-based-autonomous-driving-model-using
Repo https://github.com/JdeRobot/BehaviorSuite
Framework tf
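
A minimal sketch of the modelling setup, assuming a small kinematic feature vector in and a two-dimensional action (e.g. acceleration and steering) out; the dimensions are illustrative, but the L1 objective matches the MAE evaluation mentioned above.

```python
import torch
import torch.nn as nn

class DrivingLSTM(nn.Module):
    """Short history of kinematic features in, next-step driving actions out."""
    def __init__(self, feat_dim=8, hidden=64, act_dim=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)
    def forward(self, x):                  # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # action at the last time step

model = DrivingLSTM()
x = torch.randn(16, 10, 8)                 # 16 trajectories, 10 past steps each
target = torch.randn(16, 2)                # stand-in ground-truth actions
loss = nn.L1Loss()(model(x), target)       # MAE, as in the paper's evaluation
loss.backward()
print(f"MAE loss: {loss.item():.3f}")
```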

End-to-end Autonomous Driving Perception with Sequential Latent Representation Learning

Title End-to-end Autonomous Driving Perception with Sequential Latent Representation Learning
Authors Jianyu Chen, Zhuo Xu, Masayoshi Tomizuka
Abstract Current autonomous driving systems are composed of a perception system and a decision system. Both of them are divided into multiple subsystems built up with lots of human heuristics. An end-to-end approach might clean up the system and avoid the huge effort of human engineering, as well as obtain better performance with increasing data and computation resources. Compared to the decision system, the perception system is more suitable to be designed in an end-to-end framework, since it does not require online driving exploration. In this paper, we propose a novel end-to-end approach for autonomous driving perception. A latent space is introduced to capture all relevant features useful for perception, which is learned through sequential latent representation learning. The learned end-to-end perception model is able to solve the detection, tracking, localization and mapping problems altogether with only minimal human engineering effort and without storing any maps online. The proposed method is evaluated in a realistic urban driving simulator, with both camera image and lidar point cloud as sensor inputs. The code and videos of this work are available at our GitHub repo and project website.
Tasks Autonomous Driving, Representation Learning
Published 2020-03-21
URL https://arxiv.org/abs/2003.12464v1
PDF https://arxiv.org/pdf/2003.12464v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-autonomous-driving-perception-with
Repo https://github.com/cjy1992/detect-loc-map
Framework tf
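
One filtering step of a generic sequential latent-variable model, the building block this line of work relies on: a prior over the next latent from the previous latent and action, a posterior that additionally sees the encoded observation, and a KL term tying them together. All sizes and the linear parameterisation are toy assumptions.

```python
import torch
import torch.nn as nn

latent, action, obs = 32, 2, 128

prior_net = nn.Linear(latent + action, 2 * latent)      # -> mean, log-variance
post_net  = nn.Linear(latent + action + obs, 2 * latent)

def sample(stats):
    mean, logvar = stats.chunk(2, dim=-1)
    return mean + torch.randn_like(mean) * (0.5 * logvar).exp(), mean, logvar

z_prev = torch.zeros(1, latent)
a_prev = torch.zeros(1, action)
x_t = torch.randn(1, obs)                               # encoded camera/lidar features

z_prior, mu_p, lv_p = sample(prior_net(torch.cat([z_prev, a_prev], -1)))
z_post, mu_q, lv_q = sample(post_net(torch.cat([z_prev, a_prev, x_t], -1)))

# KL(posterior || prior) between diagonal Gaussians: the ELBO's latent term.
kl = 0.5 * ((lv_p - lv_q) + ((lv_q.exp() + (mu_q - mu_p) ** 2) / lv_p.exp()) - 1).sum()
print(f"KL term: {kl.item():.2f}")
```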

Three-branch and Mutil-scale learning for Fine-grained Image Recognition (TBMSL-Net)

Title Three-branch and Mutil-scale learning for Fine-grained Image Recognition (TBMSL-Net)
Authors Fan Zhang, Guisheng Zhai, Meng Li, Yizhao Liu
Abstract The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been one of the most authoritative academic competitions in Computer Vision (CV) in recent years, but directly transferring its winning models to fine-grained visual categorization (FGVC) tasks does not achieve good results. The small inter-class variations and large intra-class variations caused by the fine-grained nature of the problem make it challenging. Our proposed method can effectively localize objects and informative part regions without bounding-box or part annotations, using an attention object location module (AOLM) and an attention part proposal module (APPM). The obtained object images contain both the whole structure and finer details, the part images cover many different scales and carry more fine-grained features, and the raw images contain the complete object. These three kinds of training images are supervised by our three-branch network structure. The resulting model has good classification ability, good generalization, and robustness to object images of different scales. Our approach is trained end-to-end, and comprehensive experiments demonstrate that it achieves state-of-the-art results on the CUB-200-2011, Stanford Cars and FGVC-Aircraft datasets.
Tasks Fine-Grained Image Classification, Fine-Grained Image Recognition, Fine-Grained Visual Categorization, Object Recognition
Published 2020-03-20
URL https://arxiv.org/abs/2003.09150v2
PDF https://arxiv.org/pdf/2003.09150v2.pdf
PWC https://paperswithcode.com/paper/three-branch-and-mutil-scale-learning-for
Repo https://github.com/ZF1044404254/TBMSL-Net
Framework pytorch
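
The three-branch supervision in miniature: one shared classifier sees the raw image, an object crop, and a part crop, with a cross-entropy loss per branch. The crops below are fixed placeholders; in the paper, AOLM and APPM derive them from attention maps, and the backbone is a deep CNN rather than this toy stack.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(16, 200))           # 200 CUB classes

raw = torch.rand(4, 3, 224, 224)
label = torch.randint(0, 200, (4,))

obj = raw[:, :, 32:192, 32:192]                        # stand-in for the AOLM crop
part = raw[:, :, 64:128, 64:128]                       # stand-in for one APPM part

# One cross-entropy loss per branch, all through the shared backbone.
loss = sum(F.cross_entropy(backbone(F.interpolate(x, size=224)), label)
           for x in (raw, obj, part))
loss.backward()
```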

Incorporating User’s Preference into Attributed Graph Clustering

Title Incorporating User’s Preference into Attributed Graph Clustering
Authors Wei Ye, Dominik Mautz, Christian Boehm, Ambuj Singh, Claudia Plant
Abstract Graph clustering has been studied extensively on both plain graphs and attributed graphs. However, all these methods need to partition the whole graph to find cluster structures. Sometimes, based on domain knowledge, people may have information about a specific target region in the graph and only want to find a single cluster concentrated on this local region. Such a task is called local clustering. In contrast to global clustering, local clustering aims to find only one cluster that is concentrated on the given seed vertex (and, for attributed graphs, on the designated attributes). Currently, very few methods can deal with this kind of task. To this end, we propose two quality measures for a local cluster: Graph Unimodality (GU) and Attribute Unimodality (AU). The former measures the homogeneity of the graph structure, while the latter measures the homogeneity of the subspace composed of the designated attributes. We call their linear combination Compactness. Further, we propose LOCLU to optimize the Compactness score. The local cluster detected by LOCLU concentrates on the region of interest, provides efficient information flow in the graph, and exhibits a unimodal data distribution in the subspace of the designated attributes.
Tasks Graph Clustering
Published 2020-03-24
URL https://arxiv.org/abs/2003.11079v1
PDF https://arxiv.org/pdf/2003.11079v1.pdf
PWC https://paperswithcode.com/paper/incorporating-user-s-preference-into
Repo https://github.com/yeweiysh/LOCLU
Framework none

Optimistic Exploration even with a Pessimistic Initialisation

Title Optimistic Exploration even with a Pessimistic Initialisation
Authors Tabish Rashid, Bei Peng, Wendelin Böhmer, Shimon Whiteson
Abstract Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL). In the tabular case, all provably efficient model-free algorithms rely on it. However, model-free deep RL algorithms do not use optimistic initialisation despite taking inspiration from these provably efficient tabular algorithms. In particular, in scenarios with only positive rewards, Q-values are initialised at their lowest possible values due to commonly used network initialisation schemes, a pessimistic initialisation. Merely initialising the network to output optimistic Q-values is not enough, since we cannot ensure that they remain optimistic for novel state-action pairs, which is crucial for exploration. We propose a simple count-based augmentation to pessimistically initialised Q-values that separates the source of optimism from the neural network. We show that this scheme is provably efficient in the tabular setting and extend it to the deep RL setting. Our algorithm, Optimistic Pessimistically Initialised Q-Learning (OPIQ), augments the Q-value estimates of a DQN-based agent with count-derived bonuses to ensure optimism during both action selection and bootstrapping. We show that OPIQ outperforms non-optimistic DQN variants that utilise a pseudocount-based intrinsic motivation in hard exploration tasks, and that it predicts optimistic estimates for novel state-action pairs.
Tasks Efficient Exploration, Q-Learning
Published 2020-02-26
URL https://arxiv.org/abs/2002.12174v1
PDF https://arxiv.org/pdf/2002.12174v1.pdf
PWC https://paperswithcode.com/paper/optimistic-exploration-even-with-a-1
Repo https://github.com/oxwhirl/opiq
Framework pytorch
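
The count-based augmentation has a simple tabular form: act and bootstrap with Q+(s, a) = Q(s, a) + C/(N(s, a) + 1)^M, so rarely tried actions look attractive even though the underlying Q-values start pessimistically at zero. A toy sketch (the hyperparameter values and transition dynamics below are made up):

```python
import numpy as np

n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))          # pessimistic initialisation
N = np.zeros((n_states, n_actions))          # state-action visit counts
C, M = 1.0, 2.0                              # bonus scale and decay

def q_plus(s):
    return Q[s] + C / (N[s] + 1) ** M        # optimistic augmentation

s = 0
for t in range(10):
    a = np.argmax(q_plus(s))                 # optimism drives exploration
    N[s, a] += 1
    r, s_next = 0.0, (s + 1) % n_states      # toy transition, all-zero rewards
    # bootstrapping also uses the augmented values, as in OPIQ
    Q[s, a] += 0.5 * (r + 0.9 * np.max(q_plus(s_next)) - Q[s, a])
    s = s_next
print(np.round(Q, 3))
```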

PADS: Policy-Adapted Sampling for Visual Similarity Learning

Title PADS: Policy-Adapted Sampling for Visual Similarity Learning
Authors Karsten Roth, Timo Milbich, Björn Ommer
Abstract Learning visual similarity requires learning relations, typically between triplets of images. Although triplet approaches are powerful, their computational complexity mostly limits training to only a subset of all possible training triplets. Thus, sampling strategies that decide when to use which training sample during learning are crucial. Currently, the prominent paradigm is fixed or curriculum sampling strategies that are predefined before training starts. However, the problem truly calls for a sampling process that adjusts based on the actual state of the similarity representation during training. We therefore employ reinforcement learning and have a teacher network adjust the sampling distribution based on the current state of the learner network, which represents visual similarity. Experiments on benchmark datasets using standard triplet-based losses show that our adaptive sampling strategy significantly outperforms fixed sampling strategies. Moreover, although our adaptive sampling is only applied on top of basic triplet-learning frameworks, we reach results competitive with state-of-the-art approaches that employ diverse additional learning signals or strong ensemble architectures. Code can be found under https://github.com/Confusezius/CVPR2020_PADS.
Tasks
Published 2020-03-24
URL https://arxiv.org/abs/2003.11113v2
PDF https://arxiv.org/pdf/2003.11113v2.pdf
PWC https://paperswithcode.com/paper/pads-policy-adapted-sampling-for-visual
Repo https://github.com/Confusezius/CVPR2020_PADS
Framework pytorch
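
A heavily simplified caricature of the idea: a policy holds logits over distance bins for negative sampling and is nudged by a reward signal via REINFORCE. The reward below is simulated; in PADS it is derived from the actual state of the learner network, and the policy is a teacher network rather than a bare logit vector.

```python
import numpy as np

rng = np.random.default_rng(1)

logits = np.zeros(5)                             # one logit per distance bin

def probs(l):
    e = np.exp(l - l.max())
    return e / e.sum()

for step in range(200):
    p = probs(logits)
    b = rng.choice(5, p=p)                       # bin to draw negatives from
    # simulated reward: mid-distance (semi-hard) bins help the learner most
    reward = [0.0, 0.02, 0.05, 0.02, -0.01][b] + rng.normal(0, 0.01)
    logits += 0.5 * reward * (np.eye(5)[b] - p)  # REINFORCE ascent on E[reward]

print("adapted sampling distribution:", np.round(probs(logits), 2))
```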

Multi-Scale Progressive Fusion Network for Single Image Deraining

Title Multi-Scale Progressive Fusion Network for Single Image Deraining
Authors Kui Jiang, Zhongyuan Wang, Peng Yi, Chen Chen, Baojin Huang, Yimin Luo, Jiayi Ma, Junjun Jiang
Abstract Rain streaks in the air appear in various blurring degrees and resolutions due to the different distances from their positions to the camera. Similar rain patterns are visible in a rain image as well as in its multi-scale (or multi-resolution) versions, which makes it possible to exploit such complementary information for rain streak representation. In this work, we explore the multi-scale collaborative representation of rain streaks from the perspective of input image scales and hierarchical deep features in a unified framework, termed the multi-scale progressive fusion network (MSPFN), for single image rain streak removal. For similar rain streaks at different positions, we employ recurrent calculation to capture the global texture, thus allowing us to explore the complementary and redundant information at the spatial dimension to characterize target rain streaks. Besides, we construct a multi-scale pyramid structure and further introduce an attention mechanism to guide the fine fusion of this correlated information from different scales. This multi-scale progressive fusion strategy not only promotes the cooperative representation but also boosts end-to-end training. Our proposed method is extensively evaluated on several benchmark datasets and achieves state-of-the-art results. Moreover, we conduct experiments on joint deraining, detection, and segmentation tasks, and inspire a new research direction of vision task-driven image deraining. The source code is available at https://github.com/kuihua/MSPFN.
Tasks Rain Removal, Single Image Deraining
Published 2020-03-24
URL https://arxiv.org/abs/2003.10985v2
PDF https://arxiv.org/pdf/2003.10985v2.pdf
PWC https://paperswithcode.com/paper/multi-scale-progressive-fusion-network-for
Repo https://github.com/kuihua/MSPFN
Framework none
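
The input-pyramid side of the idea in a few lines: run the rainy image through shared features at three scales, upsample, fuse, and subtract the estimated streaks. The recurrent units and attention-guided fusion of the actual MSPFN are omitted, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat = nn.Conv2d(3, 16, 3, padding=1)          # shared features across scales
fuse = nn.Conv2d(16 * 3, 3, 3, padding=1)      # fused residual (streak estimate)

rainy = torch.rand(1, 3, 64, 64)
pyramid = [rainy,
           F.avg_pool2d(rainy, 2),              # 1/2 scale
           F.avg_pool2d(rainy, 4)]              # 1/4 scale

features = [F.interpolate(feat(x), size=rainy.shape[-2:], mode="bilinear",
                          align_corners=False) for x in pyramid]
derained = rainy - fuse(torch.cat(features, dim=1))   # subtract estimated streaks
print(derained.shape)                                 # torch.Size([1, 3, 64, 64])
```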

Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives

Title Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives
Authors Duo Li, Qifeng Chen
Abstract While the depth of modern Convolutional Neural Networks (CNNs) surpasses that of the pioneering networks by a significant margin, the traditional way of appending supervision only over the final classifier and progressively propagating gradient flow upstream remains the training mainstay. Seminal Deeply-Supervised Networks (DSN) were proposed to alleviate the difficulty of optimization arising from gradient flow through a long chain. However, it is still vulnerable to issues including interference with the hierarchical representation generation process and inconsistent optimization objectives, as illustrated theoretically and empirically in this paper. Complementary to previous training strategies, we propose Dynamic Hierarchical Mimicking, a generic feature learning mechanism, to advance CNN training with enhanced generalization ability. Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network. Each branch can emerge from certain locations of the main branch dynamically, which not only retains representation rooted in the backbone network but also generates more diverse representations along its own pathway. We go one step further to promote multi-level interactions among different branches through an optimization formula with probabilistic prediction matching losses, thus guaranteeing a more robust optimization process and better representation ability. Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method over its corresponding counterparts using diverse state-of-the-art CNN architectures. Code and models are publicly available at https://github.com/d-li14/DHM
Tasks
Published 2020-03-24
URL https://arxiv.org/abs/2003.10739v1
PDF https://arxiv.org/pdf/2003.10739v1.pdf
PWC https://paperswithcode.com/paper/dynamic-hierarchical-mimicking-towards
Repo https://github.com/d-li14/DHM
Framework pytorch
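
The prediction-matching loss reduced to two heads: each head is trained on the labels and pulled towards the other's predictive distribution with a symmetric KL term. In DHM the side branches fork from intermediate layers; here both heads share one trunk for brevity, and the loss weight is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

trunk = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
head_main = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
head_side = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))

x = torch.rand(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))

h = trunk(x)
logit_m, logit_s = head_main(h), head_side(h)

ce = F.cross_entropy(logit_m, y) + F.cross_entropy(logit_s, y)
# symmetric KL between the two predictive distributions (the mimicking term)
kl = (F.kl_div(F.log_softmax(logit_s, 1), F.softmax(logit_m, 1).detach(),
               reduction="batchmean")
      + F.kl_div(F.log_softmax(logit_m, 1), F.softmax(logit_s, 1).detach(),
                 reduction="batchmean"))
loss = ce + 0.5 * kl          # 0.5 is an illustrative weight
loss.backward()
```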