Paper Group AWR 39
A Dataset of German Legal Documents for Named Entity Recognition
Title | A Dataset of German Legal Documents for Named Entity Recognition |
Authors | Elena Leitner, Georg Rehm, Julián Moreno-Schneider |
Abstract | We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The legal documents were, furthermore, automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx. |
Tasks | Named Entity Recognition |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.13016v1 |
https://arxiv.org/pdf/2003.13016v1.pdf | |
PWC | https://paperswithcode.com/paper/a-dataset-of-german-legal-documents-for-named |
Repo | https://github.com/elenanereiss/Legal-Entity-Recognition |
Framework | none |
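
The dataset ships in the CoNLL-2002 column format mentioned in the abstract (one token and tag per line, blank lines between sentences). A minimal reader sketch under that assumption; the file name below is hypothetical, see the repo for the actual data layout:

```python
def read_conll(path):
    """Yield sentences as lists of (token, tag) pairs."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split()
            if not parts:                     # blank line closes a sentence
                if sentence:
                    yield sentence
                    sentence = []
            else:
                sentence.append((parts[0], parts[-1]))  # token, BIO tag
    if sentence:
        yield sentence

# hypothetical file name; see the repository above for real file names
for sent in read_conll("court_decisions.conll"):
    print(sent[:5])
    break
```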
Deep Face Super-Resolution with Iterative Collaboration between Attentive Recovery and Landmark Estimation
Title | Deep Face Super-Resolution with Iterative Collaboration between Attentive Recovery and Landmark Estimation |
Authors | Cheng Ma, Zhenyu Jiang, Yongming Rao, Jiwen Lu, Jie Zhou |
Abstract | Recent works based on deep learning and facial priors have succeeded in super-resolving severely degraded facial images. However, the prior knowledge is not fully exploited in existing methods, since facial priors such as landmark and component maps are always estimated by low-resolution or coarsely super-resolved images, which may be inaccurate and thus affect the recovery performance. In this paper, we propose a deep face super-resolution (FSR) method with iterative collaboration between two recurrent networks which focus on facial image recovery and landmark estimation, respectively. In each recurrent step, the recovery branch utilizes the prior knowledge of landmarks to yield higher-quality images, which in turn facilitate more accurate landmark estimation. Therefore, the iterative information interaction between the two processes progressively boosts the performance of both. Moreover, a new attentive fusion module is designed to strengthen the guidance of landmark maps, where facial components are generated individually and aggregated attentively for better restoration. Quantitative and qualitative experimental results show the proposed method significantly outperforms state-of-the-art FSR methods in recovering high-quality face images. |
Tasks | Super-Resolution |
Published | 2020-03-29 |
URL | https://arxiv.org/abs/2003.13063v1 |
https://arxiv.org/pdf/2003.13063v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-face-super-resolution-with-iterative |
Repo | https://github.com/Maclory/Deep-Iterative-Collaboration |
Framework | pytorch |
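
The core loop of the method, alternating between recovery and landmark estimation, can be sketched in PyTorch as below. Both submodules are placeholders, not the authors' architectures, and the recovery module is assumed to accept `None` landmarks at the first step:

```python
import torch.nn as nn

class IterativeFSR(nn.Module):
    """Schematic iterative collaboration: each step recovers a better
    image from the current landmark prior, then re-estimates landmarks."""
    def __init__(self, recovery: nn.Module, landmark: nn.Module, steps: int = 3):
        super().__init__()
        self.recovery = recovery   # recurrent SR branch, conditioned on landmarks
        self.landmark = landmark   # recurrent landmark branch
        self.steps = steps

    def forward(self, lr_image):
        sr, heatmaps = lr_image, None
        for _ in range(self.steps):
            sr = self.recovery(sr, heatmaps)  # recover with the current prior
            heatmaps = self.landmark(sr)      # landmarks improve on the better image
        return sr, heatmaps
```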
Listwise Learning to Rank by Exploring Unique Ratings
Title | Listwise Learning to Rank by Exploring Unique Ratings |
Authors | Xiaofeng Zhu, Diego Klabjan |
Abstract | In this paper, we propose new listwise learning-to-rank models that mitigate the shortcomings of existing ones. Existing listwise learning-to-rank models are generally derived from the classical Plackett-Luce model, which has three major limitations. (1) Its permutation probabilities overlook ties, i.e., a situation when more than one document has the same rating with respect to a query. This can lead to imprecise permutation probabilities and inefficient training because documents are selected one by one. (2) It does not favor documents having high relevance. (3) It has a loose assumption that sampling documents at different steps is independent. To overcome the first two limitations, we model ranking as selecting documents from a candidate set based on unique rating levels in decreasing order. The number of steps in training is determined by the number of unique rating levels. We propose a new loss function and four associated models for the entire sequence of weighted classification tasks by assigning high weights to the selected documents with high ratings for optimizing Normalized Discounted Cumulative Gain (NDCG). To overcome the final limitation, we further propose a novel and efficient way of refining prediction scores by combining an adapted Vanilla Recurrent Neural Network (RNN) model with pooling given selected documents at previous steps. We encode all of the documents already selected by an RNN model. In a single step, we rank all of the documents with the same ratings using the last cell of the RNN multiple times. We have implemented our models using three settings: neural networks, neural networks with gradient boosting, and regression trees with gradient boosting. We have conducted experiments on four public datasets. The experiments demonstrate that the models notably outperform state-of-the-art learning-to-rank models. |
Tasks | Learning-To-Rank |
Published | 2020-01-07 |
URL | https://arxiv.org/abs/2001.01828v3 |
https://arxiv.org/pdf/2001.01828v3.pdf | |
PWC | https://paperswithcode.com/paper/listwise-learning-to-rank-by-exploring-unique |
Repo | https://github.com/XiaofengZhu/uRank_uMart |
Framework | tf |
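
The key departure from Plackett-Luce selection can be shown with a toy example: training proceeds over the unique rating levels in decreasing order, handling all documents of a level at once rather than picking documents one by one. A sketch of the stepping scheme only, not the authors' loss:

```python
import numpy as np

ratings = np.array([3, 3, 2, 0, 0, 1])        # toy relevance labels for one query
levels = sorted(set(ratings), reverse=True)   # number of steps = unique levels
remaining = np.arange(len(ratings))
for level in levels:
    selected = remaining[ratings[remaining] == level]
    print(f"step for rating {level}: select docs {selected.tolist()} "
          f"from pool {remaining.tolist()}")
    remaining = remaining[ratings[remaining] != level]
```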
VerSe: A Vertebrae Labelling and Segmentation Benchmark
Title | VerSe: A Vertebrae Labelling and Segmentation Benchmark |
Authors | Anjany Sekuboyina, Amirhossein Bayat, Malek E. Husseini, Maximilian Löffler, Markus Rempfler, Jan Kukačka, Giles Tetteh, Alexander Valentinitsch, Christian Payer, Martin Urschler, Maodong Chen, Dalong Cheng, Nikolas Lessmann, Yujin Hu, Tianfu Wang, Dong Yang, Daguang Xu, Felix Ambellan, Stefan Zachow, Tao Jiang, Xinjun Ma, Christoph Angerman, Xin Wang, Qingyue Wei, Kevin Brown, Matthias Wolf, Alexandre Kirszenberg, Élodie Puybareau, Björn H. Menze, Jan S. Kirschke |
Abstract | In this paper we report the challenge set-up and results of the Large Scale Vertebrae Segmentation Challenge (VerSe) organized in conjunction with MICCAI 2019. The challenge consisted of two tasks, vertebrae labelling and vertebrae segmentation. For this, a cohort of 160 multidetector CT scans closely resembling a clinical setting was prepared and annotated at the voxel level by a human-machine hybrid algorithm. In this paper we also present the annotation protocol and the algorithm that aided the medical experts in the annotation process. Eleven fully automated algorithms were benchmarked on this data, with the best performing algorithm achieving a vertebrae identification rate of 95% and a Dice coefficient of 90%. VerSe’19 is an open-call challenge, and its image data along with the annotations and evaluation tools will continue to be publicly accessible through its online portal. |
Tasks | |
Published | 2020-01-24 |
URL | https://arxiv.org/abs/2001.09193v1 |
https://arxiv.org/pdf/2001.09193v1.pdf | |
PWC | https://paperswithcode.com/paper/verse-a-vertebrae-labelling-and-segmentation |
Repo | https://github.com/christianpayer/MedicalDataAugmentationTool-VerSe |
Framework | tf |
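
The two headline numbers, identification rate and Dice coefficient, are standard metrics; here is the Dice computation for a single pair of binary masks, a self-contained sketch rather than the challenge's evaluation tool:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

pred = np.zeros((8, 8), bool); pred[2:6, 2:6] = True
gt = np.zeros((8, 8), bool);   gt[3:7, 3:7] = True
print(f"Dice: {dice(pred, gt):.3f}")   # 0.562 for this toy overlap
```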
Probabilistic Regression for Visual Tracking
Title | Probabilistic Regression for Visual Tracking |
Authors | Martin Danelljan, Luc Van Gool, Radu Timofte |
Abstract | Visual tracking is fundamentally the problem of regressing the state of the target in each video frame. While significant progress has been achieved, trackers are still prone to failures and inaccuracies. It is therefore crucial to represent the uncertainty in the target estimation. Although current prominent paradigms rely on estimating a state-dependent confidence score, this value lacks a clear probabilistic interpretation, complicating its use. In this work, we therefore propose a probabilistic regression formulation and apply it to tracking. Our network predicts the conditional probability density of the target state given an input image. Crucially, our formulation is capable of modeling label noise stemming from inaccurate annotations and ambiguities in the task. The regression network is trained by minimizing the Kullback-Leibler divergence. When applied for tracking, our formulation not only allows a probabilistic representation of the output, but also substantially improves the performance. Our tracker sets a new state-of-the-art on six datasets, achieving 59.8% AUC on LaSOT and 75.8% Success on TrackingNet. The code and models are available at https://github.com/visionml/pytracking. |
Tasks | Visual Tracking |
Published | 2020-03-27 |
URL | https://arxiv.org/abs/2003.12565v1 |
https://arxiv.org/pdf/2003.12565v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-regression-for-visual-tracking |
Repo | https://github.com/visionml/pytracking |
Framework | pytorch |
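
The training objective can be sketched as follows: the network scores a grid of candidate target states, the scores are normalized into a predictive density, and the KL divergence to a label distribution modeling annotation noise is minimized. The Gaussian label and grid discretization here are illustrative assumptions, not the paper's exact parameterization:

```python
import torch
import torch.nn.functional as F

def kl_loss(scores, centers, annotation, sigma=0.05):
    """scores: (N,) unnormalized scores over N grid states;
    centers: (N, 2) grid coordinates; annotation: (2,) noisy label."""
    d2 = ((centers - annotation) ** 2).sum(dim=-1)
    p = F.softmax(-d2 / (2 * sigma ** 2), dim=0)   # label density on the grid
    log_q = F.log_softmax(scores, dim=0)           # predicted density on the grid
    return torch.sum(p * (torch.log(p + 1e-12) - log_q))

centers = torch.stack(torch.meshgrid(
    torch.linspace(0, 1, 10), torch.linspace(0, 1, 10),
    indexing="ij"), dim=-1).reshape(-1, 2)
scores = torch.randn(100, requires_grad=True)
kl_loss(scores, centers, torch.tensor([0.42, 0.57])).backward()
```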
Instance Credibility Inference for Few-Shot Learning
Title | Instance Credibility Inference for Few-Shot Learning |
Authors | Yikai Wang, Chengming Xu, Chen Liu, Li Zhang, Yanwei Fu |
Abstract | Few-shot learning (FSL) aims to recognize new objects with extremely limited training data for each category. Previous efforts either leverage the meta-learning paradigm or apply novel principles in data augmentation to alleviate this extremely data-scarce problem. In contrast, this paper presents a simple statistical approach, dubbed Instance Credibility Inference (ICI), to exploit the distribution support of unlabeled instances for few-shot learning. Specifically, we first train a linear classifier with the labeled few-shot examples and use it to infer the pseudo-labels for the unlabeled data. To measure the credibility of each pseudo-labeled instance, we then propose to solve another linear regression hypothesis by increasing the sparsity of the incidental parameters, and rank the pseudo-labeled instances by their sparsity degree. We select the most trustworthy pseudo-labeled instances alongside the labeled examples to re-train the linear classifier. This process is iterated until all the unlabeled samples are included in the expanded training set, i.e., until the pseudo-labels for the unlabeled data pool have converged. Extensive experiments under two few-shot settings show that our simple approach establishes new state-of-the-art results on four widely used few-shot learning benchmark datasets: miniImageNet, tieredImageNet, CIFAR-FS, and CUB. Our code is available at: https://github.com/Yikai-Wang/ICI-FSL |
Tasks | Data Augmentation, Few-Shot Learning, Meta-Learning |
Published | 2020-03-26 |
URL | https://arxiv.org/abs/2003.11853v1 |
https://arxiv.org/pdf/2003.11853v1.pdf | |
PWC | https://paperswithcode.com/paper/instance-credibility-inference-for-few-shot |
Repo | https://github.com/Yikai-Wang/ICI-FSL |
Framework | none |
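
The iterative structure of ICI can be sketched in a few lines. Note that the credibility score below is replaced by plain classifier confidence; the paper ranks instances by the sparsity of incidental parameters in a separate regression, which is omitted here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ici_like_loop(X_lab, y_lab, X_unlab, per_round=5):
    """Grow the training set with the most credible pseudo-labels each round."""
    X, y, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    while len(pool):
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        probs = clf.predict_proba(pool)
        conf, pseudo = probs.max(axis=1), probs.argmax(axis=1)
        take = np.argsort(-conf)[:per_round]        # most trustworthy first
        X = np.vstack([X, pool[take]])
        y = np.concatenate([y, pseudo[take]])
        pool = np.delete(pool, take, axis=0)
    return LogisticRegression(max_iter=1000).fit(X, y)
```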
BachGAN: High-Resolution Image Synthesis from Salient Object Layout
Title | BachGAN: High-Resolution Image Synthesis from Salient Object Layout |
Authors | Yandong Li, Yu Cheng, Zhe Gan, Licheng Yu, Liqiang Wang, Jingjing Liu |
Abstract | We propose a new task towards a more practical application of image generation: high-quality image synthesis from a salient object layout. This new setting allows users to provide the layout of salient objects only (i.e., foreground bounding boxes and categories) and lets the model complete the drawing with an invented background and a matching foreground. Two main challenges spring from this new task: (i) how to generate fine-grained details and realistic textures without segmentation map input; and (ii) how to create a background and weave it seamlessly into standalone objects. To tackle this, we propose the Background Hallucination Generative Adversarial Network (BachGAN), which first selects a set of segmentation maps from a large candidate pool via a background retrieval module, then encodes these candidate layouts via a background fusion module to hallucinate a suitable background for the given objects. By generating the hallucinated background representation dynamically, our model can synthesize high-resolution images with both a photo-realistic foreground and an integral background. Experiments on the Cityscapes and ADE20K datasets demonstrate the advantage of BachGAN over existing methods, measured on both the visual fidelity of generated images and the visual alignment between output images and input layouts. |
Tasks | Image Generation |
Published | 2020-03-26 |
URL | https://arxiv.org/abs/2003.11690v2 |
https://arxiv.org/pdf/2003.11690v2.pdf | |
PWC | https://paperswithcode.com/paper/bachgan-high-resolution-image-synthesis-from |
Repo | https://github.com/Cold-Winter/BachGAN |
Framework | pytorch |
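
The background retrieval step can be illustrated with a toy scoring rule: pick the candidate segmentation maps whose occupied pixels conflict least with the query's foreground boxes. This rule is an assumption for illustration, not the paper's retrieval module:

```python
import numpy as np

def retrieve_backgrounds(candidates, fg_mask, k=3):
    """candidates: (N, H, W) boolean occupancy maps from the candidate pool;
    fg_mask: (H, W) boolean union of the query's foreground boxes."""
    conflict = (candidates & fg_mask).reshape(len(candidates), -1).sum(axis=1)
    return np.argsort(conflict)[:k]   # k maps that best accommodate the layout
```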
An LSTM-Based Autonomous Driving Model Using Waymo Open Dataset
Title | An LSTM-Based Autonomous Driving Model Using Waymo Open Dataset |
Authors | Zhicheng Gu, Zhihao Li, Xuan Di, Rongye Shi |
Abstract | The Waymo Open Dataset has been released recently, providing a platform to crowdsource some fundamental challenges for automated vehicles (AVs), such as 3D detection and tracking. While the dataset provides a large amount of high-quality and multi-source driving information, people in academia are more interested in the underlying driving policy programmed in Waymo self-driving cars, which is inaccessible due to AV manufacturers’ proprietary protection. Accordingly, academic researchers have to make various assumptions to implement AV components in their models or simulations, which may not represent the realistic interactions in real-world traffic. Thus, this paper introduces an approach to learn a long short-term memory (LSTM)-based model for imitating the behavior of Waymo’s self-driving model. The proposed model has been evaluated based on Mean Absolute Error (MAE). The experimental results show that our model outperforms several baseline models in driving action prediction. In addition, a visualization tool is presented for verifying the performance of the model. |
Tasks | Autonomous Driving, Self-Driving Cars |
Published | 2020-02-14 |
URL | https://arxiv.org/abs/2002.05878v2 |
https://arxiv.org/pdf/2002.05878v2.pdf | |
PWC | https://paperswithcode.com/paper/an-lstm-based-autonomous-driving-model-using |
Repo | https://github.com/JdeRobot/BehaviorSuite |
Framework | tf |
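
A behavior model of the kind described reduces to a small sequence-to-action network. The feature and action dimensions below (e.g. acceleration and steering as outputs) are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class DrivingLSTM(nn.Module):
    def __init__(self, in_dim=6, hidden=64, out_dim=2):   # out: (accel, steering)
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x):                # x: (batch, time, in_dim)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])     # predict the action at the last timestep

model = DrivingLSTM()
pred = model(torch.randn(4, 10, 6))               # 4 windows of 10 timesteps
mae = (pred - torch.randn(4, 2)).abs().mean()     # the MAE metric used above
```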
End-to-end Autonomous Driving Perception with Sequential Latent Representation Learning
Title | End-to-end Autonomous Driving Perception with Sequential Latent Representation Learning |
Authors | Jianyu Chen, Zhuo Xu, Masayoshi Tomizuka |
Abstract | Current autonomous driving systems are composed of a perception system and a decision system. Both are divided into multiple subsystems built on a large number of human heuristics. An end-to-end approach might clean up the system and avoid the huge effort of human engineering, as well as obtain better performance as data and computation resources increase. Compared to the decision system, the perception system is more suitable to be designed in an end-to-end framework, since it does not require online driving exploration. In this paper, we propose a novel end-to-end approach for autonomous driving perception. A latent space is introduced to capture all relevant features useful for perception, which is learned through sequential latent representation learning. The learned end-to-end perception model is able to solve the detection, tracking, localization and mapping problems altogether with only minimal human engineering effort and without storing any maps online. The proposed method is evaluated in a realistic urban driving simulator, with both camera image and lidar point cloud as sensor inputs. The code and videos of this work are available at our GitHub repo and project website. |
Tasks | Autonomous Driving, Representation Learning |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.12464v1 |
https://arxiv.org/pdf/2003.12464v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-autonomous-driving-perception-with |
Repo | https://github.com/cjy1992/detect-loc-map |
Framework | tf |
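
One step of a sequential latent model of this kind can be sketched as a predict-then-update recursion over the latent state. The linear modules below are placeholders for the paper's networks, and the dimensions are assumptions:

```python
import torch
import torch.nn as nn

class LatentStep(nn.Module):
    """Advance the latent state with the action, then correct it with
    the new observation features (e.g. camera + lidar encodings)."""
    def __init__(self, z_dim=64, a_dim=2, obs_dim=256):
        super().__init__()
        self.transition = nn.Linear(z_dim + a_dim, z_dim)   # prior p(z_t | z_{t-1}, a_{t-1})
        self.update = nn.Linear(z_dim + obs_dim, z_dim)     # posterior given obs_t

    def forward(self, z_prev, action, obs_feat):
        z_prior = torch.tanh(self.transition(torch.cat([z_prev, action], dim=-1)))
        return torch.tanh(self.update(torch.cat([z_prior, obs_feat], dim=-1)))
```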
Three-branch and Mutil-scale learning for Fine-grained Image Recognition (TBMSL-Net)
Title | Three-branch and Mutil-scale learning for Fine-grained Image Recognition (TBMSL-Net) |
Authors | Fan Zhang, Guisheng Zhai, Meng Li, Yizhao Liu |
Abstract | The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is one of the most authoritative academic competitions in the field of Computer Vision (CV) in recent years, but directly transferring its annual champion models to fine-grained visual categorization (FGVC) tasks does not achieve good results. The small inter-class variations and large intra-class variations caused by the fine-grained nature make it a challenging problem. Our proposed method can effectively localize the object and informative part regions, without the need for bounding-box or part annotations, via an attention object location module (AOLM) and an attention part proposal module (APPM). The obtained object images contain both the whole structure and finer details, the part images cover many different scales and carry more fine-grained features, and the raw images contain the complete object. These three kinds of training images are supervised by our three-branch network structure. The model has good classification ability as well as good generalization and robustness for object images of different scales. Our approach is trained end-to-end, and comprehensive experiments demonstrate that it achieves state-of-the-art results on the CUB-200-2011, Stanford Cars and FGVC-Aircraft datasets. |
Tasks | Fine-Grained Image Classification, Fine-Grained Image Recognition, Fine-Grained Visual Categorization, Object Recognition |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.09150v2 |
https://arxiv.org/pdf/2003.09150v2.pdf | |
PWC | https://paperswithcode.com/paper/three-branch-and-mutil-scale-learning-for |
Repo | https://github.com/ZF1044404254/TBMSL-Net |
Framework | pytorch |
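
The three-branch supervision can be sketched as one shared backbone applied to the raw image, the object crop from AOLM, and the part crops from APPM, all trained against the same label. A simplified sketch; the paper's branch coupling may differ:

```python
import torch.nn.functional as F

def three_branch_loss(backbone, classifier, raw, obj, parts, label):
    """Average cross-entropy over the raw-, object- and part-image branches."""
    losses = [F.cross_entropy(classifier(backbone(raw)), label),
              F.cross_entropy(classifier(backbone(obj)), label)]
    for part in parts:                         # multi-scale part crops
        losses.append(F.cross_entropy(classifier(backbone(part)), label))
    return sum(losses) / len(losses)
```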
Incorporating User’s Preference into Attributed Graph Clustering
Title | Incorporating User’s Preference into Attributed Graph Clustering |
Authors | Wei Ye, Dominik Mautz, Christian Boehm, Ambuj Singh, Claudia Plant |
Abstract | Graph clustering has been studied extensively on both plain graphs and attributed graphs. However, all these methods need to partition the whole graph to find cluster structures. Sometimes, based on domain knowledge, people may have information about a specific target region in the graph and only want to find a single cluster concentrated on this local region. Such a task is called local clustering. In contrast to global clustering, local clustering aims to find only one cluster that is concentrated on the given seed vertex (and also on the designated attributes for attributed graphs). Currently, very few methods can deal with this kind of task. To this end, we propose two quality measures for a local cluster: Graph Unimodality (GU) and Attribute Unimodality (AU). The former measures the homogeneity of the graph structure, while the latter measures the homogeneity of the subspace that is composed of the designated attributes. We call their linear combination Compactness. Further, we propose LOCLU to optimize the Compactness score. The local cluster detected by LOCLU concentrates on the region of interest, provides efficient information flow in the graph, and exhibits a unimodal data distribution in the subspace of the designated attributes. |
Tasks | Graph Clustering |
Published | 2020-03-24 |
URL | https://arxiv.org/abs/2003.11079v1 |
https://arxiv.org/pdf/2003.11079v1.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-user-s-preference-into |
Repo | https://github.com/yeweiysh/LOCLU |
Framework | none |
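
The Compactness score is a linear combination of the two proposed measures. The GU/AU stand-ins below (internal-edge fraction and negative attribute variance) are illustrative assumptions, not the paper's unimodality estimators:

```python
import numpy as np

def compactness(adj, attrs, cluster, designated, alpha=0.5):
    """adj: (n, n) adjacency; attrs: (n, d) attributes; cluster: vertex
    indices; designated: indices of the user-chosen attributes."""
    cluster = np.asarray(cluster)
    internal = adj[np.ix_(cluster, cluster)].sum()
    gu = internal / max(adj[cluster].sum(), 1)        # edges staying inside
    au = -attrs[np.ix_(cluster, designated)].var()    # homogeneity of attributes
    return alpha * gu + (1.0 - alpha) * au            # linear combination
```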
Optimistic Exploration even with a Pessimistic Initialisation
Title | Optimistic Exploration even with a Pessimistic Initialisation |
Authors | Tabish Rashid, Bei Peng, Wendelin Böhmer, Shimon Whiteson |
Abstract | Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL). In the tabular case, all provably efficient model-free algorithms rely on it. However, model-free deep RL algorithms do not use optimistic initialisation despite taking inspiration from these provably efficient tabular algorithms. In particular, in scenarios with only positive rewards, Q-values are initialised at their lowest possible values due to commonly used network initialisation schemes, a pessimistic initialisation. Merely initialising the network to output optimistic Q-values is not enough, since we cannot ensure that they remain optimistic for novel state-action pairs, which is crucial for exploration. We propose a simple count-based augmentation to pessimistically initialised Q-values that separates the source of optimism from the neural network. We show that this scheme is provably efficient in the tabular setting and extend it to the deep RL setting. Our algorithm, Optimistic Pessimistically Initialised Q-Learning (OPIQ), augments the Q-value estimates of a DQN-based agent with count-derived bonuses to ensure optimism during both action selection and bootstrapping. We show that OPIQ outperforms non-optimistic DQN variants that utilise a pseudocount-based intrinsic motivation in hard exploration tasks, and that it predicts optimistic estimates for novel state-action pairs. |
Tasks | Efficient Exploration, Q-Learning |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.12174v1 |
https://arxiv.org/pdf/2002.12174v1.pdf | |
PWC | https://paperswithcode.com/paper/optimistic-exploration-even-with-a-1 |
Repo | https://github.com/oxwhirl/opiq |
Framework | pytorch |
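
The count-based augmentation can be sketched in tabular form: the learned Q-value is inflated by a bonus that decays with the visit count, and the augmented value drives action selection (the paper applies the same bonus at bootstrapping; the exact bonus form here is a paraphrase, not the authors' formula):

```python
from collections import defaultdict

counts = defaultdict(int)   # N(s, a) visit counts

def optimistic_q(q_value, state, action, c=1.0, m=0.5):
    """Pessimistically initialised Q plus a count-derived optimism bonus."""
    return q_value + c / (counts[(state, action)] + 1) ** m

def select_action(q_values, state, actions):
    a = max(actions, key=lambda a: optimistic_q(q_values[(state, a)], state, a))
    counts[(state, a)] += 1   # the bonus for this pair decays after the visit
    return a
```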
PADS: Policy-Adapted Sampling for Visual Similarity Learning
Title | PADS: Policy-Adapted Sampling for Visual Similarity Learning |
Authors | Karsten Roth, Timo Milbich, Björn Ommer |
Abstract | Learning visual similarity requires learning relations, typically between triplets of images. Although triplet approaches are powerful, their computational complexity mostly limits training to only a subset of all possible training triplets. Thus, sampling strategies that decide when to use which training sample during learning are crucial. Currently, the prominent paradigm is fixed or curriculum sampling strategies that are predefined before training starts. However, the problem truly calls for a sampling process that adjusts based on the actual state of the similarity representation during training. We therefore employ reinforcement learning and have a teacher network adjust the sampling distribution based on the current state of the learner network, which represents visual similarity. Experiments on benchmark datasets using standard triplet-based losses show that our adaptive sampling strategy significantly outperforms fixed sampling strategies. Moreover, although our adaptive sampling is only applied on top of basic triplet-learning frameworks, we reach results competitive with state-of-the-art approaches that employ diverse additional learning signals or strong ensemble architectures. Code can be found under https://github.com/Confusezius/CVPR2020_PADS. |
Tasks | |
Published | 2020-03-24 |
URL | https://arxiv.org/abs/2003.11113v2 |
https://arxiv.org/pdf/2003.11113v2.pdf | |
PWC | https://paperswithcode.com/paper/pads-policy-adapted-sampling-for-visual |
Repo | https://github.com/Confusezius/CVPR2020_PADS |
Framework | pytorch |
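
The teacher's adjustment of the negative-sampling distribution can be sketched as a REINFORCE-style update over distance bins. The bin parameterization and reward signal are assumptions for illustration, not the paper's policy:

```python
import numpy as np

class AdaptiveSampler:
    """Softmax distribution over negative-distance bins, nudged by a
    reward reflecting the learner's validation progress."""
    def __init__(self, n_bins=10, lr=0.1):
        self.logits = np.zeros(n_bins)
        self.lr = lr

    def probs(self):
        e = np.exp(self.logits - self.logits.max())
        return e / e.sum()

    def sample_bin(self, rng=np.random):
        return rng.choice(len(self.logits), p=self.probs())

    def update(self, chosen_bin, reward):
        grad = -self.probs()
        grad[chosen_bin] += 1.0              # gradient of log p(chosen_bin)
        self.logits += self.lr * reward * grad
```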
Multi-Scale Progressive Fusion Network for Single Image Deraining
Title | Multi-Scale Progressive Fusion Network for Single Image Deraining |
Authors | Kui Jiang, Zhongyuan Wang, Peng Yi, Chen Chen, Baojin Huang, Yimin Luo, Jiayi Ma, Junjun Jiang |
Abstract | Rain streaks in the air appear at various blurring degrees and resolutions due to the different distances from their positions to the camera. Similar rain patterns are visible in a rain image as well as in its multi-scale (or multi-resolution) versions, which makes it possible to exploit such complementary information for rain streak representation. In this work, we explore the multi-scale collaborative representation of rain streaks from the perspective of input image scales and hierarchical deep features in a unified framework, termed multi-scale progressive fusion network (MSPFN), for single image rain streak removal. For similar rain streaks at different positions, we employ recurrent calculation to capture the global texture, thus allowing us to explore the complementary and redundant information in the spatial dimension to characterize target rain streaks. Besides, we construct a multi-scale pyramid structure and further introduce an attention mechanism to guide the fine fusion of this correlated information from different scales. This multi-scale progressive fusion strategy not only promotes the cooperative representation but also boosts end-to-end training. Our proposed method is extensively evaluated on several benchmark datasets and achieves state-of-the-art results. Moreover, we conduct experiments on joint deraining, detection, and segmentation tasks, inspiring a new research direction of vision-task-driven image deraining. The source code is available at \url{https://github.com/kuihua/MSPFN}. |
Tasks | Rain Removal, Single Image Deraining |
Published | 2020-03-24 |
URL | https://arxiv.org/abs/2003.10985v2 |
https://arxiv.org/pdf/2003.10985v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-scale-progressive-fusion-network-for |
Repo | https://github.com/kuihua/MSPFN |
Framework | none |
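
The multi-scale input pyramid and coarse-to-fine fusion can be sketched as below; plain upsample-and-add stands in for the paper's attention-guided fusion:

```python
import torch.nn.functional as F

def pyramid(x, scales=(1.0, 0.5, 0.25)):
    """Multi-scale versions of the rain image x, shaped (N, C, H, W)."""
    return [x if s == 1.0 else
            F.interpolate(x, scale_factor=s, mode="bilinear", align_corners=False)
            for s in scales]

def progressive_fuse(feats):
    """Fuse from the coarsest level back up to full resolution."""
    out = feats[-1]
    for f in reversed(feats[:-1]):
        out = f + F.interpolate(out, size=f.shape[-2:],
                                mode="bilinear", align_corners=False)
    return out
```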
Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives
Title | Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives |
Authors | Duo Li, Qifeng Chen |
Abstract | While the depth of modern Convolutional Neural Networks (CNNs) surpasses that of the pioneering networks by a significant margin, the traditional way of appending supervision only over the final classifier and progressively propagating gradient flow upstream remains the training mainstay. Seminal Deeply-Supervised Networks (DSN) were proposed to alleviate the difficulty of optimization arising from gradient flow through a long chain. However, this approach is still vulnerable to issues including interference with the hierarchical representation generation process and inconsistent optimization objectives, as illustrated theoretically and empirically in this paper. Complementary to previous training strategies, we propose Dynamic Hierarchical Mimicking, a generic feature learning mechanism, to advance CNN training with enhanced generalization ability. Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network. Each branch can emerge from certain locations of the main branch dynamically, which not only retains representation rooted in the backbone network but also generates more diverse representations along its own pathway. We go one step further to promote multi-level interactions among different branches through an optimization formula with probabilistic prediction matching losses, thus guaranteeing a more robust optimization process and better representation ability. Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method over its corresponding counterparts using diverse state-of-the-art CNN architectures. Code and models are publicly available at https://github.com/d-li14/DHM |
Tasks | |
Published | 2020-03-24 |
URL | https://arxiv.org/abs/2003.10739v1 |
https://arxiv.org/pdf/2003.10739v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-hierarchical-mimicking-towards |
Repo | https://github.com/d-li14/DHM |
Framework | pytorch |
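
The probabilistic prediction matching between a side branch and the main head can be sketched as cross-entropy plus a temperature-softened KL term, a common mimicking formulation; the paper's exact coupling of branches may differ:

```python
import torch.nn.functional as F

def mimicking_loss(branch_logits, main_logits, targets, T=2.0, beta=0.5):
    """Task loss for the side branch plus KL matching to the main head."""
    ce = F.cross_entropy(branch_logits, targets)
    kl = F.kl_div(F.log_softmax(branch_logits / T, dim=1),
                  F.softmax(main_logits.detach() / T, dim=1),
                  reduction="batchmean") * (T * T)
    return ce + beta * kl
```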