April 3, 2020

3329 words 16 mins read

Paper Group AWR 42

Contention Window Optimization in IEEE 802.11ax Networks with Deep Reinforcement Learning. DuDoRNet: Learning a Dual-Domain Recurrent Network for Fast MRI Reconstruction with Deep T1 Prior. Parsing as Pretraining. Segmented Graph-Bert for Graph Instance Modeling. Structured Prediction with Partial Labelling through the Infimum Loss. A Bayesian algo …

Contention Window Optimization in IEEE 802.11ax Networks with Deep Reinforcement Learning


Title	Contention Window Optimization in IEEE 802.11ax Networks with Deep Reinforcement Learning
Authors	Witold Wydmański, Szymon Szott
Abstract	The proper setting of contention window (CW) values has a significant impact on the efficiency of Wi-Fi networks. Unfortunately, the standard method used by 802.11 networks is not scalable enough to maintain stable throughput for an increasing number of stations, despite 802.11ax being designed to improve Wi-Fi performance in dense scenarios. To this end we propose a new method of CW control which leverages deep reinforcement learning principles to learn the correct settings under different network conditions. Our method supports two trainable control algorithms, which, as we demonstrate through simulations, offer efficiency close to optimal while keeping computational cost low.
Tasks
Published	2020-03-03
URL	https://arxiv.org/abs/2003.01492v1
PDF	https://arxiv.org/pdf/2003.01492v1.pdf
PWC	https://paperswithcode.com/paper/contention-window-optimization-in-ieee
Repo	https://github.com/wwydmanski/RLinWiFi
Framework	none

DuDoRNet: Learning a Dual-Domain Recurrent Network for Fast MRI Reconstruction with Deep T1 Prior


Title	DuDoRNet: Learning a Dual-Domain Recurrent Network for Fast MRI Reconstruction with Deep T1 Prior
Authors	Bo Zhou, S. Kevin Zhou
Abstract	MRI with multiple protocols is commonly used for diagnosis, but it suffers from a long acquisition time, which yields the image quality vulnerable to say motion artifacts. To accelerate, various methods have been proposed to reconstruct full images from undersampled k-space data. However, these algorithms are inadequate for two main reasons. Firstly, aliasing artifacts generated in the image domain are structural and non-local, so that sole image domain restoration is insufficient. Secondly, though MRI comprises multiple protocols during one exam, almost all previous studies only employ the reconstruction of an individual protocol using a highly distorted undersampled image as input, leaving the use of fully-sampled short protocol (say T1) as complementary information highly underexplored. In this work, we address the above two limitations by proposing a Dual Domain Recurrent Network (DuDoRNet) with deep T1 prior embedded to simultaneously recover k-space and images for accelerating the acquisition of MRI with a long imaging protocol. Specifically, a Dilated Residual Dense Network (DRDNet) is customized for dual domain restorations from undersampled MRI data. Extensive experiments on different sampling patterns and acceleration rates demonstrate that our method consistently outperforms state-of-the-art methods, and can achieve SSIM up to 0.99 at $6 \times$ acceleration.
Tasks
Published	2020-01-11
URL	https://arxiv.org/abs/2001.03799v1
PDF	https://arxiv.org/pdf/2001.03799v1.pdf
PWC	https://paperswithcode.com/paper/dudornet-learning-a-dual-domain-recurrent
Repo	https://github.com/bbbbbbzhou/DuDoRNet
Framework	pytorch

Parsing as Pretraining


Title	Parsing as Pretraining
Authors	David Vilares, Michalina Strzyz, Anders Søgaard, Carlos Gómez-Rodríguez
Abstract	Recent analyses suggest that encoders pretrained for language modeling capture certain morpho-syntactic structure. However, probing frameworks for word vectors still do not report results on standard setups such as constituent and dependency parsing. This paper addresses this problem and does full parsing (on English) relying only on pretraining architectures – and no decoding. We first cast constituent and dependency parsing as sequence tagging. We then use a single feed-forward layer to directly map word vectors to labels that encode a linearized tree. This is used to: (i) see how far we can reach on syntax modelling with just pretrained encoders, and (ii) shed some light about the syntax-sensitivity of different word vectors (by freezing the weights of the pretraining network during training). For evaluation, we use bracketing F1-score and LAS, and analyze in-depth differences across representations for span lengths and dependency displacements. The overall results surpass existing sequence tagging parsers on the PTB (93.5%) and end-to-end EN-EWT UD (78.8%).
Tasks	Dependency Parsing, Language Modelling
Published	2020-02-05
URL	https://arxiv.org/abs/2002.01685v1
PDF	https://arxiv.org/pdf/2002.01685v1.pdf
PWC	https://paperswithcode.com/paper/parsing-as-pretraining
Repo	https://github.com/aghie/parsing-as-pretraining
Framework	none

Segmented Graph-Bert for Graph Instance Modeling


Title	Segmented Graph-Bert for Graph Instance Modeling
Authors	Jiawei Zhang
Abstract	In graph instance representation learning, both the diverse graph instance sizes and the graph node orderless property have been the major obstacles that render existing representation learning models fail to work. In this paper, we will examine the effectiveness of GRAPH-BERT on graph instance representation learning, which was designed for node representation learning tasks originally. To adapt GRAPH-BERT to the new problem settings, we re-design it with a segmented architecture instead, which is also named as SEG-BERT (Segmented GRAPH-BERT) for reference simplicity in this paper. SEG-BERT involves no node-order-variant inputs or functional components anymore, and it can handle the graph node orderless property naturally. What’s more, SEG-BERT has a segmented architecture and introduces three different strategies to unify the graph instance sizes, i.e., full-input, padding/pruning and segment shifting, respectively. SEG-BERT is pre-trainable in an unsupervised manner, which can be further transferred to new tasks directly or with necessary fine-tuning. We have tested the effectiveness of SEG-BERT with experiments on seven graph instance benchmark datasets, and SEG-BERT can out-perform the comparison methods on six out of them with significant performance advantages.
Tasks	Graph Classification, Representation Learning
Published	2020-02-09
URL	https://arxiv.org/abs/2002.03283v1
PDF	https://arxiv.org/pdf/2002.03283v1.pdf
PWC	https://paperswithcode.com/paper/segmented-graph-bert-for-graph-instance
Repo	https://github.com/jwzhanggy/graph_bert_work
Framework	none

Structured Prediction with Partial Labelling through the Infimum Loss


Title	Structured Prediction with Partial Labelling through the Infimum Loss
Authors	Vivien Cabannes, Alessandro Rudi, Francis Bach
Abstract	Annotating datasets is one of the main costs in nowadays supervised learning. The goal of weak supervision is to enable models to learn using only forms of labelling which are cheaper to collect, as partial labelling. This is a type of incomplete annotation where, for each datapoint, supervision is cast as a set of labels containing the real one. The problem of supervised learning with partial labelling has been studied for specific instances such as classification, multi-label, ranking or segmentation, but a general framework is still missing. This paper provides a unified framework based on structured prediction and on the concept of infimum loss to deal with partial labelling over a wide family of learning problems and loss functions. The framework leads naturally to explicit algorithms that can be easily implemented and for which proved statistical consistency and learning rates. Experiments confirm the superiority of the proposed approach over commonly used baselines.
Tasks	Structured Prediction
Published	2020-03-02
URL	https://arxiv.org/abs/2003.00920v1
PDF	https://arxiv.org/pdf/2003.00920v1.pdf
PWC	https://paperswithcode.com/paper/structured-prediction-with-partial-labelling
Repo	https://github.com/VivienCabannes/infimum_loss
Framework	none

A Bayesian algorithm for retrosynthesis


Title	A Bayesian algorithm for retrosynthesis
Authors	Zhongliang Guo, Stephen Wu, Mitsuru Ohno, Ryo Yoshida
Abstract	The identification of synthetic routes that end with a desired product has been an inherently time-consuming process that is largely dependent on expert knowledge regarding a limited fraction of the entire reaction space. At present, emerging machine-learning technologies are overturning the process of retrosynthetic planning. The objective of this study is to discover synthetic routes backwardly from a given desired molecule to commercially available compounds. The problem is reduced to a combinatorial optimization task with the solution space subject to the combinatorial complexity of all possible pairs of purchasable reactants. We address this issue within the framework of Bayesian inference and computation. The workflow consists of two steps: a deep neural network is trained that forwardly predicts a product of the given reactants with a high level of accuracy, following which this forward model is inverted into the backward one via Bayes’ law of conditional probability. Using the backward model, a diverse set of highly probable reaction sequences ending with a given synthetic target is exhaustively explored using a Monte Carlo search algorithm. The Bayesian retrosynthesis algorithm could successfully rediscover 80.3% and 50.0% of known synthetic routes of single-step and two-step reactions within top-10 accuracy, respectively, thereby outperforming state-of-the-art algorithms in terms of the overall accuracy. Remarkably, the Monte Carlo method, which was specifically designed for the presence of diverse multiple routes, often revealed a ranked list of hundreds of reaction routes to the same synthetic target. We investigated the potential applicability of such diverse candidates based on expert knowledge from synthetic organic chemistry.
Tasks	Bayesian Inference, Combinatorial Optimization
Published	2020-03-06
URL	https://arxiv.org/abs/2003.03190v1
PDF	https://arxiv.org/pdf/2003.03190v1.pdf
PWC	https://paperswithcode.com/paper/a-bayesian-algorithm-for-retrosynthesis
Repo	https://github.com/zguo235/bayesian_retro
Framework	pytorch

Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records


Title	Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records
Authors	Jan Trienes, Dolf Trieschnigg, Christin Seifert, Djoerd Hiemstra
Abstract	Unstructured information in electronic health records provide an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English medical text and little is known about the generalizability of de-identification methods across languages and domains. In this study, we construct a varied dataset consisting of the medical records of 1260 patients by sampling data from 9 institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains. Our experiments show that an existing rule-based method specifically developed for the Dutch language fails to generalize to this new data. Furthermore, a state-of-the-art neural architecture performs strongly across languages and domains, even with limited training data. Compared to feature-based and rule-based methods the neural method requires significantly less configuration effort and domain-knowledge. We make all code and pre-trained de-identification models available to the research community, allowing practitioners to apply them to their datasets and to enable future benchmarks.
Tasks
Published	2020-01-16
URL	https://arxiv.org/abs/2001.05714v1
PDF	https://arxiv.org/pdf/2001.05714v1.pdf
PWC	https://paperswithcode.com/paper/comparing-rule-based-feature-based-and-deep
Repo	https://github.com/nedap/deidentify
Framework	none

Implementing a GPU-based parallel MAX-MIN Ant System


Title	Implementing a GPU-based parallel MAX-MIN Ant System
Authors	Rafał Skinderowicz
Abstract	The MAX-MIN Ant System (MMAS) is one of the best-known Ant Colony Optimization (ACO) algorithms proven to be efficient at finding satisfactory solutions to many difficult combinatorial optimization problems. The slow-down in Moore’s law, and the availability of graphics processing units (GPUs) capable of conducting general-purpose computations at high speed, has sparked considerable research efforts into the development of GPU-based ACO implementations. In this paper, we discuss a range of novel ideas for improving the GPU-based parallel MMAS implementation, allowing it to better utilize the computing power offered by two subsequent Nvidia GPU architectures. Specifically, based on the weighted reservoir sampling algorithm we propose a novel parallel implementation of the node selection procedure, which is at the heart of the MMAS and other ACO algorithms. We also present a memory-efficient implementation of another key-component – the tabu list structure – which is used in the ACO’s solution construction stage. The proposed implementations, combined with the existing approaches, lead to a total of six MMAS variants, which are evaluated on a set of Traveling Salesman Problem (TSP) instances ranging from 198 to 3,795 cities. The results show that our MMAS implementation is competitive with state-of-the-art GPU-based and multi-core CPU-based parallel ACO implementations: in fact, the times obtained for the Nvidia V100 Volta GPU were up to 7.18x and 21.79x smaller, respectively. The fastest of the proposed MMAS variants is able to generate over 1 million candidate solutions per second when solving a 1,002-city instance. Moreover, we show that, combined with the 2-opt local search heuristic, the proposed parallel MMAS finds high-quality solutions for the TSP instances with up to 18,512 nodes.
Tasks	Combinatorial Optimization
Published	2020-01-18
URL	https://arxiv.org/abs/2003.11902v1
PDF	https://arxiv.org/pdf/2003.11902v1.pdf
PWC	https://paperswithcode.com/paper/implementing-a-gpu-based-parallel-max-min-ant
Repo	https://github.com/RSkinderowicz/GPU-based-MMAS
Framework	none

General Framework for Binary Classification on Top Samples


Title	General Framework for Binary Classification on Top Samples
Authors	Lukáš Adam, Václav Mácha, Václav Šmídl, Tomáš Pevný
Abstract	Many binary classification problems minimize misclassification above (or below) a threshold. We show that instances of ranking problems, accuracy at the top or hypothesis testing may be written in this form. We propose a general framework to handle these classes of problems and show which known methods (both known and newly proposed) fall into this framework. We provide a theoretical analysis of this framework and mention selected possible pitfalls the methods may encounter. We suggest several numerical improvements including the implicit derivative and stochastic gradient descent. We provide an extensive numerical study. Based both on the theoretical properties and numerical experiments, we conclude the paper by suggesting which method should be used in which situation.
Tasks
Published	2020-02-25
URL	https://arxiv.org/abs/2002.10923v1
PDF	https://arxiv.org/pdf/2002.10923v1.pdf
PWC	https://paperswithcode.com/paper/general-framework-for-binary-classification
Repo	https://github.com/VaclavMacha/ClassificationOnTop
Framework	none

Metric learning: cross-entropy vs. pairwise losses


Title	Metric learning: cross-entropy vs. pairwise losses
Authors	Malik Boudiaf, Jérôme Rony, Imtiaz Masud Ziko, Eric Granger, Marco Pedersoli, Pablo Piantanida, Ismail Ben Ayed
Abstract	Recently, substantial research efforts in Deep Metric Learning (DML) focused on designing complex pairwise-distance losses and convoluted sample-mining and implementation strategies to ease optimization. The standard cross-entropy loss for classification has been largely overlooked in DML. On the surface, the cross-entropy may seem unrelated and irrelevant to metric learning as it does not explicitly involve pairwise distances. However, we provide a theoretical analysis that links the cross-entropy to several well-known and recent pairwise losses. Our connections are drawn from two different perspectives: one based on an explicit optimization insight; the other on discriminative and generative views of the mutual information between the labels and the learned features. First, we explicitly demonstrate that the cross-entropy is an upper bound on a new pairwise loss, which has a structure similar to various pairwise losses: it minimizes intra-class distances while maximizing inter-class distances. As a result, minimizing the cross-entropy can be seen as an approximate bound-optimization (or Majorize-Minimize) algorithm for minimizing this pairwise loss. Second, we show that, more generally, minimizing the cross-entropy is actually equivalent to maximizing the mutual information, to which we connect several well-known pairwise losses. These findings indicate that the cross-entropy represents a proxy for maximizing the mutual information – as pairwise losses do – without the need for complex sample-mining and optimization schemes. Furthermore, we show that various standard pairwise losses can be explicitly related to one another via bound relationships. Our experiments over four standard DML benchmarks (CUB200, Cars-196, Stanford Online Product and In-Shop) strongly support our findings. We consistently obtained state-of-the-art results, outperforming many recent and complex DML methods.
Tasks	Metric Learning
Published	2020-03-19
URL	https://arxiv.org/abs/2003.08983v1
PDF	https://arxiv.org/pdf/2003.08983v1.pdf
PWC	https://paperswithcode.com/paper/metric-learning-cross-entropy-vs-pairwise
Repo	https://github.com/jeromerony/dml_cross_entropy
Framework	pytorch

Actions as Moving Points


Title	Actions as Moving Points
Authors	Yixuan Li, Zixu Wang, Limin Wang, Gangshan Wu
Abstract	The existing action tubelet detectors mainly depend on heuristic anchor box design and placement, which might be computationally expensive and sub-optimal for precise localization of action instances. In this paper, we present a new action tubelet detection framework, termed as MovingCenter Detector (MOC-detector), by treating an action instance as a trajectory of moving points. Based on the analysis that movement information could simplify and assist the action tubelet detection, our MOC-detector is decomposed into three crucial head branches: (1) Center Branch for instance center detection and action recognition, (2) Movement Branch for movement estimation at adjacent frames to form moving point trajectories, (3) Box Branch for spatial extent detection by directly regressing bounding box size at the estimated center point of each frame. These three branches work together to generate the tubelet detection results, that could be further linked to yield video level tubes with a common matching strategy. Our MOC-detector outperforms the existing state-of-the-art methods by a large margin under the same setting for frame-mAP and video-mAP on the JHMDB and UCF101-24 datasets. The performance gap is more evident for higher video IoU, demonstrating that our MOC-detector is particularly useful for more precise action detection.
Tasks	Action Detection
Published	2020-01-14
URL	https://arxiv.org/abs/2001.04608v1
PDF	https://arxiv.org/pdf/2001.04608v1.pdf
PWC	https://paperswithcode.com/paper/actions-as-moving-points
Repo	https://github.com/mcg2019/MOC-Detector
Framework	pytorch

The Benefits of Pairwise Discriminators for Adversarial Training


Title	The Benefits of Pairwise Discriminators for Adversarial Training
Authors	Shangyuan Tong, Timur Garipov, Tommi Jaakkola
Abstract	Adversarial training methods typically align distributions by solving two-player games. However, in most current formulations, even if the generator aligns perfectly with data, a sub-optimal discriminator can still drive the two apart. Absent additional regularization, the instability can manifest itself as a never-ending game. In this paper, we introduce a family of objectives by leveraging pairwise discriminators, and show that only the generator needs to converge. The alignment, if achieved, would be preserved with any discriminator. We provide sufficient conditions for local convergence; characterize the capacity balance that should guide the discriminator and generator choices; and construct examples of minimally sufficient discriminators. Empirically, we illustrate the theory and the effectiveness of our approach on synthetic examples. Moreover, we show that practical methods derived from our approach can better generate higher-resolution images.
Tasks
Published	2020-02-20
URL	https://arxiv.org/abs/2002.08621v1
PDF	https://arxiv.org/pdf/2002.08621v1.pdf
PWC	https://paperswithcode.com/paper/the-benefits-of-pairwise-discriminators-for
Repo	https://github.com/ShangyuanTong/PairGAN
Framework	pytorch

Investigating Generalization in Neural Networks under Optimally Evolved Training Perturbations


Title	Investigating Generalization in Neural Networks under Optimally Evolved Training Perturbations
Authors	Subhajit Chaudhury, Toshihiko Yamasaki
Abstract	In this paper, we study the generalization properties of neural networks under input perturbations and show that minimal training data corruption by a few pixel modifications can cause drastic overfitting. We propose an evolutionary algorithm to search for optimal pixel perturbations using novel cost function inspired from literature in domain adaptation that explicitly maximizes the generalization gap and domain divergence between clean and corrupted images. Our method outperforms previous pixel-based data distribution shift methods on state-of-the-art Convolutional Neural Networks (CNNs) architectures. Interestingly, we find that the choice of optimization plays an important role in generalization robustness due to the empirical observation that SGD is resilient to such training data corruption unlike adaptive optimization techniques (ADAM). Our source code is available at https://github.com/subhajitchaudhury/evo-shift.
Tasks	Domain Adaptation
Published	2020-03-14
URL	https://arxiv.org/abs/2003.06646v1
PDF	https://arxiv.org/pdf/2003.06646v1.pdf
PWC	https://paperswithcode.com/paper/investigating-generalization-in-neural
Repo	https://github.com/subhajitchaudhury/evo-shift
Framework	none

SentenceMIM: A Latent Variable Language Model


Title	SentenceMIM: A Latent Variable Language Model
Authors	Micha Livne, Kevin Swersky, David J. Fleet
Abstract	We introduce sentenceMIM, a probabilistic auto-encoder for language modelling, trained with Mutual Information Machine (MIM) learning. Previous attempts to learn variational auto-encoders for language data have had mixed success, with empirical performance well below state-of-the-art auto-regressive models, a key barrier being the occurrence of posterior collapse with VAEs. The recently proposed MIM framework encourages high mutual information between observations and latent variables, and is more robust against posterior collapse. This paper formulates a MIM model for text data, along with a corresponding learning algorithm. We demonstrate excellent perplexity (PPL) results on several datasets, and show that the framework learns a rich latent space, allowing for interpolation between sentences of different lengths with a fixed-dimensional latent representation. We also demonstrate the versatility of sentenceMIM by utilizing a trained model for question-answering, a transfer learning task, without fine-tuning. To the best of our knowledge, this is the first latent variable model (LVM) for text modelling that achieves competitive performance with non-LVM models.
Tasks	Language Modelling, Question Answering, Transfer Learning
Published	2020-02-18
URL	https://arxiv.org/abs/2003.02645v2
PDF	https://arxiv.org/pdf/2003.02645v2.pdf
PWC	https://paperswithcode.com/paper/200302645
Repo	https://github.com/seraphlabs-ca/SentenceMIM-demo
Framework	pytorch

Reasoning on Knowledge Graphs with Debate Dynamics


Title	Reasoning on Knowledge Graphs with Debate Dynamics
Authors	Marcel Hildebrandt, Jorge Andres Quintero Serna, Yunpu Ma, Martin Ringsquandl, Mitchell Joblin, Volker Tresp
Abstract	We propose a novel method for automatic reasoning on knowledge graphs based on debate dynamics. The main idea is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments – paths in the knowledge graph – with the goal to promote the fact being true (thesis) or the fact being false (antithesis), respectively. Based on these arguments, a binary classifier, called the judge, decides whether the fact is true or false. The two agents can be considered as sparse, adversarial feature generators that present interpretable evidence for either the thesis or the antithesis. In contrast to other black-box methods, the arguments allow users to get an understanding of the decision of the judge. Since the focus of this work is to create an explainable method that maintains a competitive predictive accuracy, we benchmark our method on the triple classification and link prediction task. Thereby, we find that our method outperforms several baselines on the benchmark datasets FB15k-237, WN18RR, and Hetionet. We also conduct a survey and find that the extracted arguments are informative for users.
Tasks	Knowledge Graphs, Link Prediction
Published	2020-01-02
URL	https://arxiv.org/abs/2001.00461v1
PDF	https://arxiv.org/pdf/2001.00461v1.pdf
PWC	https://paperswithcode.com/paper/reasoning-on-knowledge-graphs-with-debate
Repo	https://github.com/m-hildebrandt/R2D2
Framework	tf