April 3, 2020

# Paper Group ANR 16

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis. Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness. Dense Dilated Convolutions Merging Network for Land Cover Classification. On Expansion and Contraction of DL-Lite Knowledge Bases. Exploratory Machine Learning with Unknown Unknowns. Dynamic Divide-a …

#### ProGraML: Graph-based Deep Learning for Program Optimization and Analysis

Title ProGraML: Graph-based Deep Learning for Program Optimization and Analysis
Authors Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Hugh Leather
Abstract The increasing complexity of computing systems places a tremendous burden on optimizing compilers, requiring ever more accurate and aggressive optimizations. Machine learning offers significant benefits for constructing optimization heuristics but there remains a gap between what state-of-the-art methods achieve and the performance of an optimal heuristic. Closing this gap requires improvements in two key areas: a representation that accurately captures the semantics of programs, and a model architecture with sufficient expressiveness to reason about this representation. We introduce ProGraML - Program Graphs for Machine Learning - a novel graph-based program representation using a low level, language agnostic, and portable format; and machine learning models capable of performing complex downstream tasks over these graphs. The ProGraML representation is a directed attributed multigraph that captures control, data, and call relations, and summarizes instruction and operand types and ordering. Message Passing Neural Networks propagate information through this structured representation, enabling whole-program or per-vertex classification tasks. ProGraML provides a general-purpose program representation that equips learnable models to perform the types of program analysis that are fundamental to optimization. To this end, we evaluate the performance of our approach first on a suite of traditional compiler analysis tasks: control flow reachability, dominator trees, data dependencies, variable liveness, and common subexpression detection. On a benchmark dataset of 250k LLVM-IR files covering six source programming languages, ProGraML achieves an average 94.0 F1 score, significantly outperforming the state-of-the-art approaches. We then apply our approach to two high-level tasks - heterogeneous device mapping and program classification - setting new state-of-the-art performance in both.
Published 2020-03-23
URL https://arxiv.org/abs/2003.10536v1
PDF https://arxiv.org/pdf/2003.10536v1.pdf
PWC https://paperswithcode.com/paper/programl-graph-based-deep-learning-for
Repo
Framework

#### Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness

Title Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness
Authors Aounon Kumar, Alexander Levine, Tom Goldstein, Soheil Feizi
Abstract Randomized smoothing, using just a simple isotropic Gaussian distribution, has been shown to produce good robustness guarantees against $\ell_2$-norm bounded adversaries. In this work, we show that extending the smoothing technique to defend against other attack models can be challenging, especially in the high-dimensional regime. In particular, for a vast class of i.i.d. smoothing distributions, we prove that the largest $\ell_p$-radius that can be certified decreases as $O(1/d^{\frac{1}{2} - \frac{1}{p}})$ with dimension $d$ for $p > 2$. Notably, for $p \geq 2$, this dependence on $d$ is no better than that of the $\ell_p$-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius. When restricted to generalized Gaussian smoothing, these two bounds can be shown to be within a constant factor of each other in an asymptotic sense, establishing that Gaussian smoothing provides the best possible results, up to a constant factor, when $p \geq 2$. We present experimental results on CIFAR to validate our theory. For other smoothing distributions, such as, a uniform distribution within an $\ell_1$ or an $\ell_\infty$-norm ball, we show upper bounds of the form $O(1 / d)$ and $O(1 / d^{1 - \frac{1}{p}})$ respectively, which have an even worse dependence on $d$.
Published 2020-02-08
URL https://arxiv.org/abs/2002.03239v1
PDF https://arxiv.org/pdf/2002.03239v1.pdf
PWC https://paperswithcode.com/paper/curse-of-dimensionality-on-randomized
Repo
Framework

#### Dense Dilated Convolutions Merging Network for Land Cover Classification

Title Dense Dilated Convolutions Merging Network for Land Cover Classification
Authors Qinghui Liu, Michael Kampffmeyer, Robert Jessen, Arnt-Børre Salberg
Abstract Land cover classification of remote sensing images is a challenging task due to limited amounts of annotated data, highly imbalanced classes, frequent incorrect pixel-level annotations, and an inherent complexity in the semantic segmentation task. In this article, we propose a novel architecture called the dense dilated convolutions’ merging network (DDCM-Net) to address this task. The proposed DDCM-Net consists of dense dilated image convolutions merged with varying dilation rates. This effectively utilizes rich combinations of dilated convolutions that enlarge the network’s receptive fields with fewer parameters and features compared with the state-of-the-art approaches in the remote sensing domain. Importantly, DDCM-Net obtains fused local- and global-context information, in effect incorporating surrounding discriminative capability for multiscale and complex-shaped objects with similar color and textures in very high-resolution aerial imagery. We demonstrate the effectiveness, robustness, and flexibility of the proposed DDCM-Net on the publicly available ISPRS Potsdam and Vaihingen data sets, as well as the DeepGlobe land cover data set. Our single model, trained on three-band Potsdam and Vaihingen data sets, achieves better accuracy in terms of both mean intersection over union (mIoU) and F1-score compared with other published models trained with more than three-band data. We further validate our model on the DeepGlobe data set, achieving state-of-the-art result 56.2% mIoU with much fewer parameters and at a lower computational cost compared with related recent work. Code available at https://github.com/samleoqh/DDCM-Semantic-Segmentation-PyTorch
Published 2020-03-09
URL https://arxiv.org/abs/2003.04027v1
PDF https://arxiv.org/pdf/2003.04027v1.pdf
PWC https://paperswithcode.com/paper/dense-dilated-convolutions-merging-network-1
Repo
Framework

#### On Expansion and Contraction of DL-Lite Knowledge Bases

Title On Expansion and Contraction of DL-Lite Knowledge Bases
Authors Dmitriy Zheleznyakov, Evgeny Kharlamov, Werner Nutt, Diego Calvanese
Abstract Knowledge bases (KBs) are not static entities: new information constantly appears and some of the previous knowledge becomes obsolete. In order to reflect this evolution of knowledge, KBs should be expanded with the new knowledge and contracted from the obsolete one. This problem is well-studied for propositional but much less for first-order KBs. In this work we investigate knowledge expansion and contraction for KBs expressed in DL-Lite, a family of description logics (DLs) that underlie the tractable fragment OWL 2 QL of the Web Ontology Language OWL 2. We start with a novel knowledge evolution framework and natural postulates that evolution should respect, and compare our postulates to the well-established AGM postulates. We then review well-known model and formula-based approaches for expansion and contraction for propositional theories and show how they can be adapted to the case of DL-Lite. In particular, we show intrinsic limitations of model-based approaches: besides the fact that some of them do not respect the postulates we have established, they ignore the structural properties of KBs. This leads to undesired properties of evolution results: evolution of DL-Lite KBs cannot be captured in DL-Lite. Moreover, we show that well-known formula-based approaches are also not appropriate for DL-Lite expansion and contraction: they either have a high complexity of computation, or they produce logical theories that cannot be expressed in DL-Lite. Thus, we propose a novel formula-based approach that respects our principles and for which evolution is expressible in DL-Lite. For this approach we also propose polynomial time deterministic algorithms to compute evolution of DL-Lite KBs when evolution affects only factual data.
Published 2020-01-25
URL https://arxiv.org/abs/2001.09365v1
PDF https://arxiv.org/pdf/2001.09365v1.pdf
PWC https://paperswithcode.com/paper/on-expansion-and-contraction-of-dl-lite
Repo
Framework

#### Exploratory Machine Learning with Unknown Unknowns

Title Exploratory Machine Learning with Unknown Unknowns
Authors Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou
Abstract In conventional supervised learning, a training dataset is given with ground-truth labels from a known label set, and the learned model will classify unseen instances to the known labels. In this paper, we study a new problem setting in which there are unknown classes in the training dataset misperceived as other labels, and thus their existence appears unknown from the given supervision. We attribute the unknown unknowns to the fact that the training dataset is badly advised by the incompletely perceived label space due to the insufficient feature information. To this end, we propose the exploratory machine learning, which examines and investigates the training dataset by actively augmenting the feature space to discover potentially unknown labels. Our approach consists of three ingredients including rejection model, feature acquisition, and model cascade. The effectiveness is validated on both synthetic and real datasets.
Published 2020-02-05
URL https://arxiv.org/abs/2002.01605v1
PDF https://arxiv.org/pdf/2002.01605v1.pdf
PWC https://paperswithcode.com/paper/exploratory-machine-learning-with-unknown
Repo
Framework

#### Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation

Title Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation
Authors Xiaogang Xu, Hengshuang Zhao, Jiaya Jia
Abstract Adversarial training is promising for improving robustness of deep neural networks towards adversarial perturbation, especially on classification tasks. Effect of this type of training on semantic segmentation, contrarily, just commences. We make the initial attempt to explore the defense strategy on semantic segmentation by formulating a general adversarial training procedure that can perform decently on both adversarial and clean samples. We propose a dynamic divide-and-conquer adversarial training (DDC-AT) strategy to enhance the defense effect, by setting additional branches in the target model during training, and dealing with pixels with diverse properties towards adversarial perturbation. Our dynamical division mechanism divides pixels into multiple branches automatically, achieved by unsupervised learning. Note all these additional branches can be abandoned during inference and thus leave no extra parameter and computation cost. Extensive experiments with various segmentation models are conducted on PASCAL VOC 2012 and Cityscapes datasets, in which DDC-AT yields satisfying performance under both white- and black-box attack.
Published 2020-03-14
URL https://arxiv.org/abs/2003.06555v1
PDF https://arxiv.org/pdf/2003.06555v1.pdf
Repo
Framework

#### GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation

Title GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation
Authors Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, Jian Tang
Abstract Molecular graph generation is a fundamental problem for drug discovery and has been attracting growing attention. The problem is challenging since it requires not only generating chemically valid molecular structures but also optimizing their chemical properties in the meantime. Inspired by the recent progress in deep generative models, in this paper we propose a flow-based autoregressive model for graph generation called GraphAF. GraphAF combines the advantages of both autoregressive and flow-based approaches and enjoys: (1) high model flexibility for data density estimation; (2) efficient parallel computation for training; (3) an iterative sampling process, which allows leveraging chemical domain knowledge for valency checking. Experimental results show that GraphAF is able to generate 68% chemically valid molecules even without chemical knowledge rules and 100% valid molecules with chemical rules. The training process of GraphAF is two times faster than the existing state-of-the-art approach GCPN. After fine-tuning the model for goal-directed property optimization with reinforcement learning, GraphAF achieves state-of-the-art performance on both chemical property optimization and constrained property optimization.
Tasks Density Estimation, Drug Discovery, Graph Generation
Published 2020-01-26
URL https://arxiv.org/abs/2001.09382v2
PDF https://arxiv.org/pdf/2001.09382v2.pdf
PWC https://paperswithcode.com/paper/graphaf-a-flow-based-autoregressive-model-for-1
Repo
Framework

#### Analyzing Word Translation of Transformer Layers

Title Analyzing Word Translation of Transformer Layers
Authors Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu
Abstract The Transformer translation model is popular for its effective parallelization and performance. Though a wide range of analysis about the Transformer has been conducted recently, the role of each Transformer layer in translation has not been studied to our knowledge. In this paper, we propose approaches to analyze the translation performed in encoder / decoder layers of the Transformer. Our approaches in general project the representations of an analyzed layer to the pre-trained classifier and measure the word translation accuracy. For the analysis of encoder layers, our approach additionally learns a weight vector to merge multiple attention matrices into one and transform the source encoding to the target side with the merged alignment matrix to align source tokens with target translations while bridging different input - output lengths. While analyzing decoder layers, we additionally study the effects of the source context and the decoding history in word prediction through bypassing the corresponding self-attention or cross-attention sub-layers. Our analysis reveals that the translation starts at the very beginning of the “encoding” (specifically at the source word embedding layer), and shows how translation evolves during the forward computation of layers. Based on observations gained in our analysis, we propose that increasing encoder depth while removing the same number of decoder layers can simply but significantly boost the decoding speed. Furthermore, simply inserting a linear projection layer before the decoder classifier which shares the weight matrix with the embedding layer can effectively provide small but consistent and significant improvements in our experiments on the WMT 14 English-German, English-French and WMT 15 Czech-English translation tasks (+0.42, +0.37 and +0.47 respectively).
Published 2020-03-21
URL https://arxiv.org/abs/2003.09586v1
PDF https://arxiv.org/pdf/2003.09586v1.pdf
PWC https://paperswithcode.com/paper/analyzing-word-translation-of-transformer
Repo
Framework

#### CAUSE: Learning Granger Causality from Event Sequences using Attribution Methods

Title CAUSE: Learning Granger Causality from Event Sequences using Attribution Methods
Authors Wei Zhang, Thomas Kobber Panum, Somesh Jha, Prasad Chalasani, David Page
Abstract We study the problem of learning Granger causality between event types from asynchronous, interdependent, multi-type event sequences. Existing work suffers from either limited model flexibility or poor model explainability and thus fails to uncover Granger causality across a wide variety of event sequences with diverse event interdependency. To address these weaknesses, we propose CAUSE (Causality from AttribUtions on Sequence of Events), a novel framework for the studied task. The key idea of CAUSE is to first implicitly capture the underlying event interdependency by fitting a neural point process, and then extract from the process a Granger causality statistic using an axiomatic attribution method. Across multiple datasets riddled with diverse event interdependency, we demonstrate that CAUSE achieves superior performance on correctly inferring the inter-type Granger causality over a range of state-of-the-art methods.
Published 2020-02-18
URL https://arxiv.org/abs/2002.07906v1
PDF https://arxiv.org/pdf/2002.07906v1.pdf
PWC https://paperswithcode.com/paper/cause-learning-granger-causality-from-event
Repo
Framework

#### Finite-time Analysis of Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards

Title Finite-time Analysis of Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards
Authors Vrettos Moulos
Abstract We study an extension of the classic stochastic multi-armed bandit problem which involves Markovian rewards and multiple plays. In order to tackle this problem we consider an index based adaptive allocation rule which at each stage combines calculations of sample means, and of upper confidence bounds, using the Kullback-Leibler divergence rate, for the stationary expected reward of Markovian arms. For rewards generated from a one-parameter exponential family of Markov chains, we provide a finite-time upper bound for the regret incurred from this adaptive allocation rule, which reveals the logarithmic dependence of the regret on the time horizon, and which is asymptotically optimal. For our analysis we devise several concentration results for Markov chains, including a maximal inequality for Markov chains, that may be of interest in their own right. As a byproduct of our analysis we also establish, asymptotically optimal, finite-time guarantees for the case of multiple plays, and IID rewards drawn from a one-parameter exponential family of probability densities.
Published 2020-01-30
URL https://arxiv.org/abs/2001.11201v1
PDF https://arxiv.org/pdf/2001.11201v1.pdf
PWC https://paperswithcode.com/paper/finite-time-analysis-of-kullback-leibler
Repo
Framework

#### Multivariate Functional Regression via Nested Reduced-Rank Regularization

Title Multivariate Functional Regression via Nested Reduced-Rank Regularization
Authors Xiaokang Liu, Shujie Ma, Kun Chen
Abstract We propose a nested reduced-rank regression (NRRR) approach in fitting regression model with multivariate functional responses and predictors, to achieve tailored dimension reduction and facilitate interpretation/visualization of the resulting functional model. Our approach is based on a two-level low-rank structure imposed on the functional regression surfaces. A global low-rank structure identifies a small set of latent principal functional responses and predictors that drives the underlying regression association. A local low-rank structure then controls the complexity and smoothness of the association between the principal functional responses and predictors. Through a basis expansion approach, the functional problem boils down to an interesting integrated matrix approximation task, where the blocks or submatrices of an integrated low-rank matrix share some common row space and/or column space. An iterative algorithm with convergence guarantee is developed. We establish the consistency of NRRR and also show through non-asymptotic analysis that it can achieve at least a comparable error rate to that of the reduced-rank regression. Simulation studies demonstrate the effectiveness of NRRR. We apply NRRR in an electricity demand problem, to relate the trajectories of the daily electricity consumption with those of the daily temperatures.
Published 2020-03-10
URL https://arxiv.org/abs/2003.04786v1
PDF https://arxiv.org/pdf/2003.04786v1.pdf
PWC https://paperswithcode.com/paper/multivariate-functional-regression-via-nested
Repo
Framework

#### Convolutional Neural Networks for Sentiment Analysis in Persian Social Media

Title Convolutional Neural Networks for Sentiment Analysis in Persian Social Media
Authors Morteza Rohanian, Mostafa Salehi, Ali Darzi, Vahid Ranjbar
Abstract With the social media engagement on the rise, the resulting data can be used as a rich resource for analyzing and understanding different phenomena around us. A sentiment analysis system employs these data to find the attitude of social media users towards certain entities in a given document. In this paper we propose a sentiment analysis method for Persian text using Convolutional Neural Network (CNN), a feedforward Artificial Neural Network, that categorize sentences into two and five classes (considering their intensity) by applying a layer of convolution over input data through different filters. We evaluated the method on three different datasets of Persian social media texts using Area under Curve metric. The final results show the advantage of using CNN over earlier attempts at developing traditional machine learning methods for Persian texts sentiment classification especially for short texts.
Published 2020-02-14
URL https://arxiv.org/abs/2002.06233v1
PDF https://arxiv.org/pdf/2002.06233v1.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-networks-for-sentiment-1
Repo
Framework

#### Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning

Title Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning
Authors Nathan Kallus, Angela Zhou
Abstract Off-policy evaluation of sequential decision policies from observational data is necessary in applications of batch reinforcement learning such as education and healthcare. In such settings, however, observed actions are often confounded with transitions by unobserved variables, rendering exact evaluation of new policies impossible, i.e., unidentifiable. We develop a robust approach that estimates sharp bounds on the (unidentifiable) value of a given policy in an infinite-horizon problem given data from another policy with unobserved confounding subject to a sensitivity model. We phrase the problem precisely as computing the support function of the set of all stationary state-occupancy ratios that agree with both the data and the sensitivity model. We show how to express this set using a new partially identified estimating equation and prove convergence to the sharp bounds, as we collect more confounded data. We prove that membership in the set can be checked by solving a linear program, while the support function is given by a difficult nonconvex optimization problem. We leverage an analytical solution for the finite-state-space case to develop approximations based on nonconvex projected gradient descent. We demonstrate the resulting bounds empirically.
Published 2020-02-11
URL https://arxiv.org/abs/2002.04518v1
PDF https://arxiv.org/pdf/2002.04518v1.pdf
PWC https://paperswithcode.com/paper/confounding-robust-policy-evaluation-in
Repo
Framework

#### Improving the Performance of Stochastic Local Search for Maximum Vertex Weight Clique Problem Using Programming by Optimization

Title Improving the Performance of Stochastic Local Search for Maximum Vertex Weight Clique Problem Using Programming by Optimization
Authors Yi Chu, Chuan Luo, Holger H. Hoos, QIngwei Lin, Haihang You
Abstract The maximum vertex weight clique problem (MVWCP) is an important generalization of the maximum clique problem (MCP) that has a wide range of real-world applications. In situations where rigorous guarantees regarding the optimality of solutions are not required, MVWCP is usually solved using stochastic local search (SLS) algorithms, which also define the state of the art for solving this problem. However, there is no single SLS algorithm which gives the best performance across all classes of MVWCP instances, and it is challenging to effectively identify the most suitable algorithm for each class of MVWCP instances. In this work, we follow the paradigm of Programming by Optimization (PbO) to develop a new, flexible and highly parametric SLS framework for solving MVWCP, combining, for the first time, a broad range of effective heuristic mechanisms. By automatically configuring this PbO-MWC framework, we achieve substantial advances in the state-of-the-art in solving MVWCP over a broad range of prominent benchmarks, including two derived from real-world applications in transplantation medicine (kidney exchange) and assessment of research excellence.
Published 2020-02-27
URL https://arxiv.org/abs/2002.11909v1
PDF https://arxiv.org/pdf/2002.11909v1.pdf
PWC https://paperswithcode.com/paper/improving-the-performance-of-stochastic-local
Repo
Framework

#### ABCTracker: an easy-to-use, cloud-based application for tracking multiple objects

Title ABCTracker: an easy-to-use, cloud-based application for tracking multiple objects
Authors Lance Rice, Samual Tate, David Farynyk, Joshua Sun, Greg Chism, Daniel Charbonneau, Thomas Fasciano, Anna Dornhaus, Min C. Shin
Abstract Visual multi-object tracking has the potential to accelerate many forms of quantitative analyses, especially in research communities investigating the motion, behavior, or social interactions within groups of animals. Despite its potential for increasing analysis throughput, complications related to accessibility, adaptability, accuracy, or scalable application arise with existing tracking systems. Several iterations of prototyping and testing have led us to a multi-object tracking system – ABCTracker – that is: accessible in both system as well as technical knowledge requirements, easily adaptable to new videos, and capable of producing accurate tracking data through a mixture of automatic and semi-automatic tracking features.
Published 2020-01-27
URL https://arxiv.org/abs/2001.10072v2
PDF https://arxiv.org/pdf/2001.10072v2.pdf
PWC https://paperswithcode.com/paper/abctracker-an-easy-to-use-cloud-based
Repo
Framework