Paper Group ANR 16
ProGraML: Graph-based Deep Learning for Program Optimization and Analysis. Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness. Dense Dilated Convolutions Merging Network for Land Cover Classification. On Expansion and Contraction of DL-Lite Knowledge Bases. Exploratory Machine Learning with Unknown Unknowns. Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation. GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation. Analyzing Word Translation of Transformer Layers. CAUSE: Learning Granger Causality from Event Sequences using Attribution Methods. Finite-time Analysis of Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards. Multivariate Functional Regression via Nested Reduced-Rank Regularization. Convolutional Neural Networks for Sentiment Analysis in Persian Social Media. Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning. Improving the Performance of Stochastic Local Search for Maximum Vertex Weight Clique Problem Using Programming by Optimization. ABCTracker: an easy-to-use, cloud-based application for tracking multiple objects.
ProGraML: Graph-based Deep Learning for Program Optimization and Analysis
Title | ProGraML: Graph-based Deep Learning for Program Optimization and Analysis |
Authors | Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Hugh Leather |
Abstract | The increasing complexity of computing systems places a tremendous burden on optimizing compilers, requiring ever more accurate and aggressive optimizations. Machine learning offers significant benefits for constructing optimization heuristics, but there remains a gap between what state-of-the-art methods achieve and the performance of an optimal heuristic. Closing this gap requires improvements in two key areas: a representation that accurately captures the semantics of programs, and a model architecture with sufficient expressiveness to reason about this representation. We introduce ProGraML - Program Graphs for Machine Learning - a novel graph-based program representation using a low-level, language-agnostic, and portable format; and machine learning models capable of performing complex downstream tasks over these graphs. The ProGraML representation is a directed attributed multigraph that captures control, data, and call relations, and summarizes instruction and operand types and ordering. Message Passing Neural Networks propagate information through this structured representation, enabling whole-program or per-vertex classification tasks. ProGraML provides a general-purpose program representation that equips learnable models to perform the types of program analysis that are fundamental to optimization. To this end, we evaluate the performance of our approach first on a suite of traditional compiler analysis tasks: control flow reachability, dominator trees, data dependencies, variable liveness, and common subexpression detection. On a benchmark dataset of 250k LLVM-IR files covering six source programming languages, ProGraML achieves an average F1 score of 94.0, significantly outperforming state-of-the-art approaches. We then apply our approach to two high-level tasks - heterogeneous device mapping and program classification - setting new state-of-the-art performance in both. |
Tasks | |
Published | 2020-03-23 |
URL | https://arxiv.org/abs/2003.10536v1 |
https://arxiv.org/pdf/2003.10536v1.pdf | |
PWC | https://paperswithcode.com/paper/programl-graph-based-deep-learning-for |
Repo | |
Framework | |
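Lacking the authors' code here, the sketch below illustrates only the typed message-passing idea from the abstract: vertices carry feature vectors, and control, data, and call edges each get their own transform. The toy graph, the hidden size, and the ReLU update (in place of the GGNN's GRU) are all illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy hidden size (assumption; not the paper's setting)

# Tiny hand-built "program graph": vertex ids 0..3, typed directed edges.
edges = {
    "control": [(0, 1), (1, 2)],
    "data":    [(3, 1), (3, 2)],
    "call":    [(2, 0)],
}

h = rng.normal(size=(4, DIM))                               # initial vertex states
W = {t: rng.normal(size=(DIM, DIM)) * 0.1 for t in edges}   # per-edge-type weights
W_self = rng.normal(size=(DIM, DIM)) * 0.1

def mp_round(h):
    """One simplified message-passing round: aggregate typed messages, update."""
    msg = np.zeros_like(h)
    for etype, pairs in edges.items():
        for src, dst in pairs:
            msg[dst] += h[src] @ W[etype]       # type-specific message
    return np.maximum(0.0, h @ W_self + msg)    # ReLU update (the GGNN uses a GRU)

h = mp_round(h)
print(h.shape)  # (4, 8): updated per-vertex states for downstream classification
```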
Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness
Title | Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness |
Authors | Aounon Kumar, Alexander Levine, Tom Goldstein, Soheil Feizi |
Abstract | Randomized smoothing, using just a simple isotropic Gaussian distribution, has been shown to produce good robustness guarantees against $\ell_2$-norm bounded adversaries. In this work, we show that extending the smoothing technique to defend against other attack models can be challenging, especially in the high-dimensional regime. In particular, for a vast class of i.i.d. smoothing distributions, we prove that the largest $\ell_p$-radius that can be certified decreases as $O(1/d^{\frac{1}{2} - \frac{1}{p}})$ with dimension $d$ for $p > 2$. Notably, for $p \geq 2$, this dependence on $d$ is no better than that of the $\ell_p$-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius. When restricted to generalized Gaussian smoothing, these two bounds can be shown to be within a constant factor of each other in an asymptotic sense, establishing that Gaussian smoothing provides the best possible results, up to a constant factor, when $p \geq 2$. We present experimental results on CIFAR to validate our theory. For other smoothing distributions, such as a uniform distribution within an $\ell_1$- or an $\ell_\infty$-norm ball, we show upper bounds of the form $O(1 / d)$ and $O(1 / d^{1 - \frac{1}{p}})$ respectively, which have an even worse dependence on $d$. |
Tasks | |
Published | 2020-02-08 |
URL | https://arxiv.org/abs/2002.03239v1 |
https://arxiv.org/pdf/2002.03239v1.pdf | |
PWC | https://paperswithcode.com/paper/curse-of-dimensionality-on-randomized |
Repo | |
Framework | |
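To make the dimension dependence concrete, here is a small sketch combining the standard Gaussian-smoothing $\ell_2$ certificate ($R = \sigma\,\Phi^{-1}(p_A)$, from prior randomized-smoothing work, not this paper) with the naive $\ell_2$-to-$\ell_p$ conversion; the printed radii shrink at exactly the $O(1/d^{1/2 - 1/p})$ rate the abstract describes.

```python
import statistics

def l2_certified_radius(p_a: float, sigma: float) -> float:
    """Standard l2 radius for Gaussian smoothing: R = sigma * Phi^{-1}(p_a),
    where p_a is the (lower-bounded) top-class probability under noise."""
    return sigma * statistics.NormalDist().inv_cdf(p_a)

def lp_radius_from_l2(r2: float, d: int, p: float) -> float:
    """Naive l2 -> lp conversion for p >= 2: the worst-case l2 norm of an
    lp-bounded perturbation is eps * d^(1/2 - 1/p), so the certified lp
    radius shrinks by that factor, matching the O(1/d^(1/2 - 1/p)) rate."""
    return r2 / d ** (0.5 - 1.0 / p)

r2 = l2_certified_radius(p_a=0.99, sigma=0.5)
for d in (10, 1_000, 100_000):
    print(d, lp_radius_from_l2(r2, d, p=float("inf")))  # decays like 1/sqrt(d)
```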
Dense Dilated Convolutions Merging Network for Land Cover Classification
Title | Dense Dilated Convolutions Merging Network for Land Cover Classification |
Authors | Qinghui Liu, Michael Kampffmeyer, Robert Jessen, Arnt-Børre Salberg |
Abstract | Land cover classification of remote sensing images is a challenging task due to limited amounts of annotated data, highly imbalanced classes, frequent incorrect pixel-level annotations, and an inherent complexity in the semantic segmentation task. In this article, we propose a novel architecture called the dense dilated convolutions’ merging network (DDCM-Net) to address this task. The proposed DDCM-Net consists of dense dilated image convolutions merged with varying dilation rates. This effectively utilizes rich combinations of dilated convolutions that enlarge the network’s receptive fields with fewer parameters and features compared with the state-of-the-art approaches in the remote sensing domain. Importantly, DDCM-Net obtains fused local- and global-context information, in effect incorporating surrounding discriminative capability for multiscale and complex-shaped objects with similar color and textures in very high-resolution aerial imagery. We demonstrate the effectiveness, robustness, and flexibility of the proposed DDCM-Net on the publicly available ISPRS Potsdam and Vaihingen data sets, as well as the DeepGlobe land cover data set. Our single model, trained on three-band Potsdam and Vaihingen data sets, achieves better accuracy in terms of both mean intersection over union (mIoU) and F1-score compared with other published models trained with more than three-band data. We further validate our model on the DeepGlobe data set, achieving a state-of-the-art result of 56.2% mIoU with far fewer parameters and at a lower computational cost compared with related recent work. Code is available at https://github.com/samleoqh/DDCM-Semantic-Segmentation-PyTorch |
Tasks | Semantic Segmentation |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.04027v1 |
https://arxiv.org/pdf/2003.04027v1.pdf | |
PWC | https://paperswithcode.com/paper/dense-dilated-convolutions-merging-network-1 |
Repo | |
Framework | |
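A simplified PyTorch sketch of the dense dilated-merging idea follows; channel counts, dilation rates, and the single-block structure are placeholders, not the published DDCM-Net configuration (see the repository linked above for the real model).

```python
import torch
import torch.nn as nn

class DenseDilatedMerge(nn.Module):
    """Simplified sketch of a dense dilated-convolutions merging block:
    each 3x3 conv uses a growing dilation rate, its output is concatenated
    ("densely merged") with everything seen so far, and a 1x1 conv fuses all."""

    def __init__(self, in_ch: int, growth: int = 16, rates=(1, 2, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList()
        ch = in_ch
        for r in rates:
            # padding=r keeps spatial size constant for a 3x3 dilated conv
            self.convs.append(nn.Conv2d(ch, growth, 3, padding=r, dilation=r))
            ch += growth  # dense concatenation grows the channel count
        self.merge = nn.Conv2d(ch, in_ch, kernel_size=1)

    def forward(self, x):
        feats = x
        for conv in self.convs:
            feats = torch.cat([feats, torch.relu(conv(feats))], dim=1)
        return torch.relu(self.merge(feats))

out = DenseDilatedMerge(in_ch=32)(torch.randn(1, 32, 64, 64))
print(out.shape)  # torch.Size([1, 32, 64, 64])
```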
On Expansion and Contraction of DL-Lite Knowledge Bases
Title | On Expansion and Contraction of DL-Lite Knowledge Bases |
Authors | Dmitriy Zheleznyakov, Evgeny Kharlamov, Werner Nutt, Diego Calvanese |
Abstract | Knowledge bases (KBs) are not static entities: new information constantly appears and some of the previous knowledge becomes obsolete. In order to reflect this evolution of knowledge, KBs should be expanded with the new knowledge and contracted to remove the obsolete knowledge. This problem is well studied for propositional KBs but much less so for first-order KBs. In this work we investigate knowledge expansion and contraction for KBs expressed in DL-Lite, a family of description logics (DLs) that underlies the tractable fragment OWL 2 QL of the Web Ontology Language OWL 2. We start with a novel knowledge evolution framework and natural postulates that evolution should respect, and compare our postulates to the well-established AGM postulates. We then review well-known model-based and formula-based approaches for expansion and contraction of propositional theories and show how they can be adapted to the case of DL-Lite. In particular, we show intrinsic limitations of model-based approaches: besides the fact that some of them do not respect the postulates we have established, they ignore the structural properties of KBs. This leads to undesired properties of evolution results: evolution of DL-Lite KBs cannot be captured in DL-Lite. Moreover, we show that well-known formula-based approaches are also not appropriate for DL-Lite expansion and contraction: they either have a high computational complexity, or they produce logical theories that cannot be expressed in DL-Lite. Thus, we propose a novel formula-based approach that respects our principles and for which evolution is expressible in DL-Lite. For this approach we also propose polynomial-time deterministic algorithms to compute the evolution of DL-Lite KBs when evolution affects only factual data. |
Tasks | |
Published | 2020-01-25 |
URL | https://arxiv.org/abs/2001.09365v1 |
https://arxiv.org/pdf/2001.09365v1.pdf | |
PWC | https://paperswithcode.com/paper/on-expansion-and-contraction-of-dl-lite |
Repo | |
Framework | |
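As a toy illustration of expansion and contraction at the factual level, the sketch below uses plain implication axioms as a stand-in for DL-Lite concept inclusions; it shows why contraction must also remove stated facts that re-derive the obsolete assertion. It is emphatically not the paper's algorithm, only the shape of the problem.

```python
# Toy illustration of formula-based expansion/contraction at the factual level.
# A "KB" is a set of (Concept, individual) atoms plus implication axioms
# (A -> B), a deliberately simplified stand-in for DL-Lite inclusions.

AXIOMS = {("Professor", "Employee"), ("Employee", "Person")}

def closure(facts):
    """All (concept, individual) atoms entailed via the implication axioms."""
    entailed = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, b) in AXIOMS:
            for (c, x) in list(entailed):
                if c == a and (b, x) not in entailed:
                    entailed.add((b, x))
                    changed = True
    return entailed

def expand(facts, new):
    return facts | {new}

def contract(facts, obsolete):
    """Drop every stated fact that (re)derives the obsolete atom."""
    return {f for f in facts if obsolete not in closure({f})}

kb = {("Professor", "mary")}
kb = expand(kb, ("Employee", "bob"))
print(closure(kb))                       # mary is also Employee and Person
kb = contract(kb, ("Employee", "mary"))  # must remove Professor(mary) too
print(closure(kb))
```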
Exploratory Machine Learning with Unknown Unknowns
Title | Exploratory Machine Learning with Unknown Unknowns |
Authors | Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou |
Abstract | In conventional supervised learning, a training dataset is given with ground-truth labels from a known label set, and the learned model will classify unseen instances into the known labels. In this paper, we study a new problem setting in which there are unknown classes in the training dataset misperceived as other labels, so that their existence appears unknown from the given supervision. We attribute these unknown unknowns to the training dataset being badly advised by an incompletely perceived label space, which in turn stems from insufficient feature information. To address this, we propose exploratory machine learning, which examines and investigates the training dataset by actively augmenting the feature space to discover potentially unknown labels. Our approach consists of three ingredients: a rejection model, feature acquisition, and a model cascade. Its effectiveness is validated on both synthetic and real datasets. |
Tasks | |
Published | 2020-02-05 |
URL | https://arxiv.org/abs/2002.01605v1 |
https://arxiv.org/pdf/2002.01605v1.pdf | |
PWC | https://paperswithcode.com/paper/exploratory-machine-learning-with-unknown |
Repo | |
Framework | |
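A rough sketch of the three ingredients on synthetic data: a rejection model flags low-confidence instances, feature acquisition reveals an extra attribute, and a cascaded second-stage model can then expose the hidden class. The rejection threshold, the synthetic data, and the use of scikit-learn are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic data: class 2 is an "unknown unknown" mislabeled as class 0 because
# the discriminating feature (column 2) is initially unobserved.
X_full = rng.normal(size=(600, 3))
y_true = (X_full[:, 0] > 0).astype(int)
hidden = X_full[:, 2] > 1.2
y_observed = np.where(hidden, 0, y_true)     # hidden class misperceived as 0

base = LogisticRegression().fit(X_full[:, :2], y_observed)

# Rejection model: low-confidence instances become candidates for exploration.
proba = base.predict_proba(X_full[:, :2]).max(axis=1)
reject = proba < 0.7                         # threshold is an assumption

# Feature acquisition + cascade: rejected instances get the augmented feature
# and are re-examined by a second-stage model that can expose the hidden class.
y_aug = np.where(hidden, 2, y_true)          # oracle labels after exploration
cascade = LogisticRegression(max_iter=500).fit(X_full[reject], y_aug[reject])
print("rejected:", int(reject.sum()),
      "hidden class found:", 2 in cascade.predict(X_full[reject]))
```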
Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation
Title | Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation |
Authors | Xiaogang Xu, Hengshuang Zhao, Jiaya Jia |
Abstract | Adversarial training is promising for improving the robustness of deep neural networks against adversarial perturbation, especially on classification tasks. In contrast, the effect of this type of training on semantic segmentation has only begun to be explored. We make an initial attempt to explore the defense strategy on semantic segmentation by formulating a general adversarial training procedure that can perform decently on both adversarial and clean samples. We propose a dynamic divide-and-conquer adversarial training (DDC-AT) strategy to enhance the defense effect, by setting additional branches in the target model during training and dealing with pixels with diverse properties towards adversarial perturbation. Our dynamic division mechanism divides pixels into multiple branches automatically, achieved by unsupervised learning. Note that all these additional branches can be abandoned during inference, and thus leave no extra parameter or computation cost. Extensive experiments with various segmentation models are conducted on the PASCAL VOC 2012 and Cityscapes datasets, on which DDC-AT yields satisfying performance under both white- and black-box attacks. |
Tasks | Semantic Segmentation |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.06555v1 |
https://arxiv.org/pdf/2003.06555v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-divide-and-conquer-adversarial |
Repo | |
Framework | |
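Adversarial training presupposes a way to craft adversarial samples; below is a standard PGD attack on a per-pixel cross-entropy loss, the segmentation-flavored building block such a procedure starts from. The branch-splitting of DDC-AT itself is not reproduced, and the model and budgets are toy assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_segmentation(model, images, labels, eps=8/255, alpha=2/255, steps=3):
    """Standard PGD on the mean per-pixel cross-entropy of a segmentation
    model; the first ingredient of adversarial training in this setting."""
    adv = images.clone().detach()
    adv += torch.empty_like(adv).uniform_(-eps, eps)  # random start
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)    # mean over all pixels
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()
            adv = images + (adv - images).clamp(-eps, eps)  # project to eps-ball
            adv = adv.clamp(0.0, 1.0)
    return adv.detach()

# Smoke test with a toy 1x1-conv "segmentation model" (2 classes).
model = nn.Conv2d(3, 2, kernel_size=1)
x = torch.rand(2, 3, 16, 16)
y = torch.randint(0, 2, (2, 16, 16))
x_adv = pgd_segmentation(model, x, y)
print((x_adv - x).abs().max().item() <= 8/255 + 1e-6)  # True
```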
GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation
Title | GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation |
Authors | Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, Jian Tang |
Abstract | Molecular graph generation is a fundamental problem for drug discovery and has been attracting growing attention. The problem is challenging since it requires not only generating chemically valid molecular structures but also optimizing their chemical properties in the meantime. Inspired by the recent progress in deep generative models, in this paper we propose a flow-based autoregressive model for graph generation called GraphAF. GraphAF combines the advantages of both autoregressive and flow-based approaches and enjoys: (1) high model flexibility for data density estimation; (2) efficient parallel computation for training; (3) an iterative sampling process, which allows leveraging chemical domain knowledge for valency checking. Experimental results show that GraphAF is able to generate chemically valid molecules 68% of the time even without chemical knowledge rules, and 100% of the time with chemical rules. The training process of GraphAF is two times faster than that of the existing state-of-the-art approach GCPN. After fine-tuning the model for goal-directed property optimization with reinforcement learning, GraphAF achieves state-of-the-art performance on both chemical property optimization and constrained property optimization. |
Tasks | Density Estimation, Drug Discovery, Graph Generation |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09382v2 |
https://arxiv.org/pdf/2001.09382v2.pdf | |
PWC | https://paperswithcode.com/paper/graphaf-a-flow-based-autoregressive-model-for-1 |
Repo | |
Framework | |
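A toy sketch of the valency-checking hook during iterative sampling (the flow-based density model itself is omitted); atom types, valence budgets, and the bond-proposal policy are all illustrative assumptions.

```python
import random

# Maximum valence per toy atom type; a stand-in for chemistry-aware checks.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}

def sample_molecule(n_atoms=6, seed=0):
    """Autoregressive toy generation with valency checking: propose bonds one
    at a time and reject any that would exceed an atom's valence budget. This
    mimics GraphAF's iterative sampling hook, not its flow-based model."""
    rng = random.Random(seed)
    atoms = [rng.choice(list(MAX_VALENCE)) for _ in range(n_atoms)]
    used = [0] * n_atoms          # bonds already consumed per atom
    bonds = []
    for i in range(1, n_atoms):   # connect each new atom to an earlier one
        candidates = [j for j in range(i)
                      if used[j] < MAX_VALENCE[atoms[j]]
                      and used[i] < MAX_VALENCE[atoms[i]]]
        if not candidates:        # no chemically valid bond: leave disconnected
            continue
        j = rng.choice(candidates)
        bonds.append((j, i))
        used[i] += 1
        used[j] += 1
    return atoms, bonds

atoms, bonds = sample_molecule()
print(atoms, bonds)  # every atom respects its valence budget by construction
```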
Analyzing Word Translation of Transformer Layers
Title | Analyzing Word Translation of Transformer Layers |
Authors | Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu |
Abstract | The Transformer translation model is popular for its effective parallelization and performance. Though a wide range of analyses of the Transformer have been conducted recently, the role of each Transformer layer in translation has, to our knowledge, not been studied. In this paper, we propose approaches to analyze the translation performed in the encoder / decoder layers of the Transformer. Our approaches in general project the representations of an analyzed layer to the pre-trained classifier and measure the word translation accuracy. For the analysis of encoder layers, our approach additionally learns a weight vector to merge multiple attention matrices into one, and transforms the source encoding to the target side with the merged alignment matrix to align source tokens with target translations while bridging different input and output lengths. While analyzing decoder layers, we additionally study the effects of the source context and of the decoding history on word prediction by bypassing the corresponding self-attention or cross-attention sub-layers. Our analysis reveals that translation starts at the very beginning of the “encoding” (specifically at the source word embedding layer) and shows how translation evolves during the forward computation of layers. Based on observations gained in our analysis, we propose that increasing encoder depth while removing the same number of decoder layers can simply yet significantly boost decoding speed. Furthermore, simply inserting a linear projection layer before the decoder classifier, which shares the weight matrix with the embedding layer, can effectively provide small but consistent and significant improvements in our experiments on the WMT 14 English-German, English-French and WMT 15 Czech-English translation tasks (+0.42, +0.37 and +0.47 respectively). |
Tasks | |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09586v1 |
https://arxiv.org/pdf/2003.09586v1.pdf | |
PWC | https://paperswithcode.com/paper/analyzing-word-translation-of-transformer |
Repo | |
Framework | |
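A minimal sketch of the probing recipe: project each layer's representations through a frozen output classifier and measure per-token word accuracy. Everything here is randomly initialized, so the printed accuracies are meaningless; the point is only the mechanics of per-layer probing.

```python
import torch
import torch.nn as nn

VOCAB, DIM, LAYERS, SEQ = 100, 32, 4, 10

# Stand-ins: a stack of encoder layers and a frozen output classifier,
# as if both were taken from a converged translation model.
layers = nn.ModuleList(
    nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
    for _ in range(LAYERS))
classifier = nn.Linear(DIM, VOCAB)
classifier.requires_grad_(False)  # the probe reuses the pre-trained classifier

tokens = torch.randint(0, VOCAB, (2, SEQ))
h = nn.Embedding(VOCAB, DIM)(tokens)

# Probe each layer: classify its representations and score word accuracy.
targets = torch.randint(0, VOCAB, (2, SEQ))  # toy references
with torch.no_grad():
    for depth, layer in enumerate(layers):
        h = layer(h)
        acc = (classifier(h).argmax(-1) == targets).float().mean().item()
        print(f"layer {depth}: word accuracy {acc:.3f}")
```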
CAUSE: Learning Granger Causality from Event Sequences using Attribution Methods
Title | CAUSE: Learning Granger Causality from Event Sequences using Attribution Methods |
Authors | Wei Zhang, Thomas Kobber Panum, Somesh Jha, Prasad Chalasani, David Page |
Abstract | We study the problem of learning Granger causality between event types from asynchronous, interdependent, multi-type event sequences. Existing work suffers from either limited model flexibility or poor model explainability and thus fails to uncover Granger causality across a wide variety of event sequences with diverse event interdependency. To address these weaknesses, we propose CAUSE (Causality from AttribUtions on Sequence of Events), a novel framework for the studied task. The key idea of CAUSE is to first implicitly capture the underlying event interdependency by fitting a neural point process, and then extract from the process a Granger causality statistic using an axiomatic attribution method. Across multiple datasets riddled with diverse event interdependency, we demonstrate that CAUSE achieves superior performance on correctly inferring the inter-type Granger causality over a range of state-of-the-art methods. |
Tasks | |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07906v1 |
https://arxiv.org/pdf/2002.07906v1.pdf | |
PWC | https://paperswithcode.com/paper/cause-learning-granger-causality-from-event |
Repo | |
Framework | |
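The attribution idea can be previewed on a model where attributions are exact: with an exponential-kernel Hawkes process, the intensity decomposes additively over past events, so summing each past event's contribution yields a Granger-causality score matrix. CAUSE does the analogous thing for black-box neural point processes via axiomatic attribution; the parameters and event sequence below are invented.

```python
import numpy as np

MU = np.array([0.2, 0.2])       # base intensities for two event types
ALPHA = np.array([[0.0, 0.8],   # ALPHA[j, i]: excitation of type i by type j
                  [0.0, 0.0]])
BETA = 1.0                      # exponential kernel decay rate

events = [(0.5, 0), (1.0, 0), (1.3, 1), (2.0, 0), (2.6, 1)]  # (time, type)

score = np.zeros((2, 2))  # score[j, i]: attributed influence of j on i
for n, (t, i) in enumerate(events):
    for (s, j) in events[:n]:
        # Additive contribution of past event (s, j) to the intensity at (t, i):
        # exact attribution, since the Hawkes intensity is a sum of such terms.
        score[j, i] += ALPHA[j, i] * np.exp(-BETA * (t - s))

print(np.round(score, 3))  # large score[0, 1] flags "type 0 Granger-causes type 1"
```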
Finite-time Analysis of Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards
Title | Finite-time Analysis of Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards |
Authors | Vrettos Moulos |
Abstract | We study an extension of the classic stochastic multi-armed bandit problem which involves Markovian rewards and multiple plays. In order to tackle this problem we consider an index-based adaptive allocation rule which at each stage combines calculations of sample means and of upper confidence bounds, based on the Kullback-Leibler divergence rate, for the stationary expected reward of the Markovian arms. For rewards generated from a one-parameter exponential family of Markov chains, we provide a finite-time upper bound for the regret incurred by this adaptive allocation rule, which reveals the logarithmic dependence of the regret on the time horizon and is asymptotically optimal. For our analysis we devise several concentration results for Markov chains, including a maximal inequality, that may be of interest in their own right. As a byproduct of our analysis we also establish asymptotically optimal finite-time guarantees for the case of multiple plays and i.i.d. rewards drawn from a one-parameter exponential family of probability densities. |
Tasks | |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11201v1 |
https://arxiv.org/pdf/2001.11201v1.pdf | |
PWC | https://paperswithcode.com/paper/finite-time-analysis-of-kullback-leibler |
Repo | |
Framework | |
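For the i.i.d. Bernoulli special case, the KL-UCB index has a simple bisection implementation, sketched below; the paper's contribution is extending such indices to Markovian rewards via KL divergence rates and proving finite-time regret bounds, none of which this sketch reproduces.

```python
from math import log

def kl_bernoulli(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

def kl_ucb_index(mean: float, pulls: int, t: int, precision=1e-6) -> float:
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), via bisection;
    KL(mean, q) is increasing in q on [mean, 1], so bisection is valid."""
    level = log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    while hi - lo > precision:
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

print(kl_ucb_index(mean=0.4, pulls=25, t=100))  # upper confidence bound > 0.4
```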
Multivariate Functional Regression via Nested Reduced-Rank Regularization
Title | Multivariate Functional Regression via Nested Reduced-Rank Regularization |
Authors | Xiaokang Liu, Shujie Ma, Kun Chen |
Abstract | We propose a nested reduced-rank regression (NRRR) approach for fitting regression models with multivariate functional responses and predictors, to achieve tailored dimension reduction and facilitate the interpretation and visualization of the resulting functional model. Our approach is based on a two-level low-rank structure imposed on the functional regression surfaces. A global low-rank structure identifies a small set of latent principal functional responses and predictors that drives the underlying regression association. A local low-rank structure then controls the complexity and smoothness of the association between the principal functional responses and predictors. Through a basis expansion approach, the functional problem boils down to an interesting integrated matrix approximation task, where the blocks or submatrices of an integrated low-rank matrix share some common row space and/or column space. An iterative algorithm with a convergence guarantee is developed. We establish the consistency of NRRR and show through non-asymptotic analysis that it can achieve an error rate at least comparable to that of reduced-rank regression. Simulation studies demonstrate the effectiveness of NRRR. We apply NRRR to an electricity demand problem, relating the trajectories of daily electricity consumption to those of daily temperatures. |
Tasks | Dimensionality Reduction |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04786v1 |
https://arxiv.org/pdf/2003.04786v1.pdf | |
PWC | https://paperswithcode.com/paper/multivariate-functional-regression-via-nested |
Repo | |
Framework | |
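The local low-rank ingredient is classical reduced-rank regression, which has a closed form via an SVD of the OLS fit; a numpy sketch follows. The nesting across functional responses and predictors, which is NRRR's actual contribution, is not shown.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classic reduced-rank regression: fit OLS, then project the coefficient
    matrix onto the top right-singular directions of the fitted values."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U, s, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    P = Vt[:rank].T @ Vt[:rank]        # projector onto top-r response directions
    return B_ols @ P

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
B_true = rng.normal(size=(10, 2)) @ rng.normal(size=(2, 6))  # rank-2 truth
Y = X @ B_true + 0.1 * rng.normal(size=(200, 6))
B_hat = reduced_rank_regression(X, Y, rank=2)
print(np.linalg.matrix_rank(B_hat), np.abs(B_hat - B_true).max())
```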
Convolutional Neural Networks for Sentiment Analysis in Persian Social Media
Title | Convolutional Neural Networks for Sentiment Analysis in Persian Social Media |
Authors | Morteza Rohanian, Mostafa Salehi, Ali Darzi, Vahid Ranjbar |
Abstract | With social media engagement on the rise, the resulting data can be used as a rich resource for analyzing and understanding different phenomena around us. A sentiment analysis system employs these data to find the attitude of social media users towards certain entities in a given document. In this paper we propose a sentiment analysis method for Persian text using a Convolutional Neural Network (CNN), a feedforward artificial neural network that categorizes sentences into two and five classes (considering their intensity) by applying a layer of convolution over the input data through different filters. We evaluated the method on three different datasets of Persian social media texts using the area under curve (AUC) metric. The final results show the advantage of using CNNs over earlier traditional machine learning methods for Persian text sentiment classification, especially for short texts. |
Tasks | Sentiment Analysis |
Published | 2020-02-14 |
URL | https://arxiv.org/abs/2002.06233v1 |
https://arxiv.org/pdf/2002.06233v1.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-neural-networks-for-sentiment-1 |
Repo | |
Framework | |
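A Kim-style text CNN matching the abstract's description (parallel convolution filters over embedded tokens, max-pooling over time, then classification); the vocabulary size, dimensions, and filter widths are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Sentence CNN sketch: embed tokens, apply parallel 1-d convolutions
    with several filter widths, max-pool over time, classify."""

    def __init__(self, vocab=5000, dim=64, widths=(2, 3, 4), n_filters=32, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.convs = nn.ModuleList(nn.Conv1d(dim, n_filters, w) for w in widths)
        self.out = nn.Linear(n_filters * len(widths), classes)

    def forward(self, tokens):                     # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)     # -> (batch, dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.out(torch.cat(pooled, dim=1))  # class logits

logits = TextCNN()(torch.randint(0, 5000, (8, 40)))
print(logits.shape)  # torch.Size([8, 2])
```

Setting `classes=5` gives the intensity-graded variant the abstract mentions.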
Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning
Title | Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning |
Authors | Nathan Kallus, Angela Zhou |
Abstract | Off-policy evaluation of sequential decision policies from observational data is necessary in applications of batch reinforcement learning such as education and healthcare. In such settings, however, observed actions are often confounded with transitions by unobserved variables, rendering exact evaluation of new policies impossible, i.e., unidentifiable. We develop a robust approach that estimates sharp bounds on the (unidentifiable) value of a given policy in an infinite-horizon problem given data from another policy with unobserved confounding subject to a sensitivity model. We phrase the problem precisely as computing the support function of the set of all stationary state-occupancy ratios that agree with both the data and the sensitivity model. We show how to express this set using a new partially identified estimating equation and prove convergence to the sharp bounds, as we collect more confounded data. We prove that membership in the set can be checked by solving a linear program, while the support function is given by a difficult nonconvex optimization problem. We leverage an analytical solution for the finite-state-space case to develop approximations based on nonconvex projected gradient descent. We demonstrate the resulting bounds empirically. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04518v1 |
https://arxiv.org/pdf/2002.04518v1.pdf | |
PWC | https://paperswithcode.com/paper/confounding-robust-policy-evaluation-in |
Repo | |
Framework | |
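A one-step analogue of the bounding idea: if each importance weight is only known up to a sensitivity factor, the normalized value becomes a linear-fractional program over box constraints, and its optimum puts weights at box endpoints with a switch at a threshold in the sorted outcomes, so scanning all split points is exact. This sketch is only that one-step analogue; the paper's actual method bounds stationary state-occupancy ratios in the infinite-horizon setting.

```python
import numpy as np

def value_bounds(y, w_nominal, gamma):
    """Bounds on a normalized weighted value sum(w*y)/sum(w) when each weight
    is only known up to a sensitivity factor: w_i in [w/gamma, w*gamma].
    The optimum sets each w_i to a box endpoint, switching at a threshold in
    the sorted outcomes, so scanning all n+1 split points is exact."""
    lo_w, hi_w = w_nominal / gamma, w_nominal * gamma
    order = np.argsort(y)
    y, lo_w, hi_w = y[order], lo_w[order], hi_w[order]
    vals = []
    for k in range(len(y) + 1):
        w_up = np.concatenate([lo_w[:k], hi_w[k:]])   # maximizing pattern
        w_dn = np.concatenate([hi_w[:k], lo_w[k:]])   # minimizing pattern
        vals += [w_up @ y / w_up.sum(), w_dn @ y / w_dn.sum()]
    return min(vals), max(vals)

rng = np.random.default_rng(0)
y = rng.normal(size=50)
w = rng.uniform(0.5, 2.0, size=50)   # nominal importance weights
print(value_bounds(y, w, gamma=1.5)) # interval widens as gamma grows
```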
Improving the Performance of Stochastic Local Search for Maximum Vertex Weight Clique Problem Using Programming by Optimization
Title | Improving the Performance of Stochastic Local Search for Maximum Vertex Weight Clique Problem Using Programming by Optimization |
Authors | Yi Chu, Chuan Luo, Holger H. Hoos, Qingwei Lin, Haihang You |
Abstract | The maximum vertex weight clique problem (MVWCP) is an important generalization of the maximum clique problem (MCP) that has a wide range of real-world applications. In situations where rigorous guarantees regarding the optimality of solutions are not required, MVWCP is usually solved using stochastic local search (SLS) algorithms, which also define the state of the art for solving this problem. However, there is no single SLS algorithm that gives the best performance across all classes of MVWCP instances, and it is challenging to effectively identify the most suitable algorithm for each class of MVWCP instances. In this work, we follow the paradigm of Programming by Optimization (PbO) to develop a new, flexible, and highly parametric SLS framework for solving MVWCP, combining, for the first time, a broad range of effective heuristic mechanisms. By automatically configuring this PbO-MWC framework, we achieve substantial advances in the state of the art in solving MVWCP over a broad range of prominent benchmarks, including two derived from real-world applications in transplantation medicine (kidney exchange) and the assessment of research excellence. |
Tasks | |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.11909v1 |
https://arxiv.org/pdf/2002.11909v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-the-performance-of-stochastic-local |
Repo | |
Framework | |
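For orientation, a bare-bones SLS for MVWCP with one fixed add/drop heuristic is sketched below; the PbO point is precisely that such design choices should be exposed as configurable parameters and tuned automatically per instance class rather than hard-coded as here.

```python
import random

def sls_mvwcp(adj, weights, iters=2000, seed=0):
    """Plain SLS for maximum vertex weight clique: greedily add the heaviest
    compatible vertex; when stuck, drop a random vertex to escape the local
    optimum. Tracks and returns the best clique seen."""
    rng = random.Random(seed)
    n = len(weights)
    clique, best, best_w = set(), set(), 0
    for _ in range(iters):
        cand = [v for v in range(n) if v not in clique
                and all(u in adj[v] for u in clique)]
        if cand:            # intensify: add the best-weight compatible vertex
            clique.add(max(cand, key=lambda v: weights[v]))
        elif clique:        # perturb: drop a random vertex
            clique.discard(rng.choice(sorted(clique)))
        w = sum(weights[v] for v in clique)
        if w > best_w:
            best, best_w = set(clique), w
    return best, best_w

# 5-vertex toy graph as adjacency sets.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(sls_mvwcp(adj, weights=[4, 5, 6, 9, 2]))  # best clique weight: 15
```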
ABCTracker: an easy-to-use, cloud-based application for tracking multiple objects
Title | ABCTracker: an easy-to-use, cloud-based application for tracking multiple objects |
Authors | Lance Rice, Samual Tate, David Farynyk, Joshua Sun, Greg Chism, Daniel Charbonneau, Thomas Fasciano, Anna Dornhaus, Min C. Shin |
Abstract | Visual multi-object tracking has the potential to accelerate many forms of quantitative analysis, especially in research communities investigating the motion, behavior, or social interactions within groups of animals. Despite this potential for increasing analysis throughput, complications related to accessibility, adaptability, accuracy, or scalable application arise with existing tracking systems. Several iterations of prototyping and testing have led us to a multi-object tracking system, ABCTracker, that is: accessible in terms of both system and technical-knowledge requirements, easily adaptable to new videos, and capable of producing accurate tracking data through a mixture of automatic and semi-automatic tracking features. |
Tasks | Multi-Object Tracking, Object Tracking |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2001.10072v2 |
https://arxiv.org/pdf/2001.10072v2.pdf | |
PWC | https://paperswithcode.com/paper/abctracker-an-easy-to-use-cloud-based |
Repo | |
Framework | |
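A minimal sketch of the automatic core of any such tracker, greedy IoU-based frame-to-frame association; ABCTracker's semi-automatic correction tools and cloud infrastructure sit on top of logic of this general shape, and this sketch makes no claim about its actual algorithm.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(tracks, detections, thresh=0.3):
    """Greedy frame-to-frame association: match each existing track to its
    best-overlapping unused detection above the threshold."""
    matches, used = {}, set()
    for tid, box in tracks.items():
        scored = [(iou(box, d), i)
                  for i, d in enumerate(detections) if i not in used]
        if scored:
            best, i = max(scored)
            if best >= thresh:
                matches[tid] = i
                used.add(i)
    return matches  # track id -> detection index; unmatched detections spawn tracks

tracks = {0: (10, 10, 30, 30), 1: (50, 50, 70, 70)}
detections = [(12, 11, 32, 31), (49, 52, 69, 72), (200, 200, 210, 210)]
print(associate(tracks, detections))  # {0: 0, 1: 1}
```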