January 28, 2020


Paper Group ANR 896


FANNet: Formal Analysis of Noise Tolerance, Training Bias and Input Sensitivity in Neural Networks


Title	FANNet: Formal Analysis of Noise Tolerance, Training Bias and Input Sensitivity in Neural Networks
Authors	Mahum Naseer, Mishal Fatima Minhas, Faiq Khalid, Muhammad Abdullah Hanif, Osman Hasan, Muhammad Shafique
Abstract	With constant improvements in network architectures and training methodologies, Neural Networks (NNs) are increasingly being deployed in real-world Machine Learning systems. However, despite their impressive performance on “known inputs”, these NNs can fail absurdly on “unseen inputs”, especially if these real-time inputs deviate from the training dataset distributions or contain certain types of input noise. This indicates the low noise tolerance of NNs, which is a major reason for the recent increase in adversarial attacks. This is a serious concern, particularly for safety-critical applications, where inaccurate results can lead to dire consequences. We propose a novel methodology that leverages model checking for the Formal Analysis of Neural Network (FANNet) under different input noise ranges. Our methodology allows us to rigorously analyze the noise tolerance of NNs, their input node sensitivity, and the effects of training bias on their performance, e.g., in terms of classification accuracy. For evaluation, we use a feed-forward fully-connected NN architecture trained for Leukemia classification. Our experimental results show $\pm 11\%$ noise tolerance for the given trained network, identify the most sensitive input nodes, and confirm the bias of the available training dataset.
Tasks
Published	2019-12-03
URL	https://arxiv.org/abs/1912.01978v1
PDF	https://arxiv.org/pdf/1912.01978v1.pdf
PWC	https://paperswithcode.com/paper/fannet-formal-analysis-of-noise-tolerance
Repo
Framework
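The FANNet abstract above asks whether a trained network keeps its classification when its inputs are perturbed within a noise range. As a rough illustration only, the sketch below replaces model checking with an exhaustive search over a finite grid of per-input noise values, so unlike a model checker it gives no guarantee between grid points. The 2-2-2 ReLU network, its weights, the multiplicative noise model `x_i * (1 + d)`, and all helper names are hypothetical and not taken from the paper.

```python
import itertools

# Hypothetical 2-input, 2-hidden, 2-class ReLU network; weights are illustrative only.
W1 = [[1.0, -0.5], [0.3, 0.8]]
B1 = [0.0, 0.1]
W2 = [[1.2, -0.7], [-0.4, 1.1]]
B2 = [0.0, 0.0]


def classify(x):
    """Forward pass; returns the index of the largest output."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, B1)]
    out = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, B2)]
    return max(range(len(out)), key=lambda k: out[k])


def robust_at(x, pct, nodes=None, steps=5):
    """True if the label never changes when each selected input is scaled by (1 + d),
    with d taken from a `steps`-point grid over [-pct%, +pct%]."""
    nodes = list(range(len(x))) if nodes is None else nodes
    label = classify(x)
    grid = [pct / 100 * (2 * k / (steps - 1) - 1) for k in range(steps)]
    for deltas in itertools.product(grid, repeat=len(nodes)):
        noisy = list(x)
        for i, d in zip(nodes, deltas):
            noisy[i] = x[i] * (1 + d)
        if classify(noisy) != label:
            return False
    return True


def noise_tolerance(x, max_pct=50, nodes=None):
    """Largest integer percentage p such that every level 1..p passes the grid check."""
    tol = 0
    for pct in range(1, max_pct + 1):
        if not robust_at(x, pct, nodes):
            break
        tol = pct
    return tol


x = [1.0, 1.0]
print(noise_tolerance(x))                          # 41: all inputs perturbed together
print(noise_tolerance(x, max_pct=100, nodes=[0]))  # 100: only input 0 perturbed
print(noise_tolerance(x, max_pct=100, nodes=[1]))  # 60: only input 1 perturbed
```

In this toy network, perturbing input 1 alone flips the label just above 60% while input 0 alone never does up to 100%, so input 1 would be reported as the more sensitive node.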

Gradient Perturbation is Underrated for Differentially Private Convex Optimization


Title	Gradient Perturbation is Underrated for Differentially Private Convex Optimization
Authors	Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu, Jian Yin
Abstract	Gradient perturbation, widely used for differentially private optimization, injects noise at every iterative update to guarantee differential privacy. Previous work first determines the noise level that satisfies the privacy requirement and then analyzes the utility of noisy gradient updates as in the non-private case. In this paper, we explore how the privacy noise affects the optimization property. We show that for differentially private convex optimization, the utility guarantee of both DP-GD and DP-SGD is determined by an expected curvature rather than the minimum curvature. The expected curvature represents the average curvature over the optimization path, which is usually much larger than the minimum curvature and hence can help us achieve a significantly improved utility guarantee. By using the expected curvature, our theory justifies the advantage of gradient perturbation over other perturbation methods and closes the gap between theory and practice. Extensive experiments on real-world datasets corroborate our theoretical findings.
Tasks
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11363v1
PDF	https://arxiv.org/pdf/1911.11363v1.pdf
PWC	https://paperswithcode.com/paper/gradient-perturbation-is-underrated-for-1
Repo
Framework
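The gradient-perturbation scheme analyzed in the entry above can be sketched as full-batch DP-GD: clip each per-example gradient, sum, add Gaussian noise, and take a step. This is a minimal sketch assuming a least-squares loss and a hypothetical noise multiplier `sigma`; it performs no privacy accounting, so it says nothing about which `(epsilon, delta)` a run satisfies, and it does not reproduce the paper's expected-curvature analysis.

```python
import math
import random


def clip(g, c):
    """Scale vector g so that its L2 norm is at most c."""
    norm = math.sqrt(sum(v * v for v in g))
    return [v * (c / norm) for v in g] if norm > c else list(g)


def dp_gd(data, steps=200, lr=0.1, clip_norm=1.0, sigma=1.0, seed=0):
    """Full-batch DP-GD for least squares y ~ w.x.

    `sigma` is a hypothetical noise multiplier; it is NOT calibrated to any
    (epsilon, delta) budget here, and no privacy accounting is performed.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0][0])
    w = [0.0] * d
    for _ in range(steps):
        total = [0.0] * d
        for x, y in data:
            resid = sum(wi * xi for wi, xi in zip(w, x)) - y
            g = clip([resid * xi for xi in x], clip_norm)  # per-example gradient, clipped
            total = [t + gi for t, gi in zip(total, g)]
        # Gradient perturbation: Gaussian noise on the summed clipped gradient.
        noisy = [(t + rng.gauss(0.0, sigma * clip_norm)) / n for t in total]
        w = [wi - lr * gi for wi, gi in zip(w, noisy)]
    return w


data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([0.5, -0.5], 1.5)]
print(dp_gd(data, steps=1000, sigma=0.0))  # noiseless sanity check, close to [2.0, -1.0]
print(dp_gd(data, steps=1000, sigma=1.0))  # perturbed iterates
```

Setting `sigma=0.0` turns the perturbation off, which is a convenient check that the clipped updates still reach the least-squares solution before noise is added.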

Automatic Ontology Learning from Domain-Specific Short Unstructured Text Data


Title	Automatic Ontology Learning from Domain-Specific Short Unstructured Text Data
Authors	Yiming Xu, Dnyanesh Rajpathak, Ian Gibbs, Diego Klabjan
Abstract	Ontology learning is a critical task in industry: it identifies and extracts concepts captured in text data so that these concepts can be used in different tasks, e.g., information retrieval. Ontology learning is non-trivial for several reasons, and there is a limited amount of prior work that automatically learns a domain-specific ontology from data. In our work, we propose a two-stage classification system to automatically learn an ontology from unstructured text data. We first collect candidate concepts, which our first classifier classifies into concepts and irrelevant collocates. The second classifier then classifies the concepts from the first classifier into different concept types. The proposed system is deployed as a prototype at a company, and its performance is validated using complaint and repair verbatim data collected in the automotive industry from different data sources.
Tasks	Information Retrieval
Published	2019-03-07
URL	http://arxiv.org/abs/1903.04360v1
PDF	http://arxiv.org/pdf/1903.04360v1.pdf
PWC	https://paperswithcode.com/paper/automatic-ontology-learning-from-domain
Repo
Framework
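A two-stage cascade like the one described in the entry above can be wired as two pluggable classifiers: the first filters candidate phrases, the second assigns a type to the survivors. The sketch below assumes that shape only; the keyword-based classifiers, the word lists, and the type names `symptom` and `part` are hypothetical stand-ins for the trained models in the paper.

```python
from typing import Callable, Dict, Iterable, List


def learn_ontology(
    candidates: Iterable[str],
    is_concept: Callable[[str], bool],
    concept_type: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Stage 1 drops irrelevant collocates; stage 2 groups the remaining concepts by type."""
    ontology: Dict[str, List[str]] = {}
    for phrase in candidates:
        if not is_concept(phrase):  # stage 1: concept vs. irrelevant collocate
            continue
        ontology.setdefault(concept_type(phrase), []).append(phrase)  # stage 2
    return ontology


# Hypothetical keyword-based stand-ins for the two trained classifiers.
NOISE_WORDS = {"customer", "states", "please"}
PARTS = {"battery", "engine", "alternator"}
SYMPTOMS = {"stall", "noise", "leak"}


def toy_is_concept(phrase: str) -> bool:
    return not (set(phrase.split()) & NOISE_WORDS)


def toy_concept_type(phrase: str) -> str:
    words = set(phrase.split())
    if words & SYMPTOMS:
        return "symptom"
    if words & PARTS:
        return "part"
    return "other"


candidates = ["engine stall", "customer states", "battery", "oil leak", "replaced alternator"]
print(learn_ontology(candidates, toy_is_concept, toy_concept_type))
```

Keeping the two stages as separate callables mirrors the cascade: either classifier can be retrained or swapped without touching the other.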

Eavesdrop the Composition Proportion of Training Labels in Federated Learning


Title	Eavesdrop the Composition Proportion of Training Labels in Federated Learning
Authors	Lixu Wang, Shichao Xu, Xiao Wang, Qi Zhu
Abstract	Federated learning (FL) has recently emerged as a new form of collaborative machine learning, where a common model can be learned while keeping all the training data on local devices. Although FL is designed to enhance data privacy, in this paper we demonstrate a new direction for inference attacks in the context of FL, where adversaries with very limited power can obtain valuable information about the training data. In particular, we propose three new types of attacks to exploit this vulnerability. The first type of attack, Class Sniffing, detects whether a certain label appears in training. The other two types of attacks determine the quantity of each label: the Quantity Inference attack determines the composition proportion of the training labels owned by the clients selected in a single round, while the Whole Determination attack determines that proportion over the whole training process. We evaluate our attacks on a variety of tasks and datasets with different settings, and the results show that our attacks generally work well. Finally, we analyze the impact of major hyper-parameters on our attacks and discuss possible defenses.
Tasks	Inference Attack
Published	2019-10-14
URL	https://arxiv.org/abs/1910.06044v2
PDF	https://arxiv.org/pdf/1910.06044v2.pdf
PWC	https://paperswithcode.com/paper/eavesdrop-the-composition-proportion-of
Repo
Framework

Contamination Attacks and Mitigation in Multi-Party Machine Learning


Title	Contamination Attacks and Mitigation in Multi-Party Machine Learning
Authors	Jamie Hayes, Olga Ohrimenko
Abstract	Machine learning is data hungry; the more data a model has access to in training, the more likely it is to perform well at inference time. Distinct parties may want to combine their local data to gain the benefits of a model trained on a large corpus of data. We consider such a case: parties get access to the model trained on their joint data but do not see each other's individual datasets. We show that one needs to be careful when using this multi-party model since a potentially malicious party can taint the model by providing contaminated data. We then show how adversarial training can defend against such attacks by preventing the model from learning trends specific to individual parties' data, thereby also guaranteeing party-level membership privacy.
Tasks
Published	2019-01-08
URL	http://arxiv.org/abs/1901.02402v1
PDF	http://arxiv.org/pdf/1901.02402v1.pdf
PWC	https://paperswithcode.com/paper/contamination-attacks-and-mitigation-in-multi
Repo
Framework

Rules of the Road: Predicting Driving Behavior with a Convolutional Model of Semantic Interactions


Title	Rules of the Road: Predicting Driving Behavior with a Convolutional Model of Semantic Interactions
Authors	Joey Hong, Benjamin Sapp, James Philbin
Abstract	We focus on the problem of predicting future states of entities in complex, real-world driving scenarios. Previous research has used low-level signals to predict short time horizons, and has not addressed how to leverage key assets relied upon heavily by industry self-driving systems: (1) large 3D perception efforts which provide highly accurate 3D states of agents with rich attributes, and (2) detailed and accurate semantic maps of the environment (lanes, traffic lights, crosswalks, etc.). We present a unified representation which encodes such high-level semantic information in a spatial grid, allowing the use of deep convolutional models to fuse complex scene context. This enables learning entity-entity and entity-environment interactions with simple, feed-forward computations in each timestep within an overall temporal model of an agent’s behavior. We propose different ways of modelling the future as a distribution over future states using standard supervised learning. We introduce a novel dataset providing industry-grade rich perception and semantic inputs, and empirically show we can effectively learn fundamentals of driving behavior.
Tasks
Published	2019-06-21
URL	https://arxiv.org/abs/1906.08945v1
PDF	https://arxiv.org/pdf/1906.08945v1.pdf
PWC	https://paperswithcode.com/paper/rules-of-the-road-predicting-driving-behavior-1
Repo
Framework
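The unified representation described in the entry above encodes semantic entities in a spatial grid so that a convolutional model can consume them. A minimal sketch, assuming an ego-centred grid with one binary occupancy channel per semantic layer; the layer names, grid size, and cell resolution are hypothetical, and the paper's representation carries richer attributes than simple occupancy.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def rasterize(layers: Dict[str, List[Point]], size: int = 8,
              cell: float = 1.0) -> Dict[str, List[List[float]]]:
    """One size x size binary channel per semantic layer, ego vehicle at the grid centre."""
    half = size // 2
    grid = {name: [[0.0] * size for _ in range(size)] for name in layers}
    for name, points in layers.items():
        for x, y in points:
            row = math.floor(y / cell) + half
            col = math.floor(x / cell) + half
            if 0 <= row < size and 0 <= col < size:  # drop points outside the field of view
                grid[name][row][col] = 1.0
    return grid


scene = {
    "agents": [(0.0, 0.0), (2.5, 1.0), (10.0, 0.0)],  # the last agent is out of view
    "lane_centerline": [(float(x), -1.0) for x in range(-4, 4)],
    "crosswalk": [(3.0, -3.5)],
}
channels = rasterize(scene)
# Stacking channels[name] for each layer gives a (layers, size, size) input for a CNN.
```

Because every entity type lands in the same grid coordinates, a convolution sees agents, lanes, and crosswalks that are spatially close as neighbouring inputs, which is what lets it learn their interactions.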

Efficiently Reusing Natural Language Processing Models for Phenotype-Mention Identification in Free-text Electronic Medical Records: Methodology Study


Title	Efficiently Reusing Natural Language Processing Models for Phenotype-Mention Identification in Free-text Electronic Medical Records: Methodology Study
Authors	Honghan Wu, Karen Hodgson, Sue Dyson, Katherine I. Morley, Zina M. Ibrahim, Ehtesham Iqbal, Robert Stewart, Richard JB Dobson, Cathie Sudlow
Abstract	Background: Many efforts have been put into the use of automated approaches, such as natural language processing (NLP), to mine or extract data from free-text medical records to construct comprehensive patient profiles for delivering better health care. Reusing NLP models in new settings, however, remains cumbersome, requiring validation and/or retraining on new data iteratively to achieve convergent results. Objective: The aim of this work is to minimize the effort involved in reusing NLP models on free-text medical records. Methods: We formally define and analyse the model adaptation problem in phenotype-mention identification tasks. We identify “duplicate waste” and “imbalance waste”, which collectively impede efficient model reuse. We propose a phenotype embedding based approach to minimize these sources of waste without the need for labelled data from new settings. Results: We conduct experiments on data from a large mental health registry to reuse NLP models in four phenotype-mention identification tasks. The proposed approach can choose the best model for a new task, identifying up to 76% of phenotype mentions (duplicate waste) without the need for validation and model retraining, and with very good performance (93-97% accuracy). It can also provide guidance for validating and retraining the selected model for novel language patterns in new tasks, saving around 80% of the effort (imbalance waste) required in “blind” model-adaptation approaches. Conclusions: Adapting pre-trained NLP models for new tasks can be more efficient and effective if the language pattern landscapes of old settings and new settings can be made explicit and comparable. Our experiments show that the phenotype-mention embedding approach is an effective way to model language patterns for phenotype-mention identification tasks and that its use can guide efficient NLP model reuse.
Tasks
Published	2019-03-10
URL	https://arxiv.org/abs/1903.03995v3
PDF	https://arxiv.org/pdf/1903.03995v3.pdf
PWC	https://paperswithcode.com/paper/contextualised-concept-embedding-for
Repo
Framework

Improved Hybrid Layered Image Compression using Deep Learning and Traditional Codecs


Title	Improved Hybrid Layered Image Compression using Deep Learning and Traditional Codecs
Authors	Haisheng Fu, Feng Liang, Bo Lei, Nai Bian, Qian Zhang, Mohammad Akbari, Jie Liang, Chengjie Tu
Abstract	Recently, deep learning-based methods have been applied to image compression and have achieved many promising results. In this paper, we propose an improved hybrid layered image compression framework by combining deep learning and the traditional image codecs. At the encoder, we first use a convolutional neural network (CNN) to obtain a compact representation of the input image, which is losslessly encoded by the FLIF codec as the base layer of the bit stream. A coarse reconstruction of the input is obtained by another CNN from the reconstructed compact representation. The residual between the input and the coarse reconstruction is then obtained and encoded by the H.265/HEVC-based BPG codec as the enhancement layer of the bit stream. Experimental results using the Kodak and Tecnick datasets show that the proposed scheme outperforms the state-of-the-art deep learning-based layered coding scheme and traditional codecs including BPG in both PSNR and MS-SSIM metrics across a wide range of bit rates, when the images are coded in the RGB444 domain.
Tasks	Image Compression
Published	2019-07-15
URL	https://arxiv.org/abs/1907.06566v1
PDF	https://arxiv.org/pdf/1907.06566v1.pdf
PWC	https://paperswithcode.com/paper/improved-hybrid-layered-image-compression
Repo
Framework
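The hybrid layered codec in the entry above splits the bit stream into a base layer and a residual enhancement layer. The sketch below shows only that data flow on a 1D signal, using simple stand-ins: pairwise averaging in place of the first CNN, sample repetition in place of the second CNN, an exactly stored base in place of FLIF, and a uniform quantizer in place of BPG. None of these stand-ins are the paper's components.

```python
from typing import List, Tuple


def downsample(x: List[float]) -> List[float]:
    """Stand-in for the first CNN: average neighbouring pairs (len(x) must be even)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def upsample(base: List[float]) -> List[float]:
    """Stand-in for the second CNN: repeat each sample twice."""
    return [v for v in base for _ in range(2)]


def encode(x: List[float], step: float) -> Tuple[List[float], List[int]]:
    base = downsample(x)     # base layer, stored exactly (the paper codes it losslessly with FLIF)
    coarse = upsample(base)  # coarse reconstruction; the decoder can rebuild it from `base`
    residual = [a - c for a, c in zip(x, coarse)]
    # Stand-in lossy enhancement coder: uniform quantizer (round() is round-half-to-even).
    enhancement = [round(r / step) for r in residual]
    return base, enhancement


def decode(base: List[float], enhancement: List[int], step: float) -> List[float]:
    coarse = upsample(base)
    return [c + q * step for c, q in zip(coarse, enhancement)]


signal = [10.0, 12.0, 15.0, 15.0, 9.0, 3.0]
base, enh = encode(signal, step=2.0)
recon = decode(base, enh, step=2.0)
print(base, enh, recon)
```

Because the decoder rebuilds the same coarse reconstruction from the base layer, the final error comes only from quantizing the residual and stays within `step / 2` per sample.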

A Tight Analysis of Greedy Yields Subexponential Time Approximation for Uniform Decision Tree


Title	A Tight Analysis of Greedy Yields Subexponential Time Approximation for Uniform Decision Tree
Authors	Ray Li, Percy Liang, Stephen Mussmann
Abstract	Decision Tree is a classic formulation of active learning: given $n$ hypotheses with nonnegative weights summing to 1 and a set of tests that each partition the hypotheses, output a decision tree using the provided tests that uniquely identifies each hypothesis and has minimum (weighted) average depth. Previous works showed that the greedy algorithm achieves a $O(\log n)$ approximation ratio for this problem and it is NP-hard to beat a $O(\log n)$ approximation, settling the complexity of the problem. However, for Uniform Decision Tree, i.e. Decision Tree with uniform weights, the story is more subtle. The greedy algorithm’s $O(\log n)$ approximation ratio was the best known, but the largest approximation ratio known to be NP-hard is $4-\varepsilon$. We prove that the greedy algorithm gives a $O(\frac{\log n}{\log C_{OPT}})$ approximation for Uniform Decision Tree, where $C_{OPT}$ is the cost of the optimal tree and show this is best possible for the greedy algorithm. As a corollary, we resolve a conjecture of Kosaraju, Przytycka, and Borgstrom. Leveraging this result, for all $\alpha\in(0,1)$, we exhibit a $\frac{9.01}{\alpha}$ approximation algorithm to Uniform Decision Tree running in subexponential time $2^{\tilde O(n^\alpha)}$. As a corollary, achieving any super-constant approximation ratio on Uniform Decision Tree is not NP-hard, assuming the Exponential Time Hypothesis. This work therefore adds approximating Uniform Decision Tree to a small list of natural problems that have subexponential time algorithms but no known polynomial time algorithms. All our results hold for Decision Tree with weights not too far from uniform. A key technical contribution of our work is showing a connection between greedy algorithms for Uniform Decision Tree and for Min Sum Set Cover.
Tasks	Active Learning
Published	2019-06-26
URL	https://arxiv.org/abs/1906.11385v2
PDF	https://arxiv.org/pdf/1906.11385v2.pdf
PWC	https://paperswithcode.com/paper/a-tight-analysis-of-greedy-yields
Repo
Framework
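The greedy algorithm discussed in the entry above builds the tree top-down, choosing one test at each node. A minimal sketch, assuming uniform weights, tests given as lists that map each hypothesis index to an outcome, and the pair-separation rule (pick the test that separates the most pairs of remaining hypotheses) as the greedy criterion; the paper's exact tie-breaking and its analysis are not reproduced, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def separated_pairs(hyps: Sequence[int], test: Sequence[int]) -> int:
    """Number of hypothesis pairs in `hyps` that `test` sends to different outcomes."""
    counts: Dict[int, int] = defaultdict(int)
    for h in hyps:
        counts[test[h]] += 1
    m = len(hyps)
    return (m * m - sum(c * c for c in counts.values())) // 2


def greedy_depths(hyps: List[int], tests: List[List[int]], depth: int = 0,
                  depths: Optional[Dict[int, int]] = None) -> Dict[int, int]:
    """Build the greedy tree recursively and return the leaf depth of each hypothesis."""
    depths = {} if depths is None else depths
    if len(hyps) == 1:
        depths[hyps[0]] = depth
        return depths
    best = max(tests, key=lambda t: separated_pairs(hyps, t))  # first maximiser wins ties
    if separated_pairs(hyps, best) == 0:
        raise ValueError("remaining hypotheses cannot be distinguished by any test")
    parts: Dict[int, List[int]] = defaultdict(list)
    for h in hyps:
        parts[best[h]].append(h)
    for part in parts.values():
        greedy_depths(part, tests, depth + 1, depths)
    return depths


tests = [[0, 1, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0]]
depths = greedy_depths([0, 1, 2, 3], tests)
print(depths, sum(depths.values()) / len(depths))  # uniform weights: cost = mean depth
```

With uniform weights the objective is simply the mean of the returned depths; non-uniform weights would turn it into a weighted average.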

Multi-Year Vector Dynamic Time Warping Based Crop Mapping


Title	Multi-Year Vector Dynamic Time Warping Based Crop Mapping
Authors	Mustafa Teke, Yasemin Yardımcı
Abstract	Recent automated crop mapping via supervised learning-based methods has demonstrated unprecedented improvement over classical techniques. However, most crop mapping studies are limited to same-year crop mapping, in which the present year’s labeled data is used to predict the same year’s crop map. Classification accuracies of these methods degrade considerably in cross-year mapping. Cross-year crop mapping is more useful as it allows the prediction of the following years’ crop maps using previously labeled data. We propose Vector Dynamic Time Warping (VDTW), a novel multi-year classification approach based on warping of angular distances between phenological vectors. The results show that the proposed VDTW method is robust to temporal and spectral variations, compensating for different farming practices, climate and atmospheric effects, and measurement errors between years. We also describe a method for determining the most discriminative time window that allows high classification accuracies with limited data. We tested our approach with Landsat 8 time-series imagery from 2013 to 2016 for classification of corn and cotton in the Harran Plain, and corn, cotton, and soybean in the Bismil Plain of Southeastern Turkey. In addition, we tested VDTW on corn and soybean in Kansas, US, for 2017 and 2018 with the Harmonized Landsat Sentinel data. The VDTW method achieved 99.85% and 99.74% overall accuracies for same-year and cross-year mapping, respectively, with fewer training samples than other state-of-the-art approaches, i.e., spectral angle mapper (SAM), dynamic time warping (DTW), time-weighted DTW (TWDTW), random forest (RF), support vector machine (SVM), and deep long short-term memory (LSTM) methods. The proposed method could be extended to other crop types and/or geographical areas.
Tasks	Time Series
Published	2019-09-11
URL	https://arxiv.org/abs/1909.04930v2
PDF	https://arxiv.org/pdf/1909.04930v2.pdf
PWC	https://paperswithcode.com/paper/multi-year-vector-dynamic-time-warping-based
Repo
Framework
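VDTW, as described in the entry above, warps sequences of multi-band vectors using the angle between vectors as the local cost. The sketch below is a plain DTW recurrence with an angular cost and an optional Sakoe-Chiba band `window`; treating this as VDTW is an assumption, since the abstract does not give the exact recurrence, weighting, or band used in the paper, and the toy two-band sequences are illustrative only.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def angle(u: Vector, v: Vector) -> float:
    """Angle in radians between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))  # clamp guards against rounding


def vdtw(s: List[Vector], t: List[Vector], window: Optional[int] = None) -> float:
    """DTW distance between two vector sequences with an angular local cost."""
    n, m = len(s), len(t)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if window is not None and abs(i - j) > window:
                continue  # outside the warping band
            cost = angle(s[i - 1], t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Toy two-band "phenological" sequences (values are illustrative only).
season = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 1.0]]
brighter = [[2.0 * a for a in v] for v in season]  # same shape, scaled magnitude
delayed = [season[0]] + season[:-1]                # same shape, one step late
print(vdtw(season, brighter))  # close to 0: angles ignore magnitude
print(vdtw(season, delayed))   # close to pi/4: only the final step cannot be aligned
```

Because the angle ignores vector magnitude, a uniformly scaled series scores near zero, and the warping absorbs small shifts in phenological timing; a lock-step comparison of `season` and `delayed` would instead accumulate roughly 3π/4.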

Semi-supervised Wrapper Feature Selection by Modeling Imperfect Labels


Title	Semi-supervised Wrapper Feature Selection by Modeling Imperfect Labels
Authors	Vasilii Feofanov, Emilie Devijver, Massih-Reza Amini
Abstract	In this paper, we propose a new wrapper feature selection approach with partially labeled training examples where unlabeled observations are pseudo-labeled using the predictions of an initial classifier trained on the labeled training set. The wrapper is composed of a genetic algorithm for proposing new feature subsets, and an evaluation measure for scoring the different feature subsets. The selection of feature subsets is done by assigning weights to features and recursively eliminating those that are irrelevant. The selection criterion is based on a new multi-class $\mathcal{C}$-bound that explicitly takes into account the mislabeling errors induced by the pseudo-labeling mechanism, using a probabilistic error model. Empirical results on different data sets show the effectiveness of our framework compared to several state-of-the-art semi-supervised feature selection approaches.
Tasks	Feature Selection
Published	2019-11-12
URL	https://arxiv.org/abs/1911.04841v2
PDF	https://arxiv.org/pdf/1911.04841v2.pdf
PWC	https://paperswithcode.com/paper/semi-supervised-wrapper-feature-selection
Repo
Framework

Imposing edges in Minimum Spanning Tree


Title	Imposing edges in Minimum Spanning Tree
Authors	Nicolas Isoart, Jean-Charles Régin
Abstract	We are interested in the consequences of imposing edges in a minimum spanning tree $T$. We prove that the sum of the replacement costs in $T$ of the imposed edges is a lower bound of the additional cost. More precisely, if r-cost$(T,e)$ is the replacement cost of the edge $e$, we prove that if we impose a set $I$ of nontree edges of $T$, then $\sum_{e \in I} \text{r-cost}(T,e) \leq \text{cost}(T_{e \in I}) - \text{cost}(T)$, where $T_{e \in I}$ is a minimum spanning tree containing all the edges of $I$.
Tasks
Published	2019-12-19
URL	https://arxiv.org/abs/1912.09360v1
PDF	https://arxiv.org/pdf/1912.09360v1.pdf
PWC	https://paperswithcode.com/paper/imposing-edges-in-minimum-spanning-tree
Repo
Framework
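The lower bound stated in the entry above can be checked numerically on a small graph. The sketch assumes the usual definition of the replacement cost of a nontree edge `(u, v, w)`: `w` minus the largest edge weight on the tree path between `u` and `v`. It also assumes the imposed edges are acyclic, and it builds the constrained tree by running Kruskal's algorithm with the imposed edges inserted first. The helper names and the toy graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int, float]


def kruskal(n: int, edges: Iterable[Edge], imposed: Iterable[Edge] = ()) -> List[Edge]:
    """Minimum spanning tree; edges in `imposed` are taken first (assumed acyclic)."""
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree: List[Edge] = []
    for u, v, w in list(imposed) + sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


def max_on_path(tree: List[Edge], u: int, v: int) -> float:
    """Largest edge weight on the unique tree path from u to v (iterative DFS)."""
    adj: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for a, b, w in tree:
        adj[a].append((b, w))
        adj[b].append((a, w))
    stack = [(u, -1, float("-inf"))]
    while stack:
        node, prev, best = stack.pop()
        if node == v:
            return best
        for nxt, w in adj[node]:
            if nxt != prev:
                stack.append((nxt, node, max(best, w)))
    raise ValueError("u and v are not connected in the tree")


def r_cost(tree: List[Edge], e: Edge) -> float:
    u, v, w = e
    return w - max_on_path(tree, u, v)


def cost(tree: List[Edge]) -> float:
    return sum(w for _, _, w in tree)


edges = [(0, 1, 1), (1, 2, 2), (2, 3, 3), (0, 2, 4), (1, 3, 5), (0, 3, 6)]
T = kruskal(4, edges)
for I in ([(0, 2, 4), (1, 3, 5)], [(0, 3, 6), (1, 3, 5)]):
    bound = sum(r_cost(T, e) for e in I)
    extra = cost(kruskal(4, edges, imposed=I)) - cost(T)
    print(bound, extra)  # 4 4, then 5 7
```

On this graph the bound is tight for the first imposed set (4 versus 4) and strict for the second (5 versus 7).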

DPM: A deep learning PDE augmentation method (with application to large-eddy simulation)


Title	DPM: A deep learning PDE augmentation method (with application to large-eddy simulation)
Authors	Jonathan B. Freund, Jonathan F. MacArt, Justin Sirignano
Abstract	Machine learning for scientific applications faces the challenge of limited data. We propose a framework that leverages a priori known physics to reduce overfitting when training on relatively small datasets. A deep neural network is embedded in a partial differential equation (PDE) that expresses the known physics and learns to describe the corresponding unknown or unrepresented physics from the data. Crafted as such, the neural network can also provide corrections for erroneously represented physics, such as discretization errors associated with the PDE’s numerical solution. Once trained, the deep learning PDE model (DPM) can make out-of-sample predictions for new physical parameters, geometries, and boundary conditions. Our approach optimizes over the functional form of the PDE. Estimating the embedded neural network requires optimizing over the entire PDE, which itself is a function of the neural network. Adjoint partial differential equations are used to efficiently calculate the high-dimensional gradient of the objective function with respect to the neural network parameters. A stochastic adjoint method (SAM), similar in spirit to stochastic gradient descent, further accelerates training. The approach is demonstrated and evaluated for turbulence predictions using large-eddy simulation (LES), a filtered version of the Navier–Stokes equation containing unclosed sub-filter-scale terms. The DPM outperforms the widely-used constant-coefficient and dynamic Smagorinsky models, even for filter sizes so large that these established models become qualitatively incorrect. It also significantly outperforms a priori trained models, which do not account for the full PDE. A relaxation of the discrete enforcement of the divergence-free constraint is also considered, instead allowing the DPM to approximately enforce incompressibility physics.
Tasks
Published	2019-11-20
URL	https://arxiv.org/abs/1911.09145v1
PDF	https://arxiv.org/pdf/1911.09145v1.pdf
PWC	https://paperswithcode.com/paper/dpm-a-deep-learning-pde-augmentation-method
Repo
Framework

Cascaded Volumetric Convolutional Network for Kidney Tumor Segmentation from CT volumes


Title	Cascaded Volumetric Convolutional Network for Kidney Tumor Segmentation from CT volumes
Authors	Yao Zhang, Yixin Wang, Feng Hou, Jiawei Yang, Guangwei Xiong, Jiang Tian, Cheng Zhong
Abstract	Automated segmentation of kidney and tumor from 3D CT scans is necessary for the diagnosis, monitoring, and treatment planning of the disease. In this paper, we describe a two-stage framework for kidney and tumor segmentation based on a 3D fully convolutional network (FCN). The first stage roughly locates the kidney and crops away the irrelevant background to reduce class imbalance and computation cost. The second stage then precisely segments the kidney and tumor on the cropped patch. The proposed method achieves Dice scores of 98.05% and 83.70% on the validation set of the MICCAI 2019 KiTS Challenge.
Tasks
Published	2019-10-05
URL	https://arxiv.org/abs/1910.02235v1
PDF	https://arxiv.org/pdf/1910.02235v1.pdf
PWC	https://paperswithcode.com/paper/cascaded-volumetric-convolutional-network-for
Repo
Framework
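The two-stage pipeline in the entry above crops a region around a coarse kidney mask before fine segmentation, and results are reported as Dice scores. A minimal sketch of the cropping step and the Dice computation, in 2D for brevity; the margin, the helper names, and the toy masks are hypothetical, and the paper works on 3D CT volumes with FCN predictions rather than hand-written masks.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]
Box = Tuple[int, int, int, int]


def bbox(mask: Mask, margin: int = 1) -> Optional[Box]:
    """Bounding box (r0, r1, c0, c1) of nonzero pixels, padded by `margin`, clipped to the image."""
    coords = [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [i for i, _ in coords]
    cols = [j for _, j in coords]
    h, w = len(mask), len(mask[0])
    return (max(0, min(rows) - margin), min(h, max(rows) + margin + 1),
            max(0, min(cols) - margin), min(w, max(cols) + margin + 1))


def crop(image: Mask, box: Box) -> Mask:
    r0, r1, c0, c1 = box
    return [row[c0:c1] for row in image[r0:r1]]


def dice(pred: Mask, target: Mask) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|); defined as 1.0 when both masks are empty."""
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in target for v in row]
    inter = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    return 1.0 if denom == 0 else 2.0 * inter / denom


coarse = [[0] * 6 for _ in range(6)]
for i, j in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    coarse[i][j] = 1  # stage-1 output: rough kidney location
box = bbox(coarse)         # (1, 5, 1, 5)
patch = crop(coarse, box)  # 4 x 4 patch handed to the second stage
print(box, dice([[1, 1], [0, 0]], [[1, 0], [1, 0]]))  # (1, 5, 1, 5) 0.5
```

Cropping before the second stage shrinks the input and raises the share of foreground pixels, which is how the first stage reduces class imbalance and computation cost.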

Multi-agent Attentional Activity Recognition


Title	Multi-agent Attentional Activity Recognition
Authors	Kaixuan Chen, Lina Yao, Dalin Zhang, Bin Guo, Zhiwen Yu
Abstract	Multi-modality is an important feature of sensor-based activity recognition. In this work, we consider two inherent characteristics of human activities: the spatially-temporally varying salience of features, and the relations between activities and corresponding body part motions. Based on these, we propose a multi-agent spatial-temporal attention model. The spatial-temporal attention mechanism helps select informative modalities and their active periods. The multiple agents in the proposed model represent activities as collective motions across body parts by independently selecting modalities associated with single motions. With a joint recognition goal, the agents share gained information and coordinate their selection policies to learn the optimal recognition model. The experimental results on four real-world datasets demonstrate that the proposed model outperforms the state-of-the-art methods.
Tasks	Activity Recognition
Published	2019-05-22
URL	https://arxiv.org/abs/1905.08948v1
PDF	https://arxiv.org/pdf/1905.08948v1.pdf
PWC	https://paperswithcode.com/paper/multi-agent-attentional-activity-recognition
Repo
Framework