Paper Group ANR 896
FANNet: Formal Analysis of Noise Tolerance, Training Bias and Input Sensitivity in Neural Networks
Title | FANNet: Formal Analysis of Noise Tolerance, Training Bias and Input Sensitivity in Neural Networks |
Authors | Mahum Naseer, Mishal Fatima Minhas, Faiq Khalid, Muhammad Abdullah Hanif, Osman Hasan, Muhammad Shafique |
Abstract | With a constant improvement in network architectures and training methodologies, Neural Networks (NNs) are increasingly being deployed in real-world Machine Learning systems. However, despite their impressive performance on “known inputs”, these NNs can fail absurdly on “unseen inputs”, especially if these real-time inputs deviate from the training dataset distributions or contain certain types of input noise. This indicates the low noise tolerance of NNs, which is a major reason for the recent increase in adversarial attacks. This is a serious concern, particularly for safety-critical applications, where inaccurate results can lead to dire consequences. We propose a novel methodology that leverages model checking for the Formal Analysis of Neural Network (FANNet) under different input noise ranges. Our methodology allows us to rigorously analyze the noise tolerance of NNs, their input node sensitivity, and the effects of training bias on their performance, e.g., in terms of classification accuracy. For evaluation, we use a feed-forward fully-connected NN architecture trained for leukemia classification. Our experimental results show $\pm 11\%$ noise tolerance for the given trained network, identify the most sensitive input nodes, and confirm the bias of the available training dataset. |
Tasks | |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01978v1 |
PDF | https://arxiv.org/pdf/1912.01978v1.pdf |
PWC | https://paperswithcode.com/paper/fannet-formal-analysis-of-noise-tolerance |
Repo | |
Framework | |
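The noise-tolerance question that FANNet hands to a model checker can be illustrated, far less rigorously, by exhaustive enumeration over a discretized noise grid. This is a minimal sketch, assuming a tiny hypothetical fully-connected network with hand-picked weights (not the paper's leukemia model) and relative (percentage) noise applied independently to each input node. A real model checker explores the noise range exhaustively rather than sampling a finite grid, so this only approximates the idea.

```python
import itertools

def relu(xs):
    return [max(0.0, x) for x in xs]

def dense(weights, biases, xs):
    # One fully-connected layer: out_j = sum_i W[j][i] * x_i + b_j
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, biases)]

def classify(net, xs):
    # net is a list of (weights, biases); ReLU on hidden layers, argmax on the output
    h = xs
    for k, (w, b) in enumerate(net):
        h = dense(w, b, h)
        if k < len(net) - 1:
            h = relu(h)
    return max(range(len(h)), key=lambda j: h[j])

def tolerant(net, xs, noise_pct, steps=5):
    """True if the label is unchanged at every grid point x_i * (1 + d_i),
    with each d_i taking `steps` evenly spaced values in [-noise_pct%, +noise_pct%]."""
    base = classify(net, xs)
    levels = [noise_pct / 100.0 * (2 * k / (steps - 1) - 1) for k in range(steps)]
    for deltas in itertools.product(levels, repeat=len(xs)):
        noisy = [x * (1 + d) for x, d in zip(xs, deltas)]
        if classify(net, noisy) != base:
            return False
    return True

def max_tolerance(net, xs, limit=50):
    # Largest integer percentage (up to `limit`) before the first failing grid check
    best = 0
    for p in range(1, limit + 1):
        if not tolerant(net, xs, p):
            break
        best = p
    return best
```

With the hypothetical identity network `[([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]` and input `[1.0, 0.5]`, class 0 survives up to 33% noise, because the worst corner (-p, +p) flips the label only once 1 - p < 0.5(1 + p).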
Gradient Perturbation is Underrated for Differentially Private Convex Optimization
Title | Gradient Perturbation is Underrated for Differentially Private Convex Optimization |
Authors | Da Yu, Huishuai Zhang, Wei Chen, Tie-Yan Liu, Jian Yin |
Abstract | Gradient perturbation, widely used for differentially private optimization, injects noise at every iterative update to guarantee differential privacy. Previous work first determines the noise level that satisfies the privacy requirement and then analyzes the utility of noisy gradient updates as in the non-private case. In this paper, we explore how the privacy noise affects the optimization property. We show that for differentially private convex optimization, the utility guarantee of both DP-GD and DP-SGD is determined by an expected curvature rather than the minimum curvature. The expected curvature represents the average curvature over the optimization path, which is usually much larger than the minimum curvature and hence can yield a significantly improved utility guarantee. By using the expected curvature, our theory justifies the advantage of gradient perturbation over other perturbation methods and closes the gap between theory and practice. Extensive experiments on real-world datasets corroborate our theoretical findings. |
Tasks | |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11363v1 |
PDF | https://arxiv.org/pdf/1911.11363v1.pdf |
PWC | https://paperswithcode.com/paper/gradient-perturbation-is-underrated-for-1 |
Repo | |
Framework | |
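Gradient perturbation, as described in the abstract above, adds noise to every update. This is a minimal sketch of DP-GD on least-squares regression, assuming per-example gradient clipping to norm `clip` and Gaussian noise scaled by a user-supplied multiplier `sigma`. It omits the privacy accounting that maps `sigma`, the number of steps and the dataset size to an (ε, δ) guarantee, and all parameter names are illustrative rather than taken from the paper.

```python
import numpy as np

def dp_gd(X, y, steps=200, lr=0.1, clip=1.0, sigma=1.0, seed=0):
    """DP-GD sketch for least squares: clip each per-example gradient to
    norm <= clip, average, then add Gaussian noise with std sigma * clip / n.
    No privacy accounting is performed here."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        residual = X @ w - y                        # shape (n,)
        grads = residual[:, None] * X               # per-example gradients of 0.5*(x.w - y)^2
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads / np.maximum(1.0, norms / clip)   # clip each row to norm <= clip
        noise = rng.normal(0.0, sigma * clip / n, size=d)
        w = w - lr * (grads.mean(axis=0) + noise)
    return w
```

Setting `sigma=0` with a very large `clip` reduces the update to plain gradient descent, which makes a quick sanity check; with `sigma > 0` the iterates wander around the optimum by an amount that shrinks as `n` grows.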
Automatic Ontology Learning from Domain-Specific Short Unstructured Text Data
Title | Automatic Ontology Learning from Domain-Specific Short Unstructured Text Data |
Authors | Yiming Xu, Dnyanesh Rajpathak, Ian Gibbs, Diego Klabjan |
Abstract | Ontology learning is a critical task in industry: it deals with identifying and extracting concepts captured in text data so that these concepts can be used in different tasks, e.g., information retrieval. Ontology learning is non-trivial for several reasons, and limited prior research has addressed automatically learning a domain-specific ontology from data. In our work, we propose a two-stage classification system to automatically learn an ontology from unstructured text data. We first collect candidate concepts, which our first classifier classifies into concepts and irrelevant collocates. The second classifier then further classifies the concepts from the first classifier into different concept types. The proposed system is deployed as a prototype at a company, and its performance is validated using complaint and repair verbatim data collected in the automotive industry from different data sources. |
Tasks | Information Retrieval |
Published | 2019-03-07 |
URL | http://arxiv.org/abs/1903.04360v1 |
PDF | http://arxiv.org/pdf/1903.04360v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-ontology-learning-from-domain |
Repo | |
Framework | |
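The two-stage classification system described above can be sketched as a cascade: the first classifier filters candidate phrases into concepts versus irrelevant collocates, and only the survivors reach the second classifier, which assigns a concept type. This is a minimal sketch, assuming both classifiers are plain callables; the keyword-rule stand-ins and the type names `Part` and `Symptom` are hypothetical, not the paper's trained models or ontology schema.

```python
from typing import Callable, Dict, List

def two_stage_ontology(
    candidates: List[str],
    is_concept: Callable[[str], bool],
    concept_type: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Stage 1 drops irrelevant collocates; stage 2 groups the remaining
    concepts by their predicted concept type."""
    ontology: Dict[str, List[str]] = {}
    for phrase in candidates:
        if not is_concept(phrase):          # stage 1: concept vs. irrelevant collocate
            continue
        ontology.setdefault(concept_type(phrase), []).append(phrase)  # stage 2: typing
    return ontology

# Hypothetical rule-based stand-ins for the two trained classifiers
STOP_COLLOCATES = {"customer states", "please advise"}

def toy_is_concept(phrase: str) -> bool:
    return phrase not in STOP_COLLOCATES

def toy_concept_type(phrase: str) -> str:
    return "Symptom" if phrase.endswith(("noise", "leak")) else "Part"
```

Feeding `["brake pad", "customer states", "oil leak", "grinding noise"]` through the toy cascade yields `{"Part": ["brake pad"], "Symptom": ["oil leak", "grinding noise"]}`. Keeping the stages as separate callables mirrors the cascade structure and lets either stage be swapped for a trained model.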
Eavesdrop the Composition Proportion of Training Labels in Federated Learning
Title | Eavesdrop the Composition Proportion of Training Labels in Federated Learning |
Authors | Lixu Wang, Shichao Xu, Xiao Wang, Qi Zhu |
Abstract | Federated learning (FL) has recently emerged as a new form of collaborative machine learning, where a common model can be learned while keeping all the training data on local devices. Although FL is designed to enhance data privacy, in this paper we demonstrate a new direction in inference attacks in the context of FL, where adversaries with very limited power can obtain valuable information about the training data. In particular, we propose three new types of attacks to exploit this vulnerability. The first type of attack, Class Sniffing, can detect whether a certain label appears in training. The other two types of attacks determine the quantity of each label: the Quantity Inference attack determines the composition proportion of the training labels owned by the selected clients in a single round, while the Whole Determination attack determines that of the whole training process. We evaluate our attacks on a variety of tasks and datasets with different settings, and the results show that our attacks generally work well. Finally, we analyze the impact of major hyper-parameters on our attacks and discuss possible defenses. |
Tasks | Inference Attack |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06044v2 |
PDF | https://arxiv.org/pdf/1910.06044v2.pdf |
PWC | https://paperswithcode.com/paper/eavesdrop-the-composition-proportion-of |
Repo | |
Framework | |
Contamination Attacks and Mitigation in Multi-Party Machine Learning
Title | Contamination Attacks and Mitigation in Multi-Party Machine Learning |
Authors | Jamie Hayes, Olga Ohrimenko |
Abstract | Machine learning is data hungry; the more data a model has access to in training, the more likely it is to perform well at inference time. Distinct parties may want to combine their local data to gain the benefits of a model trained on a large corpus of data. We consider such a case: parties get access to the model trained on their joint data but do not see each other's individual datasets. We show that one needs to be careful when using this multi-party model, since a potentially malicious party can taint the model by providing contaminated data. We then show how adversarial training can defend against such attacks by preventing the model from learning trends specific to individual parties' data, thereby also guaranteeing party-level membership privacy. |
Tasks | |
Published | 2019-01-08 |
URL | http://arxiv.org/abs/1901.02402v1 |
PDF | http://arxiv.org/pdf/1901.02402v1.pdf |
PWC | https://paperswithcode.com/paper/contamination-attacks-and-mitigation-in-multi |
Repo | |
Framework | |
Rules of the Road: Predicting Driving Behavior with a Convolutional Model of Semantic Interactions
Title | Rules of the Road: Predicting Driving Behavior with a Convolutional Model of Semantic Interactions |
Authors | Joey Hong, Benjamin Sapp, James Philbin |
Abstract | We focus on the problem of predicting future states of entities in complex, real-world driving scenarios. Previous research has used low-level signals to predict short time horizons, and has not addressed how to leverage key assets relied upon heavily by industry self-driving systems: (1) large 3D perception efforts which provide highly accurate 3D states of agents with rich attributes, and (2) detailed and accurate semantic maps of the environment (lanes, traffic lights, crosswalks, etc.). We present a unified representation which encodes such high-level semantic information in a spatial grid, allowing the use of deep convolutional models to fuse complex scene context. This enables learning entity-entity and entity-environment interactions with simple, feed-forward computations in each timestep within an overall temporal model of an agent’s behavior. We propose different ways of modelling the future as a distribution over future states using standard supervised learning. We introduce a novel dataset providing industry-grade rich perception and semantic inputs, and empirically show that we can effectively learn fundamentals of driving behavior. |
Tasks | |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.08945v1 |
PDF | https://arxiv.org/pdf/1906.08945v1.pdf |
PWC | https://paperswithcode.com/paper/rules-of-the-road-predicting-driving-behavior-1 |
Repo | |
Framework | |
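The unified representation above encodes agents and map semantics in a spatial grid that a convolutional model can consume. This is a minimal sketch of such a rasterization, assuming a hypothetical three-channel top-down layout (agent occupancy, lane points, red traffic lights) and metric coordinates centered on the ego vehicle. The paper's actual channels, resolution and temporal stacking are not reproduced here.

```python
import numpy as np

CHANNELS = {"agents": 0, "lanes": 1, "red_lights": 2}  # hypothetical channel layout

def rasterize(points_by_channel, grid_size=64, meters_per_cell=0.5):
    """Build a (C, H, W) top-down grid centered on the ego vehicle at (0, 0).
    points_by_channel maps a channel name to a list of (x, y) positions in meters;
    points falling outside the grid are skipped."""
    grid = np.zeros((len(CHANNELS), grid_size, grid_size), dtype=np.float32)
    half = grid_size // 2
    for name, points in points_by_channel.items():
        c = CHANNELS[name]
        for x, y in points:
            col = int(np.floor(x / meters_per_cell)) + half
            row = half - 1 - int(np.floor(y / meters_per_cell))  # +y points up the image
            if 0 <= row < grid_size and 0 <= col < grid_size:
                grid[c, row, col] = 1.0
    return grid
```

Stacking one such grid per past timestep along the channel axis would be one way to give a convolutional model temporal context; that choice is an assumption of this sketch.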
Efficiently Reusing Natural Language Processing Models for Phenotype-Mention Identification in Free-text Electronic Medical Records: Methodology Study
Title | Efficiently Reusing Natural Language Processing Models for Phenotype-Mention Identification in Free-text Electronic Medical Records: Methodology Study |
Authors | Honghan Wu, Karen Hodgson, Sue Dyson, Katherine I. Morley, Zina M. Ibrahim, Ehtesham Iqbal, Robert Stewart, Richard JB Dobson, Cathie Sudlow |
Abstract | Background: Many efforts have been put into the use of automated approaches, such as natural language processing (NLP), to mine or extract data from free-text medical records to construct comprehensive patient profiles for delivering better health care. Reusing NLP models in new settings, however, remains cumbersome, requiring iterative validation and/or retraining on new data to achieve convergent results. Objective: The aim of this work is to minimize the effort involved in reusing NLP models on free-text medical records. Methods: We formally define and analyse the model adaptation problem in phenotype-mention identification tasks. We identify “duplicate waste” and “imbalance waste”, which collectively impede efficient model reuse. We propose a phenotype-embedding-based approach to minimize these sources of waste without the need for labelled data from new settings. Results: We conduct experiments on data from a large mental health registry to reuse NLP models in four phenotype-mention identification tasks. The proposed approach can choose the best model for a new task, identifying up to 76% of phenotype mentions (duplicate waste) without the need for validation and model retraining, and with very good performance (93-97% accuracy). It can also provide guidance for validating and retraining the selected model for novel language patterns in new tasks, saving around 80% of the effort (imbalance waste) required in “blind” model-adaptation approaches. Conclusions: Adapting pre-trained NLP models for new tasks can be more efficient and effective if the language pattern landscapes of old settings and new settings can be made explicit and comparable. Our experiments show that the phenotype-mention embedding approach is an effective way to model language patterns for phenotype-mention identification tasks and that its use can guide efficient NLP model reuse. |
Tasks | |
Published | 2019-03-10 |
URL | https://arxiv.org/abs/1903.03995v3 |
PDF | https://arxiv.org/pdf/1903.03995v3.pdf |
PWC | https://paperswithcode.com/paper/contextualised-concept-embedding-for |
Repo | |
Framework | |
Improved Hybrid Layered Image Compression using Deep Learning and Traditional Codecs
Title | Improved Hybrid Layered Image Compression using Deep Learning and Traditional Codecs |
Authors | Haisheng Fu, Feng Liang, Bo Lei, Nai Bian, Qian Zhang, Mohammad Akbari, Jie Liang, Chengjie Tu |
Abstract | Recently, deep learning-based methods have been applied to image compression and have achieved many promising results. In this paper, we propose an improved hybrid layered image compression framework by combining deep learning and traditional image codecs. At the encoder, we first use a convolutional neural network (CNN) to obtain a compact representation of the input image, which is losslessly encoded by the FLIF codec as the base layer of the bit stream. A coarse reconstruction of the input is obtained by another CNN from the reconstructed compact representation. The residual between the input and the coarse reconstruction is then obtained and encoded by the H.265/HEVC-based BPG codec as the enhancement layer of the bit stream. Experimental results using the Kodak and Tecnick datasets show that the proposed scheme outperforms the state-of-the-art deep learning-based layered coding scheme and traditional codecs including BPG in both PSNR and MS-SSIM metrics across a wide range of bit rates, when the images are coded in the RGB444 domain. |
Tasks | Image Compression |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06566v1 |
PDF | https://arxiv.org/pdf/1907.06566v1.pdf |
PWC | https://paperswithcode.com/paper/improved-hybrid-layered-image-compression |
Repo | |
Framework | |
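The encoder described above splits an image into a losslessly coded base layer and an enhancement layer that codes the residual. This is a minimal sketch of that data flow only, assuming hypothetical stand-ins throughout: block averaging replaces the first CNN, nearest-neighbour upsampling replaces the reconstruction CNN, the compact representation is passed through unchanged (as a lossless FLIF base layer would preserve it), and uniform quantization of the residual stands in for the BPG enhancement codec.

```python
import numpy as np

def encode(image, factor=4, step=8.0):
    """Layered encoding sketch; image height and width must be divisible by factor."""
    h, w = image.shape
    # Stand-in for the first CNN: compact representation via block averaging
    compact = image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    # Stand-in for the second CNN: coarse reconstruction from the compact code
    coarse = np.kron(compact, np.ones((factor, factor)))
    residual = image - coarse
    # Stand-in for the BPG enhancement layer: uniform quantization of the residual
    q_residual = np.round(residual / step).astype(np.int32)
    return compact, q_residual

def decode(compact, q_residual, factor=4, step=8.0):
    # Base layer gives the coarse image; the enhancement layer adds back the residual
    coarse = np.kron(compact, np.ones((factor, factor)))
    return coarse + q_residual * step
```

Because only the residual is quantized, the per-pixel reconstruction error is bounded by `step / 2`; the quality/rate trade-off is then controlled by the enhancement layer alone, which is the motivation for the layered design.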
A Tight Analysis of Greedy Yields Subexponential Time Approximation for Uniform Decision Tree
Title | A Tight Analysis of Greedy Yields Subexponential Time Approximation for Uniform Decision Tree |
Authors | Ray Li, Percy Liang, Stephen Mussmann |
Abstract | Decision Tree is a classic formulation of active learning: given $n$ hypotheses with nonnegative weights summing to 1 and a set of tests that each partition the hypotheses, output a decision tree using the provided tests that uniquely identifies each hypothesis and has minimum (weighted) average depth. Previous works showed that the greedy algorithm achieves an $O(\log n)$ approximation ratio for this problem and that it is NP-hard to beat an $O(\log n)$ approximation, settling the complexity of the problem. However, for Uniform Decision Tree, i.e., Decision Tree with uniform weights, the story is more subtle. The greedy algorithm’s $O(\log n)$ approximation ratio was the best known, but the largest approximation ratio known to be NP-hard is $4-\varepsilon$. We prove that the greedy algorithm gives an $O(\frac{\log n}{\log C_{OPT}})$ approximation for Uniform Decision Tree, where $C_{OPT}$ is the cost of the optimal tree, and show this is best possible for the greedy algorithm. As a corollary, we resolve a conjecture of Kosaraju, Przytycka, and Borgstrom. Leveraging this result, for all $\alpha\in(0,1)$, we exhibit a $\frac{9.01}{\alpha}$ approximation algorithm for Uniform Decision Tree running in subexponential time $2^{\tilde O(n^\alpha)}$. As a corollary, achieving any super-constant approximation ratio on Uniform Decision Tree is not NP-hard, assuming the Exponential Time Hypothesis. This work therefore adds approximating Uniform Decision Tree to a small list of natural problems that have subexponential time algorithms but no known polynomial time algorithms. All our results hold for Decision Tree with weights not too far from uniform. A key technical contribution of our work is showing a connection between greedy algorithms for Uniform Decision Tree and for Min Sum Set Cover. |
Tasks | Active Learning |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.11385v2 |
PDF | https://arxiv.org/pdf/1906.11385v2.pdf |
PWC | https://paperswithcode.com/paper/a-tight-analysis-of-greedy-yields |
Repo | |
Framework | |
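The greedy algorithm analysed above builds the tree top-down, at each node choosing a test that splits the remaining hypotheses. This is a minimal sketch for the uniform-weight case, assuming one common greedy rule (pick the test that separates the most pairs of remaining hypotheses); the precise rule in the paper may differ. Tests are modelled here as hypothetical dicts mapping each hypothesis id to its outcome, and the cost is the uniform average leaf depth.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

TestOutcomes = Dict[int, Hashable]  # hypothesis id -> outcome of one test

def pairs_separated(hyps: Sequence[int], test: TestOutcomes) -> int:
    # Number of hypothesis pairs that land in different parts of the test's partition
    counts: Dict[Hashable, int] = defaultdict(int)
    for h in hyps:
        counts[test[h]] += 1
    n = len(hyps)
    return n * (n - 1) // 2 - sum(c * (c - 1) // 2 for c in counts.values())

def greedy_depths(hyps: List[int], tests: List[TestOutcomes], depth: int = 0) -> Dict[int, int]:
    """Leaf depth of every hypothesis in the greedily built tree."""
    if len(hyps) == 1:
        return {hyps[0]: depth}
    best = max(tests, key=lambda t: pairs_separated(hyps, t))
    if pairs_separated(hyps, best) == 0:
        raise ValueError("the tests cannot distinguish the remaining hypotheses")
    parts: Dict[Hashable, List[int]] = defaultdict(list)
    for h in hyps:
        parts[best[h]].append(h)
    depths: Dict[int, int] = {}
    for part in parts.values():
        depths.update(greedy_depths(part, tests, depth + 1))
    return depths

def average_depth(hyps: List[int], tests: List[TestOutcomes]) -> float:
    d = greedy_depths(hyps, tests)
    return sum(d.values()) / len(d)  # uniform weights: plain average
```

For four hypotheses identified by two bit tests plus a lopsided test that only isolates hypothesis 0, the pair-separation rule prefers the balanced bit tests and every hypothesis ends at depth 2.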
Multi-Year Vector Dynamic Time Warping Based Crop Mapping
Title | Multi-Year Vector Dynamic Time Warping Based Crop Mapping |
Authors | Mustafa Teke, Yasemin Yardımcı |
Abstract | Recent supervised learning-based methods for automated crop mapping have demonstrated unprecedented improvement over classical techniques. However, most crop mapping studies are limited to same-year crop mapping, in which the present year’s labeled data is used to predict the same year’s crop map. The classification accuracies of these methods degrade considerably in cross-year mapping. Cross-year crop mapping is more useful, as it allows the prediction of the following years’ crop maps using previously labeled data. We propose Vector Dynamic Time Warping (VDTW), a novel multi-year classification approach based on warping of angular distances between phenological vectors. The results show that the proposed VDTW method is robust to temporal and spectral variations, compensating for different farming practices, climate and atmospheric effects, and measurement errors between years. We also describe a method for determining the most discriminative time window that allows high classification accuracies with limited data. We tested our approach with Landsat 8 time-series imagery from 2013 to 2016 for the classification of corn and cotton in the Harran Plain, and of corn, cotton, and soybean in the Bismil Plain of Southeastern Turkey. In addition, we tested VDTW on corn and soybean in Kansas, USA, for 2017 and 2018 with Harmonized Landsat Sentinel data. The VDTW method achieved 99.85% and 99.74% overall accuracy for the same and cross years, respectively, with fewer training samples than other state-of-the-art approaches, namely spectral angle mapper (SAM), dynamic time warping (DTW), time-weighted DTW (TWDTW), random forest (RF), support vector machine (SVM) and deep long short-term memory (LSTM) methods. The proposed method could be extended to other crop types and/or geographical areas. |
Tasks | Time Series |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04930v2 |
PDF | https://arxiv.org/pdf/1909.04930v2.pdf |
PWC | https://paperswithcode.com/paper/multi-year-vector-dynamic-time-warping-based |
Repo | |
Framework | |
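VDTW warps time series using angular distances between phenological vectors. This is a minimal sketch, assuming each time step is a spectral vector (e.g. band reflectances), the local cost is the spectral angle (the arccosine of the cosine similarity, as in SAM), and classic unconstrained DTW accumulates it. The paper's exact cost definition, warping-window constraints and any time weighting may differ.

```python
import math
from typing import List, Sequence

def spectral_angle(a: Sequence[float], b: Sequence[float]) -> float:
    # Angle between two spectral vectors; insensitive to overall brightness scaling
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / (na * nb)))  # clamp floating-point rounding
    return math.acos(cos)

def vector_dtw(s: List[Sequence[float]], t: List[Sequence[float]]) -> float:
    """Unconstrained DTW over sequences of vectors, with spectral angle as local cost."""
    n, m = len(s), len(t)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = spectral_angle(s[i - 1], t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Two properties motivate this combination: scaling every vector (a uniform brightness change) leaves the angles unchanged, and warping absorbs shifts in the phenological calendar, so a sequence with a repeated first observation still matches the original at near-zero cost.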
Semi-supervised Wrapper Feature Selection by Modeling Imperfect Labels
Title | Semi-supervised Wrapper Feature Selection by Modeling Imperfect Labels |
Authors | Vasilii Feofanov, Emilie Devijver, Massih-Reza Amini |
Abstract | In this paper, we propose a new wrapper feature selection approach with partially labeled training examples, where unlabeled observations are pseudo-labeled using the predictions of an initial classifier trained on the labeled training set. The wrapper is composed of a genetic algorithm for proposing new feature subsets and an evaluation measure for scoring the different feature subsets. The selection of feature subsets is done by assigning weights to features and recursively eliminating those that are irrelevant. The selection criterion is based on a new multi-class $\mathcal{C}$-bound that explicitly takes into account the mislabeling errors induced by the pseudo-labeling mechanism, using a probabilistic error model. Empirical results on different data sets show the effectiveness of our framework compared to several state-of-the-art semi-supervised feature selection approaches. |
Tasks | Feature Selection |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.04841v2 |
PDF | https://arxiv.org/pdf/1911.04841v2.pdf |
PWC | https://paperswithcode.com/paper/semi-supervised-wrapper-feature-selection |
Repo | |
Framework | |
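The selection criterion above builds on the $\mathcal{C}$-bound. For orientation, this is a minimal sketch of the classic binary C-bound from majority-vote theory, computed from empirical margins $M_i \in [-1, 1]$ (the weighted vote for the true label minus the vote against it): $C = 1 - \mathbb{E}[M]^2 / \mathbb{E}[M^2]$, valid when $\mathbb{E}[M] > 0$. The paper's multi-class version and its correction for pseudo-labeling errors are not reproduced here.

```python
from typing import Sequence

def c_bound(margins: Sequence[float]) -> float:
    """Classic binary C-bound 1 - mu1**2 / mu2, where mu1 is the mean margin and
    mu2 the mean squared margin; it upper-bounds the majority-vote risk."""
    n = len(margins)
    mu1 = sum(margins) / n
    mu2 = sum(m * m for m in margins) / n
    if mu1 <= 0:
        raise ValueError("the C-bound requires a positive mean margin")
    return 1.0 - mu1 * mu1 / mu2
```

In a wrapper setting, a score like this would be computed for each candidate feature subset, with lower values preferred; the bound rewards votes that are both correct on average (large mean margin) and consistent (small margin variance).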
Imposing edges in Minimum Spanning Tree
Title | Imposing edges in Minimum Spanning Tree |
Authors | Nicolas Isoart, Jean-Charles Régin |
Abstract | We are interested in the consequences of imposing edges in a minimum spanning tree $T$. We prove that the sum of the replacement costs in $T$ of the imposed edges is a lower bound on the additional cost. More precisely, if r-cost$(T,e)$ is the replacement cost of the edge $e$, we prove that if we impose a set $I$ of nontree edges of $T$, then $\sum_{e \in I} \text{r-cost}(T,e) \leq \text{cost}(T_{e \in I}) - \text{cost}(T)$, where $T_{e \in I}$ is a minimum spanning tree containing all the edges of $I$. |
Tasks | |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09360v1 |
PDF | https://arxiv.org/pdf/1912.09360v1.pdf |
PWC | https://paperswithcode.com/paper/imposing-edges-in-minimum-spanning-tree |
Repo | |
Framework | |
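The lower bound above can be checked numerically on a small graph: the summed replacement costs of the imposed edges should not exceed the additional cost cost(T_I) - cost(T). This is a minimal sketch, assuming the standard definition of the replacement cost of a nontree edge e = (u, v): its weight minus the heaviest edge on the tree path from u to v, i.e. the cost of swapping e into T. The constrained tree is built by Kruskal's algorithm with the imposed edges (assumed acyclic) added first; the graph is a made-up example.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)

def find(parent: List[int], x: int) -> int:
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def kruskal(n: int, edges: Sequence[Edge], imposed: Sequence[Edge] = ()) -> List[Edge]:
    """Minimum-cost spanning tree among those containing all (acyclic) imposed edges."""
    parent = list(range(n))
    tree: List[Edge] = []
    for u, v, w in list(imposed) + sorted(edges, key=lambda e: e[2]):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

def cost(tree: Sequence[Edge]) -> float:
    return sum(w for _, _, w in tree)

def max_on_path(tree: Sequence[Edge], u: int, v: int) -> float:
    """Heaviest edge weight on the unique tree path from u to v (iterative DFS)."""
    adj: Dict[int, List[Tuple[int, float]]] = {}
    for a, b, w in tree:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    stack = [(u, -1, float("-inf"))]
    while stack:
        node, prev, best = stack.pop()
        if node == v:
            return best
        for nxt, w in adj.get(node, []):
            if nxt != prev:
                stack.append((nxt, node, max(best, w)))
    raise ValueError("u and v are not connected in the tree")

def r_cost(tree: Sequence[Edge], e: Edge) -> float:
    u, v, w = e
    return w - max_on_path(tree, u, v)
```

On the example graph with edges `(0,1,1), (1,2,2), (2,3,3), (0,2,4), (1,3,5), (0,3,6)`, the MST costs 6; imposing `(0,2)` and `(1,3)` raises the cost to 10, and their replacement costs (2 each) sum to exactly the additional cost of 4, so the bound is tight here.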
DPM: A deep learning PDE augmentation method (with application to large-eddy simulation)
Title | DPM: A deep learning PDE augmentation method (with application to large-eddy simulation) |
Authors | Jonathan B. Freund, Jonathan F. MacArt, Justin Sirignano |
Abstract | Machine learning for scientific applications faces the challenge of limited data. We propose a framework that leverages a priori known physics to reduce overfitting when training on relatively small datasets. A deep neural network is embedded in a partial differential equation (PDE) that expresses the known physics and learns to describe the corresponding unknown or unrepresented physics from the data. Crafted as such, the neural network can also provide corrections for erroneously represented physics, such as discretization errors associated with the PDE’s numerical solution. Once trained, the deep learning PDE model (DPM) can make out-of-sample predictions for new physical parameters, geometries, and boundary conditions. Our approach optimizes over the functional form of the PDE. Estimating the embedded neural network requires optimizing over the entire PDE, which itself is a function of the neural network. Adjoint partial differential equations are used to efficiently calculate the high-dimensional gradient of the objective function with respect to the neural network parameters. A stochastic adjoint method (SAM), similar in spirit to stochastic gradient descent, further accelerates training. The approach is demonstrated and evaluated for turbulence predictions using large-eddy simulation (LES), a filtered version of the Navier–Stokes equation containing unclosed sub-filter-scale terms. The DPM outperforms the widely-used constant-coefficient and dynamic Smagorinsky models, even for filter sizes so large that these established models become qualitatively incorrect. It also significantly outperforms a priori trained models, which do not account for the full PDE. A relaxation of the discrete enforcement of the divergence-free constraint is also considered, instead allowing the DPM to approximately enforce incompressibility physics. |
Tasks | |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.09145v1 |
PDF | https://arxiv.org/pdf/1911.09145v1.pdf |
PWC | https://paperswithcode.com/paper/dpm-a-deep-learning-pde-augmentation-method |
Repo | |
Framework | |
Cascaded Volumetric Convolutional Network for Kidney Tumor Segmentation from CT volumes
Title | Cascaded Volumetric Convolutional Network for Kidney Tumor Segmentation from CT volumes |
Authors | Yao Zhang, Yixin Wang, Feng Hou, Jiawei Yang, Guangwei Xiong, Jiang Tian, Cheng Zhong |
Abstract | Automated segmentation of the kidney and tumor from 3D CT scans is necessary for the diagnosis, monitoring, and treatment planning of the disease. In this paper, we describe a two-stage framework for kidney and tumor segmentation based on a 3D fully convolutional network (FCN). The first stage performs a preliminary localization of the kidney and crops away the irrelevant background to reduce class imbalance and computation cost. The second stage then precisely segments the kidney and tumor on the cropped patch. The proposed method achieves Dice scores of 98.05% and 83.70% on the validation set of the MICCAI 2019 KiTS Challenge. |
Tasks | |
Published | 2019-10-05 |
URL | https://arxiv.org/abs/1910.02235v1 |
PDF | https://arxiv.org/pdf/1910.02235v1.pdf |
PWC | https://paperswithcode.com/paper/cascaded-volumetric-convolutional-network-for |
Repo | |
Framework | |
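Two ingredients of the pipeline above are easy to show in isolation: the first stage crops the volume to the region around a coarse kidney mask, and results are reported as Dice scores, $2|A \cap B| / (|A| + |B|)$. This is a minimal NumPy sketch, assuming binary masks and a hypothetical fixed safety margin in voxels; it is not the paper's network or its exact cropping rule.

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice score 2|A ∩ B| / (|A| + |B|) for binary masks; 1.0 if both are empty."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

def crop_to_mask(volume: np.ndarray, coarse_mask: np.ndarray, margin: int = 2):
    """Stage-1 style crop: bounding box of the coarse mask plus a margin, clipped to
    the volume bounds. Returns the cropped patch and the slices used, so that a
    stage-2 prediction can be pasted back into the full volume."""
    coords = np.argwhere(coarse_mask)
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + margin + 1, volume.shape)
    slices = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return volume[slices], slices
```

Cropping before the second stage shrinks the fraction of background voxels, which is how the cascade reduces both class imbalance and computation cost.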
Multi-agent Attentional Activity Recognition
Title | Multi-agent Attentional Activity Recognition |
Authors | Kaixuan Chen, Lina Yao, Dalin Zhang, Bin Guo, Zhiwen Yu |
Abstract | Multi-modality is an important feature of sensor-based activity recognition. In this work, we consider two inherent characteristics of human activities: the spatially and temporally varying salience of features, and the relations between activities and the corresponding body-part motions. Based on these, we propose a multi-agent spatial-temporal attention model. The spatial-temporal attention mechanism helps intelligently select informative modalities and their active periods. The multiple agents in the proposed model represent activities as collective motions across body parts by independently selecting modalities associated with single motions. With a joint recognition goal, the agents share the information they gain and coordinate their selection policies to learn the optimal recognition model. Experimental results on four real-world datasets demonstrate that the proposed model outperforms state-of-the-art methods. |
Tasks | Activity Recognition |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.08948v1 |
PDF | https://arxiv.org/pdf/1905.08948v1.pdf |
PWC | https://paperswithcode.com/paper/multi-agent-attentional-activity-recognition |
Repo | |
Framework | |