Paper Group ANR 61
Invariant Risk Minimization Games
Title | Invariant Risk Minimization Games |
Authors | Kartik Ahuja, Karthikeyan Shanmugam, Kush R. Varshney, Amit Dhurandhar |
Abstract | The standard risk minimization paradigm of machine learning is brittle when operating in environments whose test distributions are different from the training distribution due to spurious correlations. Training on data from many environments and finding invariant predictors reduces the effect of spurious features by concentrating models on features that have a causal relationship with the outcome. In this work, we pose such invariant risk minimization as finding the Nash equilibrium of an ensemble game among several environments. By doing so, we develop a simple training algorithm that uses best response dynamics and, in our experiments, yields similar or better empirical accuracy with much lower variance than the challenging bi-level optimization problem of Arjovsky et al. (2019). One key theoretical contribution is showing that the set of Nash equilibria for the proposed game is equivalent to the set of invariant predictors for any finite number of environments, even with nonlinear classifiers and transformations. As a result, our method also retains the generalization guarantees to a large set of environments shown in Arjovsky et al. (2019). The proposed algorithm adds to the collection of successful game-theoretic machine learning algorithms such as generative adversarial networks. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04692v2 |
https://arxiv.org/pdf/2002.04692v2.pdf | |
PWC | https://paperswithcode.com/paper/invariant-risk-minimization-games |
Repo | |
Framework | |
A Continuous Space Neural Language Model for Bengali Language
Title | A Continuous Space Neural Language Model for Bengali Language |
Authors | Hemayet Ahmed Chowdhury, Md. Azizul Haque Imon, Anisur Rahman, Aisha Khatun, Md. Saiful Islam |
Abstract | Language models are generally employed to estimate the probability distribution of various linguistic units, making them one of the fundamental parts of natural language processing. Applications of language models include a wide spectrum of tasks such as text summarization, translation and classification. For a low resource language like Bengali, the research in this area so far can be considered to be narrow at the very least, with some traditional count based models being proposed. This paper attempts to address the issue and proposes a continuous-space neural language model, or more specifically an ASGD weight dropped LSTM language model, along with techniques to efficiently train it for Bengali Language. The performance analysis with some currently existing count based models illustrated in this paper also shows that the proposed architecture outperforms its counterparts by achieving an inference perplexity as low as 51.2 on the held out data set for Bengali. |
Tasks | Language Modelling, Text Summarization |
Published | 2020-01-11 |
URL | https://arxiv.org/abs/2001.05315v1 |
https://arxiv.org/pdf/2001.05315v1.pdf | |
PWC | https://paperswithcode.com/paper/a-continuous-space-neural-language-model-for |
Repo | |
Framework | |
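The perplexity figure quoted in the abstract above is conventionally the exponential of the mean per-token negative log-likelihood on held-out text. A minimal sketch of that definition, assuming natural-log token probabilities; the function name `perplexity` and the sample values are hypothetical and are not taken from the paper:

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical example: a model that assigns every held-out token probability 1/50.
uniform_log_probs = [math.log(1 / 50)] * 10
print(perplexity(uniform_log_probs))
```

Read this way, a perplexity near 50 means the model is, on average, about as uncertain as a uniform choice among 50 tokens.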
dtControl: Decision Tree Learning Algorithms for Controller Representation
Title | dtControl: Decision Tree Learning Algorithms for Controller Representation |
Authors | Pranav Ashok, Mathias Jackermeier, Pushpak Jagtap, Jan Křetínský, Maximilian Weininger, Majid Zamani |
Abstract | Decision tree learning is a popular classification technique most commonly used in machine learning applications. Recent work has shown that decision trees can be used to represent provably-correct controllers concisely. Compared to representations using lookup tables or binary decision diagrams, decision trees are smaller and more explainable. We present dtControl, an easily extensible tool for representing memoryless controllers as decision trees. We give a comprehensive evaluation of various decision tree learning algorithms applied to 10 case studies arising out of correct-by-construction controller synthesis. These algorithms include two new techniques, one for using arbitrary linear binary classifiers in the decision tree learning, and one novel approach for determinizing controllers during the decision tree construction. In particular the latter turns out to be extremely efficient, yielding decision trees with a single-digit number of decision nodes on 5 of the case studies. |
Tasks | |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.04991v1 |
https://arxiv.org/pdf/2002.04991v1.pdf | |
PWC | https://paperswithcode.com/paper/dtcontrol-decision-tree-learning-algorithms |
Repo | |
Framework | |
Scalable Quantitative Verification For Deep Neural Networks
Title | Scalable Quantitative Verification For Deep Neural Networks |
Authors | Teodora Baluta, Zheng Leong Chua, Kuldeep S. Meel, Prateek Saxena |
Abstract | Verifying security properties of deep neural networks (DNNs) is becoming increasingly important. This paper introduces a new quantitative verification framework for DNNs that can decide, with user-specified confidence, whether a given logical property ψ defined over the space of inputs of the given DNN holds for less than a user-specified threshold, θ. We present new algorithms that are scalable to large real-world models as well as proven to be sound. Our approach requires only black-box access to the models. Further, it certifies properties of both deterministic and non-deterministic DNNs. We implement our approach in a tool called PROVERO. We apply PROVERO to the problem of certifying adversarial robustness. In this context, PROVERO provides an attack-agnostic measure of robustness for a given DNN and a test input. First, we find that this metric has a strong statistical correlation with perturbation bounds reported by 2 of the most prominent white-box attack strategies today. Second, we show that PROVERO can quantitatively certify robustness with high confidence in cases where the state-of-the-art qualitative verification tool (ERAN) fails to produce conclusive results. Thus, quantitative verification scales easily to large DNNs. |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.06864v1 |
https://arxiv.org/pdf/2002.06864v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-quantitative-verification-for-deep |
Repo | |
Framework | |
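The idea of deciding, with user-specified confidence, whether a property holds on less than a threshold fraction of inputs can be illustrated with plain fixed-size sampling and Hoeffding's inequality. This is a simplified stand-in, not the PROVERO algorithm; the names `hoeffding_samples` and `check_threshold`, the tolerance `eps`, and the toy predicate are all assumptions made for illustration:

```python
import math
import random


def hoeffding_samples(eps, delta):
    """Number of i.i.d. samples so that |p_hat - p| < eps with probability >= 1 - delta."""
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def check_threshold(property_holds, sample_input, theta, eps, delta, rng):
    """Estimate how often the property holds and compare the estimate against theta.

    property_holds: black-box predicate on one input (hypothetical; it may query the model).
    sample_input:   draws one input from the input distribution using rng.
    """
    n = hoeffding_samples(eps, delta)
    hits = sum(1 for _ in range(n) if property_holds(sample_input(rng)))
    p_hat = hits / n
    if p_hat + eps <= theta:
        return "below", p_hat
    if p_hat - eps >= theta:
        return "above", p_hat
    return "inconclusive", p_hat


# Hypothetical stand-in for a model check: the property holds on about 2% of inputs.
rng = random.Random(0)
verdict, estimate = check_threshold(
    property_holds=lambda x: x < 0.02,
    sample_input=lambda r: r.random(),
    theta=0.2, eps=0.05, delta=0.01, rng=rng,
)
print(verdict, round(estimate, 3))
```

The sketch only needs to call the predicate on sampled inputs, which mirrors the black-box access mentioned in the abstract; a fixed sample size is the simplest choice and is generally more wasteful than an adaptive test.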
Multilingual Stance Detection: The Catalonia Independence Corpus
Title | Multilingual Stance Detection: The Catalonia Independence Corpus |
Authors | Elena Zotova, Rodrigo Agerri, Manuel Nuñez, German Rigau |
Abstract | Stance detection aims to determine the attitude of a given text with respect to a specific topic or claim. While stance detection has been fairly well researched in recent years, most of the work has been focused on English. This is mainly due to the relative lack of annotated data in other languages. The TW-10 Referendum Dataset released at IberEval 2018 is a previous effort to provide multilingual stance-annotated data in Catalan and Spanish. Unfortunately, the TW-10 Catalan subset is extremely imbalanced. This paper addresses these issues by presenting a new multilingual dataset for stance detection in Twitter for the Catalan and Spanish languages, with the aim of facilitating research on stance detection in multilingual and cross-lingual settings. The dataset is annotated with stance towards one topic, namely, the independence of Catalonia. We also provide a semi-automatic method to annotate the dataset based on a categorization of Twitter users. We experiment on the new corpus with a number of supervised approaches, including linear classifiers and deep learning methods. Comparison of our new corpus with the TW-10 dataset shows both the benefits and potential of a well balanced corpus for multilingual and cross-lingual research on stance detection. Finally, we establish new state-of-the-art results on the TW-10 dataset, both for Catalan and Spanish. |
Tasks | Stance Detection |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2004.00050v1 |
https://arxiv.org/pdf/2004.00050v1.pdf | |
PWC | https://paperswithcode.com/paper/multilingual-stance-detection-the-catalonia |
Repo | |
Framework | |
IMAC: In-memory multi-bit Multiplication and ACcumulation in 6T SRAM Array
Title | IMAC: In-memory multi-bit Multiplication and ACcumulation in 6T SRAM Array |
Authors | Mustafa Ali, Akhilesh Jaiswal, Sangamesh Kodge, Amogh Agrawal, Indranil Chakraborty, Kaushik Roy |
Abstract | 'In-memory computing' is being widely explored as a novel computing paradigm to mitigate the well known memory bottleneck. This emerging paradigm aims at embedding some aspects of computations inside the memory array, thereby avoiding frequent and expensive movement of data between the compute unit and the storage memory. In-memory computing with respect to Silicon memories has been widely explored on various memory bit-cells. Embedding computation inside the 6 transistor (6T) SRAM array is of special interest since it is the most widely used on-chip memory. In this paper, we present a novel in-memory multiplication followed by accumulation operation capable of performing parallel dot products within 6T SRAM without any changes to the standard bitcell. We further study the effect of circuit non-idealities and process variations on the accuracy of the LeNet-5 and VGG neural network architectures against the MNIST and CIFAR-10 datasets, respectively. The proposed in-memory dot-product mechanism achieves 88.8% and 99% accuracy for CIFAR-10 and MNIST, respectively. Compared to the standard von Neumann system, the proposed system is 6.24x better in energy consumption and 9.42x better in delay. |
Tasks | |
Published | 2020-03-27 |
URL | https://arxiv.org/abs/2003.12558v1 |
https://arxiv.org/pdf/2003.12558v1.pdf | |
PWC | https://paperswithcode.com/paper/imac-in-memory-multi-bit-multiplication |
Repo | |
Framework | |
Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
Title | Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions |
Authors | Omer Gottesman, Joseph Futoma, Yao Liu, Sonali Parbhoo, Leo Anthony Celi, Emma Brunskill, Finale Doshi-Velez |
Abstract | Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding. In this paper we develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates. This is accomplished by highlighting observations in the data whose removal will have a large effect on the OPE estimate, and formulating a set of rules for choosing which ones to present to domain experts for validation. We develop methods to compute exactly the influence functions for fitted Q-evaluation with two different function classes: kernel-based and linear least squares. Experiments on medical simulations and real-world intensive care unit data demonstrate that our method can be used to identify limitations in the evaluation process and make evaluation more robust. |
Tasks | |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03478v2 |
https://arxiv.org/pdf/2002.03478v2.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-off-policy-evaluation-in |
Repo | |
Framework | |
Radiologist-level stroke classification on non-contrast CT scans with Deep U-Net
Title | Radiologist-level stroke classification on non-contrast CT scans with Deep U-Net |
Authors | Manvel Avetisian, Vladimir Kokh, Alex Tuzhilin, Dmitry Umerenkov |
Abstract | Segmentation of ischemic stroke and intracranial hemorrhage on computed tomography is essential for investigation and treatment of stroke. In this paper, we modified the U-Net CNN architecture for the stroke identification problem using non-contrast CT. We applied the proposed DL model to historical patient data and also conducted clinical experiments involving ten experienced radiologists. Our model achieved strong results on historical data, and significantly outperformed seven radiologists out of ten, while being on par with the remaining three. |
Tasks | Stroke Classification |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2003.14287v1 |
https://arxiv.org/pdf/2003.14287v1.pdf | |
PWC | https://paperswithcode.com/paper/radiologist-level-stroke-classification-on |
Repo | |
Framework | |
Minor Constraint Disturbances for Deep Semi-supervised Learning
Title | Minor Constraint Disturbances for Deep Semi-supervised Learning |
Authors | Jielei Chu, Jing Liu, Hongjun Wang, Zhiguo Gong, Tianrui Li |
Abstract | In high-dimensional data space, semi-supervised feature learning based on Euclidean distance shows instability under a broad set of conditions. Furthermore, the scarcity and high cost of labels prompt us to explore new semi-supervised learning methods with the fewest labels. In this paper, we develop a novel Minor Constraint Disturbances-based Deep Semi-supervised Feature Learning framework (MCD-DSFL) from the perspective of probability distribution for feature representation. There are two fundamental modules in the proposed framework: one is a Minor Constraint Disturbances-based restricted Boltzmann machine with Gaussian visible units (MCDGRBM) for modelling continuous data and the other is a Minor Constraint Disturbances-based restricted Boltzmann machine (MCDRBM) for modelling binary data. The Minor Constraint Disturbances (MCD) consist of fewer instance-level constraints, which are produced by only two randomly selected labels from each class. The Kullback-Leibler (KL) divergences of the MCD are fused into the Contrastive Divergence (CD) learning for training the proposed MCDGRBM and MCDRBM models. Then, the probability distributions of hidden layer features are as similar as possible within the same class and, simultaneously, as dissimilar as possible across different classes. Despite the weak influence of the MCD for our shallow models (MCDGRBM and MCDRBM), the proposed deep MCD-DSFL framework improves the representation capability significantly under its leverage effect. The semi-supervised strategy based on the KL divergence of the MCD significantly reduces the reliance on the labels and improves the stability of the semi-supervised feature learning in high-dimensional space simultaneously. |
Tasks | |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06321v1 |
https://arxiv.org/pdf/2003.06321v1.pdf | |
PWC | https://paperswithcode.com/paper/minor-constraint-disturbances-for-deep-semi |
Repo | |
Framework | |
Decomposable Probability-of-Success Metrics in Algorithmic Search
Title | Decomposable Probability-of-Success Metrics in Algorithmic Search |
Authors | Tyler Sam, Jake Williams, Abel Tadesse, Huey Sun, George Montanez |
Abstract | Previous studies have used a specific success metric within an algorithmic search framework to prove machine learning impossibility results. However, this specific success metric prevents us from applying these results on other forms of machine learning, e.g. transfer learning. We define decomposable metrics as a category of success metrics for search problems which can be expressed as a linear operation on a probability distribution to solve this issue. Using an arbitrary decomposable metric to measure the success of a search, we demonstrate theorems which bound success in various ways, generalizing several existing results in the literature. |
Tasks | Transfer Learning |
Published | 2020-01-03 |
URL | https://arxiv.org/abs/2001.00742v1 |
https://arxiv.org/pdf/2001.00742v1.pdf | |
PWC | https://paperswithcode.com/paper/decomposable-probability-of-success-metrics |
Repo | |
Framework | |
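The abstract above characterizes a decomposable metric as one that can be written as a linear operation on a probability distribution. A minimal sketch of that shape, assuming a finite search space so the distribution is a vector and the linear operation is a dot product; the function name `decomposable_metric`, the weight vectors, and the numbers are hypothetical illustrations rather than the paper's notation:

```python
def decomposable_metric(weights, distribution):
    """Evaluate a metric of the form sum_i weights[i] * distribution[i]."""
    if len(weights) != len(distribution):
        raise ValueError("weights and distribution must have the same length")
    if any(p < 0 for p in distribution) or abs(sum(distribution) - 1.0) > 1e-9:
        raise ValueError("distribution must be non-negative and sum to 1")
    return sum(w * p for w, p in zip(weights, distribution))


# Hypothetical search space of four elements; elements 1 and 2 form the target.
search_distribution = [0.1, 0.2, 0.3, 0.4]
target_indicator = [0, 1, 1, 0]

# With a 0/1 target indicator as weights, the metric is the probability mass on the target.
print(decomposable_metric(target_indicator, search_distribution))

# Other weight vectors give other metrics of the same linear form.
graded_weights = [0.0, 1.0, 0.5, 0.0]
print(decomposable_metric(graded_weights, search_distribution))
```

Because the metric is linear in the distribution, results that hold for the averaged distribution carry over to any choice of weights, which is the kind of generalization the abstract describes.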
Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps
Title | Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps |
Authors | Zhichuang Sun, Ruimin Sun, Long Lu |
Abstract | On-device machine learning (ML) is quickly gaining popularity among mobile apps. It allows offline model inference while preserving user privacy. However, ML models, considered as core intellectual properties of model owners, are now stored on billions of untrusted devices and subject to potential thefts. Leaked models can cause both severe financial loss and security consequences. This paper presents the first empirical study of ML model protection on mobile devices. Our study aims to answer three open questions with quantitative evidence: How widely is model protection used in apps? How robust are existing model protection techniques? How much can (stolen) models cost? To that end, we built a simple app analysis pipeline and analyzed 46,753 popular apps collected from the US and Chinese app markets. We identified 1,468 ML apps spanning all popular app categories. We found that, alarmingly, 41% of ML apps do not protect their models at all, which can be trivially stolen from app packages. Even for those apps that use model protection or encryption, we were able to extract the models from 66% of them via unsophisticated dynamic analysis techniques. The extracted models are mostly commercial products and used for face recognition, liveness detection, ID/bank card recognition, and malware detection. We quantitatively estimated the potential financial impact of a leaked model, which can amount to millions of dollars for different stakeholders. Our study reveals that on-device models are currently at high risk of being leaked; attackers are highly motivated to steal such models. Drawn from our large-scale study, we report our insights into this emerging security problem and discuss the technical challenges, hoping to inspire future research on robust and practical model protection for mobile devices. |
Tasks | Face Recognition, Malware Detection |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07687v1 |
https://arxiv.org/pdf/2002.07687v1.pdf | |
PWC | https://paperswithcode.com/paper/mind-your-weights-a-large-scale-study-on |
Repo | |
Framework | |
Event Detection with Relation-Aware Graph Convolutional Neural Networks
Title | Event Detection with Relation-Aware Graph Convolutional Neural Networks |
Authors | Shiyao Cui, Bowen Yu, Tingwen Liu, Zhenyu Zhang, Xuebin Wang, Jinqiao Shi |
Abstract | Event detection (ED), a key subtask of information extraction, aims to recognize instances of specific types of events in text. Recently, graph convolutional networks (GCNs) over dependency trees have been widely used to capture syntactic structure information and achieve convincing performance in event detection. However, these works ignore the syntactic relation labels on the tree, which convey rich and useful linguistic knowledge for event detection. In this paper, we investigate a novel architecture named Relation-Aware GCN (RA-GCN), which efficiently exploits syntactic relation labels and models the relation between words specifically. We first propose a relation-aware aggregation module to produce expressive word representations by aggregating syntactically connected words through specific relations. Furthermore, a context-aware relation update module is designed to explicitly update the relation representation between words, and these two modules work in a mutually reinforcing way. Experimental results on the ACE2005 dataset show that our model achieves a new state-of-the-art performance for event detection. |
Tasks | |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10757v1 |
https://arxiv.org/pdf/2002.10757v1.pdf | |
PWC | https://paperswithcode.com/paper/event-detection-with-relation-aware-graph |
Repo | |
Framework | |
Train, Learn, Expand, Repeat
Title | Train, Learn, Expand, Repeat |
Authors | Abhijeet Parida, Aadhithya Sankar, Rami Eisawy, Tom Finck, Benedikt Wiestler, Franz Pfister, Julia Moosbauer |
Abstract | High-quality labeled data is essential to successfully train supervised machine learning models. Although a large amount of unlabeled data is present in the medical domain, labeling poses a major challenge: medical professionals who can expertly label the data are a scarce and expensive resource. Making matters worse, voxel-wise delineation of data (e.g. for segmentation tasks) is tedious and suffers from high inter-rater variance, thus dramatically limiting available training data. We propose a recursive training strategy to perform the task of semantic segmentation given only very few training samples with pixel-level annotations. We expand on this small training set having cheaper image-level annotations using a recursive training strategy. We apply this technique on the segmentation of intracranial hemorrhage (ICH) in CT (computed tomography) scans of the brain, where typically little annotated data is available. |
Tasks | Semantic Segmentation |
Published | 2020-03-18 |
URL | https://arxiv.org/abs/2003.08469v1 |
https://arxiv.org/pdf/2003.08469v1.pdf | |
PWC | https://paperswithcode.com/paper/train-learn-expand-repeat |
Repo | |
Framework | |
Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models
Title | Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models |
Authors | Chin-Wei Huang, Laurent Dinh, Aaron Courville |
Abstract | In this work, we propose a new family of generative flows on an augmented data space, with an aim to improve expressivity without drastically increasing the computational cost of sampling and evaluation of a lower bound on the likelihood. Theoretically, we prove the proposed flow can approximate a Hamiltonian ODE as a universal transport map. Empirically, we demonstrate state-of-the-art performance on standard benchmarks of flow-based generative modeling. |
Tasks | Latent Variable Models |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07101v1 |
https://arxiv.org/pdf/2002.07101v1.pdf | |
PWC | https://paperswithcode.com/paper/augmented-normalizing-flows-bridging-the-gap |
Repo | |
Framework | |
Efficient Machine Learning Approach for Optimizing the Timing Resolution of a High Purity Germanium Detector
Title | Efficient Machine Learning Approach for Optimizing the Timing Resolution of a High Purity Germanium Detector |
Authors | R. W. Gladen, V. A. Chirayath, A. J. Fairchild, M. T. Manry, A. R. Koymen, A. H. Weiss |
Abstract | We describe here an efficient machine-learning based approach for the optimization of parameters used for extracting the arrival time of waveforms, in particular those generated by the detection of 511 keV annihilation gamma-rays by a 60 cm³ coaxial high purity germanium detector (HPGe). The method utilizes a type of artificial neural network (ANN) called a self-organizing map (SOM) to cluster the HPGe waveforms based on the shape of their rising edges. The optimal timing parameters for HPGe waveforms belonging to a particular cluster are found by minimizing the time difference between the HPGe signal and a signal produced by a BaF₂ scintillation detector. Applying these variable timing parameters to the HPGe signals achieved a gamma-coincidence timing resolution of ~4.3 ns at the 511 keV photo peak (defined as 511 ± 50 keV) and a timing resolution of ~6.5 ns for the entire gamma spectrum, without rejecting any valid pulses. This timing resolution approaches the best obtained by analog nuclear electronics, without the corresponding complexities of analog optimization procedures. We further demonstrate the universality and efficacy of the machine learning approach by applying the method to the generation of secondary electron time-of-flight spectra following the implantation of energetic positrons on a sample. |
Tasks | |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2004.00008v1 |
https://arxiv.org/pdf/2004.00008v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-machine-learning-approach-for |
Repo | |
Framework | |