Paper Group ANR 449
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation. Generation of Consistent Sets of Multi-Label Classification Rules with a Multi-Objective Evolutionary Algorithm. Machine Learning assisted Handover and Resource Management for Cellular Connected Drones. A Random-Feature Based Newton Method for Empirical Risk Minimization in Reproducing Kernel Hilbert Space …
Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Title | Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation |
Authors | Yaqi Duan, Mengdi Wang |
Abstract | This paper studies the statistical theory of batch data reinforcement learning with function approximation. Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history generated by unknown behavioral policies. We study a regression-based fitted Q iteration method, and show that it is equivalent to a model-based method that estimates a conditional mean embedding of the transition operator. We prove that this method is information-theoretically optimal and has nearly minimal estimation error. In particular, by leveraging the contraction property of Markov processes and martingale concentration, we establish a finite-sample instance-dependent error upper bound and a nearly-matching minimax lower bound. The policy evaluation error depends sharply on a restricted $\chi^2$-divergence over the function class between the long-term distribution of the target policy and the distribution of past data. This restricted $\chi^2$-divergence is both instance-dependent and function-class-dependent. It characterizes the statistical limit of off-policy evaluation. Further, we provide an easily computable confidence bound for the policy evaluator, which may be useful for optimistic planning and safe policy improvement. |
Tasks | |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09516v1 |
https://arxiv.org/pdf/2002.09516v1.pdf | |
PWC | https://paperswithcode.com/paper/minimax-optimal-off-policy-evaluation-with |
Repo | |
Framework | |
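A minimal sketch of regression-based fitted Q evaluation with linear features, the estimator family discussed in the abstract. The feature map, discount factor, and data shapes are illustrative assumptions, not the paper's exact setup.

```python
# Fitted Q evaluation (FQE) with linear function approximation: repeatedly
# regress Bellman targets under the target policy onto the logged features.
import numpy as np

def fqe_linear(phi_sa, rewards, phi_next_pi, gamma=0.99, n_iters=200, reg=1e-6):
    """phi_sa      : (n, d) features of logged (state, action) pairs
       rewards     : (n,)   observed rewards
       phi_next_pi : (n, d) features of (next state, target-policy action)"""
    n, d = phi_sa.shape
    # Precompute the regularized least-squares projection once.
    A = phi_sa.T @ phi_sa + reg * np.eye(d)
    w = np.zeros(d)
    for _ in range(n_iters):
        targets = rewards + gamma * phi_next_pi @ w   # Bellman targets under pi
        w = np.linalg.solve(A, phi_sa.T @ targets)    # regression step
    return w

# Toy usage with random data (purely illustrative).
rng = np.random.default_rng(0)
n, d = 1000, 8
phi_sa = rng.normal(size=(n, d))
phi_next_pi = rng.normal(size=(n, d))
rewards = rng.normal(size=n)
w = fqe_linear(phi_sa, rewards, phi_next_pi)
phi_init = rng.normal(size=d)            # features of (initial state, pi action)
print("estimated value:", phi_init @ w)
```

The paper's contribution is the statistical analysis of such an estimator (matching upper and lower bounds), not the iteration itself, which is standard.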
Generation of Consistent Sets of Multi-Label Classification Rules with a Multi-Objective Evolutionary Algorithm
Title | Generation of Consistent Sets of Multi-Label Classification Rules with a Multi-Objective Evolutionary Algorithm |
Authors | Thiago Zafalon Miranda, Diorge Brognara Sardinha, Márcio Porto Basgalupp, Yaochu Jin, Ricardo Cerri |
Abstract | Multi-label classification consists of classifying an instance into two or more classes simultaneously. It is a very challenging task present in many real-world applications, such as the classification of biological, image, video, audio, and text data. Recently, interest in interpretable classification models has grown, partly as a consequence of regulations such as the General Data Protection Regulation. In this context, we propose a multi-objective evolutionary algorithm that generates multiple rule-based multi-label classification models, allowing users to choose among models that offer different compromises between predictive power and interpretability. An important contribution of this work is that, unlike most algorithms, which usually generate models based on lists (ordered collections) of rules, our algorithm generates models based on sets (unordered collections) of rules, increasing interpretability. Also, by employing a conflict avoidance algorithm during rule creation, every rule within a given model is guaranteed to be consistent with every other rule in the same model. Thus, no conflict resolution strategy is required, yielding simpler models. We conducted experiments on synthetic and real-world datasets, compared our results with state-of-the-art algorithms in terms of predictive performance (F-Score) and interpretability (model size), and demonstrate that our best models achieved comparable F-Scores with smaller model sizes. |
Tasks | Multi-Label Classification |
Published | 2020-03-27 |
URL | https://arxiv.org/abs/2003.12526v1 |
https://arxiv.org/pdf/2003.12526v1.pdf | |
PWC | https://paperswithcode.com/paper/generation-of-consistent-sets-of-multi-label |
Repo | |
Framework | |
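A rough sketch of the kind of conflict check the abstract alludes to: a candidate rule is admitted into a model only if it cannot cover the same instance as an existing rule while assigning different labels. The rule representation (attribute to set of allowed values, plus a label set) is an assumption for illustration, not the paper's encoding.

```python
# Conflict-avoiding rule addition for an unordered rule set.
from dataclasses import dataclass

@dataclass
class Rule:
    conditions: dict            # attribute -> set of allowed categorical values
    labels: frozenset           # labels predicted when the rule fires

def antecedents_overlap(r1: Rule, r2: Rule) -> bool:
    """True if some instance could satisfy both antecedents."""
    for attr in r1.conditions.keys() & r2.conditions.keys():
        if not (r1.conditions[attr] & r2.conditions[attr]):
            return False        # disjoint conditions on a shared attribute
    return True                 # no attribute rules them apart

def conflicts(r1: Rule, r2: Rule) -> bool:
    """Two rules conflict if they can fire together but disagree on labels."""
    return antecedents_overlap(r1, r2) and r1.labels != r2.labels

def try_add(model: list, candidate: Rule) -> bool:
    """Add the candidate only if it is consistent with every rule in the model."""
    if any(conflicts(candidate, r) for r in model):
        return False
    model.append(candidate)
    return True

# Toy usage.
model = []
try_add(model, Rule({"habitat": {"water"}}, frozenset({"fish"})))
ok = try_add(model, Rule({"habitat": {"water", "land"}}, frozenset({"amphibian"})))
print(ok)   # False: overlapping antecedents, different label sets
```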
Machine Learning assisted Handover and Resource Management for Cellular Connected Drones
Title | Machine Learning assisted Handover and Resource Management for Cellular Connected Drones |
Authors | Amin Azari, Fayezeh Ghavimi, Mustafa Ozger, Riku Jantti, Cicek Cavdar |
Abstract | Enabling cellular connectivity for drones introduces a wide set of challenges and opportunities. Communication of cellular-connected drones is influenced by 3-dimensional mobility and line-of-sight channel characteristics, which result in a higher number of handovers with increasing altitude. Our cell planning simulations in the coexistence of aerial and terrestrial users indicate that severe interference from drones to base stations is a major challenge for uplink communications of terrestrial users. Here, we first present the major challenges in the coexistence of terrestrial and drone communications by considering real geographical network data for Stockholm. Then, we derive analytical models for the key performance indicators (KPIs), including communications delay and interference over cellular networks, and formulate the handover and radio resource management (H-RRM) optimization problem. Afterwards, we transform this problem into a machine learning problem, and propose a deep reinforcement learning solution to solve the H-RRM problem. Finally, using simulation results, we present how the speed and altitude of drones, and the tolerable level of interference, shape the optimal H-RRM policy in the network. In particular, we present heat maps of handover decisions at different drone altitudes/speeds, which motivate a revision of legacy handover schemes and a redefinition of cell boundaries in the sky. |
Tasks | |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.07937v1 |
https://arxiv.org/pdf/2001.07937v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-assisted-handover-and |
Repo | |
Framework | |
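An illustrative tabular Q-learning loop for a handover decision, standing in for the deep RL solution the abstract describes. The state/action encoding and the reward trade-off (signal quality versus a handover penalty) are assumptions made for the sketch, not the paper's exact H-RRM formulation.

```python
# Toy Q-learning over discretized (altitude, speed, serving-cell) states,
# where the action selects the serving cell for the next step.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_cells = 50, 5
Q = np.zeros((n_states, n_cells))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(state, action):
    """Toy environment: random next state, reward penalizing handovers."""
    next_state = rng.integers(n_states)
    serving_cell = state % n_cells
    handover_penalty = 0.5 if action != serving_cell else 0.0
    signal_quality = rng.random()      # placeholder for SINR of the chosen cell
    return next_state, signal_quality - handover_penalty

state = rng.integers(n_states)
for _ in range(10_000):
    action = rng.integers(n_cells) if rng.random() < eps else int(Q[state].argmax())
    next_state, reward = step(state, action)
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
print("greedy handover choice for the first states:", Q.argmax(axis=1)[:10])
```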
A Random-Feature Based Newton Method for Empirical Risk Minimization in Reproducing Kernel Hilbert Space
Title | A Random-Feature Based Newton Method for Empirical Risk Minimization in Reproducing Kernel Hilbert Space |
Authors | Ting-Jui Chang, Shahin Shahrampour |
Abstract | In supervised learning using kernel methods, we encounter a large-scale finite-sum minimization over a reproducing kernel Hilbert space (RKHS). Oftentimes, large-scale finite-sum problems can be solved using efficient variants of Newton’s method where the Hessian is approximated via sub-samples. In an RKHS, however, the dependence of the penalty function on the kernel makes standard sub-sampling approaches inapplicable, since the Gram matrix is not readily available in a low-rank form. In this paper, we observe that for this class of problems, one can naturally use kernel approximation to speed up Newton’s method. Focusing on randomized features for kernel approximation, we provide a novel second-order algorithm that enjoys local superlinear convergence and global convergence in the high probability sense. The key to our analysis is showing that the approximated Hessian via random features preserves the spectrum of the original Hessian. We provide numerical experiments verifying the efficiency of our approach, compared to variants of sub-sampling methods. |
Tasks | |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.04753v2 |
https://arxiv.org/pdf/2002.04753v2.pdf | |
PWC | https://paperswithcode.com/paper/a-random-feature-based-newton-method-for |
Repo | |
Framework | |
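A small sketch of the idea in the abstract: approximate the kernel with random Fourier features and run Newton's method on the resulting finite-dimensional regularized problem. The loss choice (logistic) and hyperparameters are illustrative assumptions, not the paper's exact algorithm or analysis setting.

```python
# Random Fourier features + Newton's method for l2-regularized logistic regression.
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(X, n_features=200, gamma=1.0):
    """RFF approximation of the Gaussian kernel exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def newton_logistic(Z, y, lam=1e-3, n_steps=10):
    """Newton's method on the random-feature problem."""
    n, D = Z.shape
    w = np.zeros(D)
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-Z @ w))
        grad = Z.T @ (p - y) / n + lam * w
        H = (Z * (p * (1 - p))[:, None]).T @ Z / n + lam * np.eye(D)
        w -= np.linalg.solve(H, grad)
    return w

# Toy usage on synthetic binary labels.
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
Z = random_fourier_features(X)
w = newton_logistic(Z, y)
print("train accuracy:", (((Z @ w) > 0).astype(float) == y).mean())
```

The point of the random-feature approximation is that the Hessian is built from a low-dimensional feature matrix rather than the full n-by-n Gram matrix.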
It Means More if It Sounds Good: Yet Another Hypotheses Concerning the Evolution of Polysemous Words
Title | It Means More if It Sounds Good: Yet Another Hypotheses Concerning the Evolution of Polysemous Words |
Authors | Ivan P. Yamshchikov, Cyrille Merleau Nono Saha, Igor Samenko, Jürgen Jost |
Abstract | This position paper looks into the formation of language and shows ties between structural properties of words in the English language and their polysemy. Using Ollivier-Ricci curvature over a large graph of synonyms to estimate polysemy, it shows empirically that words that are arguably easier to pronounce also tend to have multiple meanings. |
Tasks | |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.05758v1 |
https://arxiv.org/pdf/2003.05758v1.pdf | |
PWC | https://paperswithcode.com/paper/it-means-more-if-it-sounds-good-yet-another |
Repo | |
Framework | |
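A minimal sketch of Ollivier-Ricci curvature for one edge of an unweighted graph, the quantity the abstract uses over a synonym graph as a polysemy proxy. Here each node's mass is spread uniformly over its neighbors (no laziness), which is one common convention but not necessarily the paper's.

```python
# Ollivier-Ricci curvature of an edge: kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y),
# with W1 computed as a small transportation LP over shortest-path distances.
import networkx as nx
import numpy as np
from scipy.optimize import linprog

def ollivier_ricci_edge(G, x, y):
    x_nbrs, y_nbrs = list(G.neighbors(x)), list(G.neighbors(y))
    mu_x = np.full(len(x_nbrs), 1.0 / len(x_nbrs))
    mu_y = np.full(len(y_nbrs), 1.0 / len(y_nbrs))
    # Ground metric: shortest-path distances between the two neighborhoods.
    dist = np.array([[nx.shortest_path_length(G, u, v) for v in y_nbrs]
                     for u in x_nbrs], dtype=float)
    n, m = dist.shape
    A_eq, b_eq = [], []
    for i in range(n):                       # row i must ship out mu_x[i]
        row = np.zeros(n * m)
        row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row)
        b_eq.append(mu_x[i])
    for j in range(m):                       # column j must receive mu_y[j]
        col = np.zeros(n * m)
        col[j::m] = 1.0
        A_eq.append(col)
        b_eq.append(mu_y[j])
    res = linprog(dist.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    return 1.0 - res.fun / nx.shortest_path_length(G, x, y)

# Toy usage on a small stand-in graph (a real synonym graph would replace this).
G = nx.karate_club_graph()
print(ollivier_ricci_edge(G, 0, 1))
```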
Out-of-Distribution Generalization via Risk Extrapolation (REx)
Title | Out-of-Distribution Generalization via Risk Extrapolation (REx) |
Authors | David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Remi Le Priol, Aaron Courville |
Abstract | Generalizing outside of the training distribution is an open challenge for current machine learning systems. A weak form of out-of-distribution (OoD) generalization is the ability to successfully interpolate between multiple observed distributions. One way to achieve this is through robust optimization, which seeks to minimize the worst-case risk over convex combinations of the training distributions. However, a much stronger form of OoD generalization is the ability of models to extrapolate beyond the distributions observed during training. In pursuit of strong OoD generalization, we introduce the principle of Risk Extrapolation (REx). REx can be viewed as encouraging robustness over affine combinations of training risks, by encouraging strict equality between training risks. We show conceptually how this principle enables extrapolation, and demonstrate the effectiveness and scalability of instantiations of REx on various OoD generalization tasks. Our code can be found at https://github.com/capybaralet/REx_code_release. |
Tasks | |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.00688v3 |
https://arxiv.org/pdf/2003.00688v3.pdf | |
PWC | https://paperswithcode.com/paper/out-of-distribution-generalization-via-risk |
Repo | |
Framework | |
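A minimal sketch of the variance-penalty form of the REx objective suggested by the abstract: penalize the spread of per-domain training risks to push them toward equality. The penalty weight and model are illustrative; the authors' repository linked above contains the actual implementation.

```python
# V-REx-style objective: mean risk plus a penalty on the variance of per-domain risks.
import torch

def rex_loss(model, domain_batches, loss_fn, beta=10.0):
    """domain_batches: list of (inputs, targets), one per training domain."""
    risks = torch.stack([loss_fn(model(x), y) for x, y in domain_batches])
    return risks.mean() + beta * risks.var()

# Toy usage with a linear model and two synthetic domains.
model = torch.nn.Linear(3, 1)
loss_fn = torch.nn.MSELoss()
domains = [(torch.randn(32, 3), torch.randn(32, 1)) for _ in range(2)]
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = rex_loss(model, domains, loss_fn)
    loss.backward()
    opt.step()
```

Driving the per-domain risks toward equality is what gives the extrapolation behavior: a model whose risks agree across observed domains is penalized less than one that trades one domain off against another.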
A new regret analysis for Adam-type algorithms
Title | A new regret analysis for Adam-type algorithms |
Authors | Ahmet Alacaoglu, Yura Malitsky, Panayotis Mertikopoulos, Volkan Cevher |
Abstract | In this paper, we focus on a theory-practice gap for Adam and its variants (AMSgrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter $\beta_{1}$ (typically between $0.9$ and $0.99$). In theory, regret guarantees for online convex optimization require a rapidly decaying $\beta_{1}\to0$ schedule. We show that this is an artifact of the standard analysis and propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $\beta_{1}$, without further assumptions. We also demonstrate the flexibility of our analysis on a wide range of different algorithms and settings. |
Tasks | |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09729v1 |
https://arxiv.org/pdf/2003.09729v1.pdf | |
PWC | https://paperswithcode.com/paper/a-new-regret-analysis-for-adam-type |
Repo | |
Framework | |
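The abstract concerns the analysis rather than a new algorithm, but for reference, this is an AMSGrad-style update with a constant first-moment parameter beta1, the regime the paper's regret bounds cover. The step size and epsilon are illustrative defaults.

```python
# AMSGrad update with constant beta1 (no decaying beta1 schedule).
import numpy as np

def amsgrad_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m, v, v_hat = state
    m = beta1 * m + (1 - beta1) * grad            # first moment, constant beta1
    v = beta2 * v + (1 - beta2) * grad ** 2       # second moment
    v_hat = np.maximum(v_hat, v)                  # AMSGrad: monotone max
    w = w - lr * m / (np.sqrt(v_hat) + eps)
    return w, (m, v, v_hat)

# Toy usage: minimize ||w||^2, whose gradient is 2w.
w = np.ones(4)
state = (np.zeros(4), np.zeros(4), np.zeros(4))
for _ in range(2000):
    w, state = amsgrad_step(w, 2 * w, state, lr=1e-2)
print(w)
```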
Self-Orthogonality Module: A Network Architecture Plug-in for Learning Orthogonal Filters
Title | Self-Orthogonality Module: A Network Architecture Plug-in for Learning Orthogonal Filters |
Authors | Ziming Zhang, Wenchi Ma, Yuanwei Wu, Guanghui Wang |
Abstract | In this paper, we investigate the empirical impact of orthogonality regularization (OR) in deep learning, used either on its own or in combination with other techniques. Recent work on OR has reported promising accuracy gains. In our ablation study, however, we do not observe such significant improvement from existing OR techniques compared with conventional training based on weight decay, dropout, and batch normalization. To identify the real gain from OR, inspired by locality-sensitive hashing (LSH) for angle estimation, we introduce an implicit self-regularization into OR that pushes the mean and variance of filter angles in a network towards 90° and 0, respectively, achieving (near) orthogonality among the filters without any other explicit regularization. Our regularization can be implemented as an architectural plug-in and integrated with an arbitrary network. We show that OR helps stabilize the training process and leads to faster convergence and better generalization. |
Tasks | |
Published | 2020-01-05 |
URL | https://arxiv.org/abs/2001.01275v2 |
https://arxiv.org/pdf/2001.01275v2.pdf | |
PWC | https://paperswithcode.com/paper/self-orthogonality-module-a-network |
Repo | |
Framework | |
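A rough PyTorch sketch of the goal stated in the abstract: push the mean of pairwise filter angles toward 90° and their variance toward 0. The paper's LSH-inspired module differs; this is only the plain angle-statistics version for illustration.

```python
# Penalize deviation of pairwise filter angles from (mean = pi/2, variance = 0).
import math
import torch

def angle_regularizer(weight, lam_mean=1.0, lam_var=1.0):
    """weight: (out_channels, ...) conv/linear weight tensor."""
    W = weight.flatten(1)                                   # one row per filter
    W = torch.nn.functional.normalize(W, dim=1)
    cos = (W @ W.t()).clamp(-1 + 1e-6, 1 - 1e-6)
    iu = torch.triu_indices(W.shape[0], W.shape[0], offset=1)
    angles = torch.acos(cos[iu[0], iu[1]])                  # pairwise filter angles
    return lam_mean * (angles.mean() - math.pi / 2) ** 2 + lam_var * angles.var()

# Toy usage: the penalty would be added to the task loss during training.
conv = torch.nn.Conv2d(3, 16, 3)
penalty = angle_regularizer(conv.weight)
print(float(penalty))
```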
Obstruction level detection of sewer videos using convolutional neural networks
Title | Obstruction level detection of sewer videos using convolutional neural networks |
Authors | Mario A. Gutierrez-Mondragon, Dario Garcia-Gasulla, Sergio Alvarez-Napagao, Jaume Brossa-Ordoñez, Rafael Gimenez-Esteban |
Abstract | Worldwide, sewer networks are designed to transport wastewater to a centralized treatment plant to be treated and returned to the environment. This process is critical for modern society, preventing waterborne illnesses, providing safe drinking water, and enhancing general sanitation. To keep a sewer network perfectly operational, sampling inspections are performed constantly to identify obstructions. Typically, a Closed-Circuit Television system is used to record the inside of pipes and report the obstruction level, which may trigger a cleaning operation. Currently, the obstruction level assessment is done manually, which is time-consuming and inconsistent. In this work, we design a methodology to train a Convolutional Neural Network for identifying the level of obstruction in pipes, thus reducing the human effort required for such a frequent and repetitive task. We gathered a database of videos that were explored and adapted to generate useful frames to feed into the model. The resulting classifier achieves deployment-ready performance. To validate the consistency of the approach and its industrial applicability, we integrate the Layer-wise Relevance Propagation explainability technique, which enables us to further understand the behavior of the neural network for this task. In the end, the proposed system can provide higher speed, accuracy, and consistency in the process of sewer examination. Our analysis also uncovers some guidelines on how to further improve the quality of the data gathering methodology. |
Tasks | |
Published | 2020-02-04 |
URL | https://arxiv.org/abs/2002.01284v1 |
https://arxiv.org/pdf/2002.01284v1.pdf | |
PWC | https://paperswithcode.com/paper/obstruction-level-detection-of-sewer-videos |
Repo | |
Framework | |
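A generic transfer-learning sketch in the spirit of the pipeline described above: frames extracted from CCTV videos, a CNN predicting an obstruction level. The backbone choice, number of obstruction levels, and data layout are assumptions, not the paper's configuration.

```python
# Fine-tune a pretrained CNN to classify frames into obstruction levels.
import torch
import torchvision

NUM_LEVELS = 4                                    # hypothetical obstruction levels
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, NUM_LEVELS)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def train_step(frames, levels):
    """frames: (B, 3, 224, 224) tensor of video frames; levels: (B,) labels."""
    optimizer.zero_grad()
    loss = criterion(model(frames), levels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for real frames.
print(train_step(torch.randn(8, 3, 224, 224), torch.randint(0, NUM_LEVELS, (8,))))
```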
Multi-label natural language processing to identify diagnosis and procedure codes from MIMIC-III inpatient notes
Title | Multi-label natural language processing to identify diagnosis and procedure codes from MIMIC-III inpatient notes |
Authors | A. K. Bhavani Singh, Mounika Guntu, Ananth Reddy Bhimireddy, Judy W. Gichoya, Saptarshi Purkayastha |
Abstract | In the United States, 25% of hospital spending, more than 200 billion dollars, goes to administrative costs, including services for medical coding and billing. With the increasing number of patient records, manually assigning codes is overwhelming, time-consuming, and error-prone, causing billing errors. Natural language processing can automate the extraction of codes/labels from unstructured clinical notes, which can help human coders save time, increase productivity, and verify medical coding errors. Our objective is to identify appropriate diagnosis and procedure codes from clinical notes by performing multi-label classification. We used de-identified data of critical care patients from the MIMIC-III database and subset the data to select the ten (top-10) and fifty (top-50) most common diagnoses and procedures, which cover 47.45% and 74.12% of all admissions, respectively. We implemented state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) to fine-tune the language model on 80% of the data and validated on the remaining 20%. The model achieved an overall accuracy of 87.08%, an F1 score of 85.82%, and an AUC of 91.76% for top-10 codes. For the top-50 codes, our model achieved an overall accuracy of 93.76%, an F1 score of 92.24%, and an AUC of 91%. Compared to previously published research, our model performs better at predicting codes from clinical text. We discuss approaches to generalize the knowledge discovery process of our MIMIC-BERT to other clinical notes. This can help human coders save time, prevent backlogs, and avoid additional costs due to coding errors. |
Tasks | Language Modelling, Multi-Label Classification |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07507v1 |
https://arxiv.org/pdf/2003.07507v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-label-natural-language-processing-to |
Repo | |
Framework | |
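A compact sketch of the multi-label fine-tuning setup described above: encode a clinical note with BERT and predict the top-k codes with a sigmoid head and binary cross-entropy. The model name, label count, and example text are placeholders, not the exact MIMIC-BERT configuration.

```python
# BERT encoder + linear head for multi-label code prediction.
import torch
from transformers import AutoTokenizer, AutoModel

NUM_CODES = 10                                   # e.g. the top-10 codes setting
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
head = torch.nn.Linear(encoder.config.hidden_size, NUM_CODES)
criterion = torch.nn.BCEWithLogitsLoss()

def multilabel_logits(notes):
    batch = tokenizer(notes, padding=True, truncation=True, max_length=512,
                      return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state[:, 0]     # [CLS] representation
    return head(hidden)

# Toy usage with a placeholder note and a one-hot code label.
logits = multilabel_logits(["patient admitted with chest pain ..."])
labels = torch.zeros(1, NUM_CODES)
labels[0, 0] = 1.0
print(criterion(logits, labels).item())
```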
Minimal spiking neuron for solving multi-label classification tasks
Title | Minimal spiking neuron for solving multi-label classification tasks |
Authors | Jakub Fil, Dominique Chu |
Abstract | The Multi-Spike Tempotron (MST) is a powerful single spiking neuron model that can solve complex supervised classification tasks. While powerful, it is also internally complex, computationally expensive to evaluate, and not suitable for neuromorphic hardware. Here we aim to understand whether it is possible to simplify the MST model, while retaining its ability to learn and to process information. To this end, we introduce a family of Generalised Neuron Models (GNM) which are a special case of the Spike Response Model and much simpler and cheaper to simulate than the MST. We find that over a wide range of parameters the GNM can learn at least as well as the MST. We identify the temporal autocorrelation of the membrane potential as the single most important ingredient of the GNM which enables it to classify multiple spatio-temporal patterns. We also interpret the GNM as a chemical system, thus conceptually bridging computation by neural networks with molecular information processing. We conclude the paper by proposing alternative training approaches for the GNM including error trace learning and error backpropagation. |
Tasks | Multi-Label Classification |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02902v1 |
https://arxiv.org/pdf/2003.02902v1.pdf | |
PWC | https://paperswithcode.com/paper/minimal-spiking-neuron-for-solving-multi |
Repo | |
Framework | |
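A toy leaky-integrator neuron in the spirit of the simplified spike-response models discussed above: the membrane potential is an exponentially decaying trace of incoming spikes (giving it the temporal autocorrelation the abstract highlights as the key ingredient), and the neuron emits a spike when the potential crosses a threshold. The parameters are illustrative, not the paper's GNM.

```python
# Minimal leaky integrate-and-fire-style neuron over binary input spike trains.
import numpy as np

def run_neuron(input_spikes, weights, tau=20.0, threshold=1.0, dt=1.0):
    """input_spikes: (T, n_inputs) binary array; weights: (n_inputs,)."""
    potential, decay = 0.0, np.exp(-dt / tau)
    output = np.zeros(input_spikes.shape[0], dtype=int)
    for t, spikes in enumerate(input_spikes):
        potential = decay * potential + weights @ spikes
        if potential >= threshold:
            output[t] = 1
            potential = 0.0                      # reset after an output spike
    return output

# Toy usage: random Poisson-like input spike trains.
rng = np.random.default_rng(0)
spikes = (rng.random((200, 5)) < 0.05).astype(float)
print(run_neuron(spikes, weights=rng.uniform(0.2, 0.6, size=5)).sum(), "output spikes")
```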
Understanding patient complaint characteristics using contextual clinical BERT embeddings
Title | Understanding patient complaint characteristics using contextual clinical BERT embeddings |
Authors | Budhaditya Saha, Sanal Lisboa, Shameek Ghosh |
Abstract | In clinical conversational applications, extracted entities tend to capture the main subject of a patient’s complaint, namely symptoms or diseases. However, they mostly fail to recognize the characterizations of a complaint such as the time, the onset, and the severity. For example, if the input is “I have a headache and it is extreme”, state-of-the-art models only recognize the main symptom entity, headache, but ignore the severity factor of “extreme”, which characterizes the headache. In this paper, we design a two-stage approach to detect the characterizations of entities like symptoms presented by general users in contexts where they would describe their symptoms to a clinician. We use Word2Vec and BERT to encode the clinical text given by the patients. We transform the output and re-frame the task as a multi-label classification problem. Finally, we combine the processed encodings with the Linear Discriminant Analysis (LDA) algorithm to classify the characterizations of the main entity. Experimental results demonstrate that our method achieves a 40-50% improvement in accuracy over state-of-the-art models. |
Tasks | Multi-Label Classification |
Published | 2020-02-14 |
URL | https://arxiv.org/abs/2002.05902v1 |
https://arxiv.org/pdf/2002.05902v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-patient-complaint |
Repo | |
Framework | |
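A simplified sketch of the second-stage classifier described above: encode a complaint with BERT's [CLS] vector and classify a characterization (e.g., severity) with Linear Discriminant Analysis. The label set and the tiny training data here are placeholders for illustration only.

```python
# BERT embeddings + Linear Discriminant Analysis for a characterization label.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return encoder(**batch).last_hidden_state[:, 0].numpy()   # [CLS] vectors

texts = ["I have a headache and it is extreme",
         "I have a mild headache",
         "my back pain is unbearable",
         "slight back pain since yesterday"]
severity = ["severe", "mild", "severe", "mild"]        # hypothetical labels

clf = LinearDiscriminantAnalysis().fit(embed(texts), severity)
print(clf.predict(embed(["the pain in my arm is extreme"])))
```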
What’s the relationship between CNNs and communication systems?
Title | What’s the relationship between CNNs and communication systems? |
Authors | Hao Ge, Xiaoguang Tu, Yanxiang Gong, Mei Xie, Zheng Ma |
Abstract | The interpretability of Convolutional Neural Networks (CNNs) is an important topic in the field of computer vision. In recent years, works in this field have generally adopted a mature model to reveal the internal mechanism of CNNs, helping to understand CNNs thoroughly. In this paper, we argue that the working mechanism of CNNs can be revealed through a different kind of interpretation: a comparison between communication systems and CNNs. We establish a correspondence between the modules of the two and verify this correspondence experimentally. Finally, through an analysis of some cutting-edge research on neural networks, we find that this relation can help explain such research and point to promising research directions for neural networks. |
Tasks | |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01413v1 |
https://arxiv.org/pdf/2003.01413v1.pdf | |
PWC | https://paperswithcode.com/paper/whats-the-relationship-between-cnns-and |
Repo | |
Framework | |
Why is the Mahalanobis Distance Effective for Anomaly Detection?
Title | Why is the Mahalanobis Distance Effective for Anomaly Detection? |
Authors | Ryo Kamoi, Kei Kobayashi |
Abstract | The Mahalanobis distance-based confidence score, a recently proposed anomaly detection method for pre-trained neural classifiers, achieves state-of-the-art performance on both out-of-distribution and adversarial example detection. This work analyzes why this method exhibits such strong performance while imposing an implausible assumption, namely that class-conditional distributions of intermediate features have tied covariance. We reveal that the reason for its effectiveness has been misunderstood. Although this method scores the prediction confidence for the original classification task, our analysis suggests that information critical for the classification task does not contribute to its state-of-the-art performance on anomaly detection. To support this hypothesis, we demonstrate that a simpler confidence score that does not use class information is as effective as the original method in most cases. Moreover, our experiments show that the confidence scores can exhibit different behavior on other frameworks such as metric learning models, and their detection performance is sensitive to model architecture choice. These findings provide insight into the behavior of neural classifiers when provided with anomalous inputs. |
Tasks | Anomaly Detection, Metric Learning |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00402v1 |
https://arxiv.org/pdf/2003.00402v1.pdf | |
PWC | https://paperswithcode.com/paper/why-is-the-mahalanobis-distance-effective-for |
Repo | |
Framework | |
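A minimal sketch of the Mahalanobis confidence score discussed above: fit per-class means and a single (tied) covariance on in-distribution features, then score a test feature by its distance to the nearest class mean. In this convention a larger distance suggests an anomaly; using a single global mean instead of class means would give the simpler class-agnostic score the abstract says is often just as effective.

```python
# Mahalanobis distance-based confidence score with tied covariance.
import numpy as np

def fit_mahalanobis(features, labels):
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)     # tied covariance estimate
    return means, np.linalg.pinv(cov)

def mahalanobis_score(x, means, prec):
    diffs = means - x
    d2 = np.einsum("ij,jk,ik->i", diffs, prec, diffs)   # squared distance per class
    return d2.min()                                     # distance to nearest class

# Toy usage on synthetic two-class features.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0, 1, (100, 16)), rng.normal(3, 1, (100, 16))])
labels = np.array([0] * 100 + [1] * 100)
means, prec = fit_mahalanobis(feats, labels)
print("in-distribution score:", mahalanobis_score(rng.normal(0, 1, 16), means, prec))
print("anomalous score:      ", mahalanobis_score(rng.normal(10, 1, 16), means, prec))
```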
DROCC: Deep Robust One-Class Classification
Title | DROCC: Deep Robust One-Class Classification |
Authors | Sachin Goyal, Aditi Raghunathan, Moksh Jain, Harsha Vardhan Simhadri, Prateek Jain |
Abstract | Classical approaches for one-class problems such as one-class SVM (Scholkopf et al., 1999) and isolation forest (Liu et al., 2008) require careful feature engineering when applied to structured domains like images. To alleviate this concern, state-of-the-art methods like DeepSVDD (Ruff et al., 2018) consider the natural alternative of minimizing a classical one-class loss applied to the learned final-layer representations. However, such an approach suffers from the fundamental drawback that a representation that simply collapses all the inputs minimizes the one-class loss; heuristics to mitigate collapsed representations provide limited benefits. In this work, we propose the Deep Robust One-Class Classification (DROCC) method, which is robust to such collapse by training the network to distinguish the training points from their adversarially generated perturbations. DROCC is motivated by the assumption that the interesting class lies on a locally linear low-dimensional manifold. Empirical evaluation demonstrates DROCC’s effectiveness on two different one-class problem settings and on a range of real-world datasets across different domains: images (CIFAR and ImageNet), audio, and time series, offering up to a 20% increase in accuracy over the state-of-the-art in anomaly detection. |
Tasks | Anomaly Detection, Feature Engineering |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12718v1 |
https://arxiv.org/pdf/2002.12718v1.pdf | |
PWC | https://paperswithcode.com/paper/drocc-deep-robust-one-class-classification |
Repo | |
Framework | |
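A heavily simplified sketch of the DROCC idea described above: treat the training data as the positive class and points perturbed into a shell around them as negatives, then train a classifier to separate the two. The actual method generates those negatives adversarially with projected gradient ascent; here they are drawn at random purely for illustration.

```python
# Simplified one-class training: separate data from random shell perturbations.
import torch

def shell_negatives(x, radius=1.0, gamma=2.0):
    """Offset each point by a random direction with norm in [radius, gamma*radius]."""
    noise = torch.randn_like(x)
    noise = noise / noise.norm(dim=1, keepdim=True)
    scale = radius + (gamma - 1.0) * radius * torch.rand(x.shape[0], 1)
    return x + scale * noise

model = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = torch.nn.BCEWithLogitsLoss()

normal_data = torch.randn(256, 8) * 0.1            # toy "one-class" training set
for _ in range(200):
    neg = shell_negatives(normal_data)
    inputs = torch.cat([normal_data, neg])
    targets = torch.cat([torch.ones(len(normal_data), 1), torch.zeros(len(neg), 1)])
    opt.zero_grad()
    loss = bce(model(inputs), targets)
    loss.backward()
    opt.step()
```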