Paper Group AWR 57
MetaPred: Meta-Learning for Clinical Risk Prediction with Limited Patient Electronic Health Records
Title | MetaPred: Meta-Learning for Clinical Risk Prediction with Limited Patient Electronic Health Records |
Authors | Xi Sheryl Zhang, Fengyi Tang, Hiroko Dodge, Jiayu Zhou, Fei Wang |
Abstract | In recent years, growing volumes of health data, such as patient Electronic Health Records (EHR), have become readily available. This provides an unprecedented opportunity for knowledge discovery and data mining algorithms to extract insights from them, which can later help improve the quality of care delivery. Predictive modeling of clinical risk from patient EHR, including in-hospital mortality, hospital readmission, chronic disease onset, and condition exacerbation, is one of the health data analytics problems that attracts the most interest. The reason is not only that the problem is important in clinical settings, but also that working with EHR poses challenges such as sparsity, irregularity, and temporality. Unlike applications in other domains such as computer vision and natural language processing, the labeled data samples in medicine (patients) are relatively limited, which makes effective predictive model learning difficult, especially for complicated models such as deep learning. In this paper, we propose MetaPred, a meta-learning framework for clinical risk prediction from longitudinal patient EHRs. In particular, in order to predict the target risk where there are limited data samples, we train a meta-learner from a set of related risk prediction tasks, which learns how a good predictor is learned. The meta-learner can then be directly used in target risk prediction, and the limited available samples can be used to further fine-tune the model. The effectiveness of MetaPred is tested on a real patient EHR repository from Oregon Health & Science University. We demonstrate that, with CNN and RNN as base predictors, MetaPred achieves much better performance for predicting target risk with low resources compared with a predictor trained only on the limited samples available for this risk. |
Tasks | Meta-Learning |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.03218v1 |
PDF | https://arxiv.org/pdf/1905.03218v1.pdf |
PWC | https://paperswithcode.com/paper/metapred-meta-learning-for-clinical-risk |
Repo | https://github.com/sheryl-ai/MetaPred |
Framework | tf |
A Partially Reversible U-Net for Memory-Efficient Volumetric Image Segmentation
Title | A Partially Reversible U-Net for Memory-Efficient Volumetric Image Segmentation |
Authors | Robin Brügger, Christian F. Baumgartner, Ender Konukoglu |
Abstract | One of the key drawbacks of 3D convolutional neural networks for segmentation is their memory footprint, which necessitates compromises in the network architecture in order to fit into a given memory budget. Motivated by the RevNet for image classification, we propose a partially reversible U-Net architecture that reduces memory consumption substantially. The reversible architecture allows us to exactly recover each layer’s outputs from the subsequent layer’s ones, eliminating the need to store activations for backpropagation. This alleviates the biggest memory bottleneck and enables very deep (theoretically infinitely deep) 3D architectures. On the BraTS challenge dataset, we demonstrate substantial memory savings. We further show that the freed memory can be used for processing the whole field-of-view (FOV) instead of patches. Increasing network depth led to higher segmentation accuracy while growing the memory footprint only by a very small fraction, thanks to the partially reversible architecture. |
Tasks | Image Classification, Medical Image Segmentation, Semantic Segmentation |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06148v2 |
PDF | https://arxiv.org/pdf/1906.06148v2.pdf |
PWC | https://paperswithcode.com/paper/a-partially-reversible-u-net-for-memory |
Repo | https://github.com/RobinBruegger/RevTorch |
Framework | pytorch |
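The reversibility claimed in the abstract above (recovering a layer's inputs from its outputs so activations need not be stored) can be illustrated with a minimal sketch. It assumes the additive coupling used by RevNet, which the abstract cites as motivation; the functions `f` and `g` are hypothetical stand-ins for convolutional sub-blocks, and nothing here is taken from the linked repository.

```python
import math


def f(v):
    """Hypothetical residual function F (stand-in for a conv sub-block)."""
    return [math.tanh(2.0 * a) for a in v]


def g(v):
    """Hypothetical residual function G (stand-in for a second sub-block)."""
    return [0.5 * a * a for a in v]


def rev_forward(x1, x2):
    """Additive coupling: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def rev_inverse(y1, y2):
    """Recover the inputs from the outputs, undoing the coupling in reverse order."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


if __name__ == "__main__":
    x1, x2 = [0.1, -0.4, 2.0], [1.5, 0.0, -0.7]
    y1, y2 = rev_forward(x1, x2)
    r1, r2 = rev_inverse(y1, y2)
    # The reconstruction matches the inputs up to floating-point rounding.
    print(max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2)))
```

Note that `f` and `g` never need to be invertible themselves: the inverse only re-evaluates them and subtracts, which is why the inputs can be recomputed during the backward pass instead of being kept in memory.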
MMDetection: Open MMLab Detection Toolbox and Benchmark
Title | MMDetection: Open MMLab Detection Toolbox and Benchmark |
Authors | Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin |
Abstract | We present MMDetection, an object detection toolbox that contains a rich set of object detection and instance segmentation methods as well as related components and modules. The toolbox started from a codebase of MMDet team who won the detection track of COCO Challenge 2018. It gradually evolves into a unified platform that covers many popular detection methods and contemporary modules. It not only includes training and inference codes, but also provides weights for more than 200 network models. We believe this toolbox is by far the most complete detection toolbox. In this paper, we introduce the various features of this toolbox. In addition, we also conduct a benchmarking study on different methods, components, and their hyper-parameters. We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new detectors. Code and models are available at https://github.com/open-mmlab/mmdetection. The project is under active development and we will keep this document updated. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.07155v1 |
PDF | https://arxiv.org/pdf/1906.07155v1.pdf |
PWC | https://paperswithcode.com/paper/mmdetection-open-mmlab-detection-toolbox-and |
Repo | https://github.com/UsmannK/cloud-mmdetection |
Framework | pytorch |
Supervise Thyself: Examining Self-Supervised Representations in Interactive Environments
Title | Supervise Thyself: Examining Self-Supervised Representations in Interactive Environments |
Authors | Evan Racah, Christopher Pal |
Abstract | Self-supervised methods, wherein an agent learns representations solely by observing the results of its actions, become crucial in environments which do not provide a dense reward signal or have labels. In most cases, such methods are used for pretraining or auxiliary tasks for “downstream” tasks, such as control, exploration, or imitation learning. However, it is not clear which method’s representations best capture meaningful features of the environment, and which are best suited for which types of environments. We present a small-scale study of self-supervised methods on two visual environments: Flappy Bird and Sonic The Hedgehog. In particular, we quantitatively evaluate the representations learned from these tasks in two contexts: a) the extent to which the representations capture true state information of the agent and b) how generalizable these representations are to novel situations, like new levels and textures. Lastly, we evaluate these self-supervised features by visualizing which parts of the environment they focus on. Our results show that the utility of the representations is highly dependent on the visuals and dynamics of the environment. |
Tasks | Imitation Learning |
Published | 2019-06-27 |
URL | https://arxiv.org/abs/1906.11951v1 |
PDF | https://arxiv.org/pdf/1906.11951v1.pdf |
PWC | https://paperswithcode.com/paper/supervise-thyself-examining-self-supervised |
Repo | https://github.com/eracah/supervise-thyself |
Framework | pytorch |
DeepGCNs: Can GCNs Go as Deep as CNNs?
Title | DeepGCNs: Can GCNs Go as Deep as CNNs? |
Authors | Guohao Li, Matthias Müller, Ali Thabet, Bernard Ghanem |
Abstract | Convolutional Neural Networks (CNNs) achieve impressive performance in a wide variety of fields. Their success benefited from a massive boost when very deep CNN models were able to be reliably trained. Despite their merits, CNNs fail to properly address problems with non-Euclidean data. To overcome this challenge, Graph Convolutional Networks (GCNs) build graphs to represent non-Euclidean data, borrow concepts from CNNs, and apply them in training. GCNs show promising results, but they are usually limited to very shallow models due to the vanishing gradient problem. As a result, most state-of-the-art GCN models are no deeper than 3 or 4 layers. In this work, we present new ways to successfully train very deep GCNs. We do this by borrowing concepts from CNNs, specifically residual/dense connections and dilated convolutions, and adapting them to GCN architectures. Extensive experiments show the positive effect of these deep GCN frameworks. Finally, we use these new concepts to build a very deep 56-layer GCN, and show how it significantly boosts performance (+3.7% mIoU over state-of-the-art) in the task of point cloud semantic segmentation. We believe that the community can greatly benefit from this work, as it opens up many opportunities for advancing GCN-based research. |
Tasks | 3D Semantic Segmentation, Semantic Segmentation |
Published | 2019-04-07 |
URL | https://arxiv.org/abs/1904.03751v2 |
PDF | https://arxiv.org/pdf/1904.03751v2.pdf |
PWC | https://paperswithcode.com/paper/can-gcns-go-as-deep-as-cnns |
Repo | https://github.com/lightaime/deep_gcns |
Framework | tf |
Causal Confusion in Imitation Learning
Title | Causal Confusion in Imitation Learning |
Authors | Pim de Haan, Dinesh Jayaraman, Sergey Levine |
Abstract | Behavioral cloning reduces policy learning to supervised learning by training a discriminative model to predict expert actions given observations. Such discriminative models are non-causal: the training procedure is unaware of the causal structure of the interaction between the expert and the environment. We point out that ignoring causality is particularly damaging because of the distributional shift in imitation learning. In particular, it leads to a counter-intuitive “causal misidentification” phenomenon: access to more information can yield worse performance. We investigate how this problem arises, and propose a solution to combat it through targeted interventions—either environment interaction or expert queries—to determine the correct causal model. We show that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and validate our solution against DAgger and other baselines and ablations. |
Tasks | Imitation Learning |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11979v2 |
PDF | https://arxiv.org/pdf/1905.11979v2.pdf |
PWC | https://paperswithcode.com/paper/causal-confusion-in-imitation-learning |
Repo | https://github.com/pimdh/causal-confusion |
Framework | pytorch |
N-BEATS: Neural basis expansion analysis for interpretable time series forecasting
Title | N-BEATS: Neural basis expansion analysis for interpretable time series forecasting |
Authors | Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio |
Abstract | We focus on solving the univariate time series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on several well-known datasets, including M3, M4 and TOURISM competition datasets containing time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS for all the datasets, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year’s winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on heterogeneous datasets strongly suggests that, contrary to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without considerable loss in accuracy. |
Tasks | Time Series, Time Series Forecasting |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10437v4 |
PDF | https://arxiv.org/pdf/1905.10437v4.pdf |
PWC | https://paperswithcode.com/paper/n-beats-neural-basis-expansion-analysis-for |
Repo | https://github.com/amitesh863/nbeats_forecast |
Framework | pytorch |
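The "backward and forward residual links" in the N-BEATS abstract above can be sketched as a stacking rule: each block emits a backcast (its explanation of the input) and a forecast, the backcast is subtracted from the input before the next block sees it, and the forecasts are summed. In the sketch below the two blocks are hand-coded stand-ins (a level fit and a least-squares line), not the learned fully-connected stacks of the paper, and all function names are hypothetical.

```python
def level_block(x, horizon):
    """Toy block: explains the input by its mean and forecasts that mean."""
    m = sum(x) / len(x)
    return [m] * len(x), [m] * horizon


def trend_block(x, horizon):
    """Toy block: fits a least-squares line and extrapolates it (needs len(x) >= 2)."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x)) / denom
    intercept = x_mean - slope * t_mean
    backcast = [intercept + slope * t for t in range(n)]
    forecast = [intercept + slope * t for t in range(n, n + horizon)]
    return backcast, forecast


def doubly_residual_forecast(x, blocks, horizon):
    """Run blocks in sequence with backward (input) and forward (output) residual links."""
    residual = list(x)
    total = [0.0] * horizon
    for block in blocks:
        backcast, forecast = block(residual, horizon)
        # Backward link: remove what this block already explained.
        residual = [r - b for r, b in zip(residual, backcast)]
        # Forward link: accumulate partial forecasts.
        total = [s + f for s, f in zip(total, forecast)]
    return total, residual


if __name__ == "__main__":
    forecast, leftover = doubly_residual_forecast(
        [1.0, 2.0, 3.0, 4.0], [level_block, trend_block], horizon=2
    )
    print(forecast)  # approximately [5.0, 6.0] for this linear series
    print(leftover)  # approximately all zeros: both blocks explain the input
```

Because each partial forecast is attributable to one block, summing them is also what makes an interpretable decomposition (for example level plus trend) possible.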
Joint Ranking SVM and Binary Relevance with Robust Low-Rank Learning for Multi-Label Classification
Title | Joint Ranking SVM and Binary Relevance with Robust Low-Rank Learning for Multi-Label Classification |
Authors | Guoqiang Wu, Ruobing Zheng, Yingjie Tian, Dalian Liu |
Abstract | Multi-label classification studies the task where each example belongs to multiple labels simultaneously. As a representative method, Ranking Support Vector Machine (Rank-SVM) aims to minimize the Ranking Loss and can also mitigate the negative influence of the class-imbalance issue. However, due to its stacking-style way for thresholding, it may suffer from error accumulation and thus reduce the final classification performance. Binary Relevance (BR) is another typical method, which aims to minimize the Hamming Loss and only needs one-step learning. Nevertheless, it might have the class-imbalance issue and does not take into account label correlations. To address the above issues, we propose a novel multi-label classification model, which joins Ranking support vector machine and Binary Relevance with robust Low-rank learning (RBRL). RBRL inherits the ranking loss minimization advantages of Rank-SVM, and thus overcomes the disadvantages of BR suffering the class-imbalance issue and ignoring the label correlations. Meanwhile, it utilizes the hamming loss minimization and one-step learning advantages of BR, and thus tackles the disadvantages of Rank-SVM including another thresholding learning step. Besides, a low-rank constraint is utilized to further exploit high-order label correlations under the assumption of low dimensional label space. Furthermore, to achieve nonlinear multi-label classifiers, we derive the kernelization of RBRL. Two accelerated proximal gradient methods (APG) are used to solve the optimization problems efficiently. Extensive comparative experiments with several state-of-the-art methods illustrate a highly competitive or superior performance of our method RBRL. |
Tasks | Multi-Label Classification |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01658v1 |
PDF | https://arxiv.org/pdf/1911.01658v1.pdf |
PWC | https://paperswithcode.com/paper/joint-ranking-svm-and-binary-relevance-with |
Repo | https://github.com/GuoqiangWoodrowWu/RBRL |
Framework | none |
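The two objectives contrasted in the RBRL abstract above, Hamming Loss (targeted by Binary Relevance) and Ranking Loss (targeted by Rank-SVM), can be made concrete for a single example. The sketch below uses the common textbook definitions and counts tied scores as ranking errors, which is an assumption; the paper's exact surrogate losses are not reproduced here, and the function names are hypothetical.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of labels whose predicted 0/1 value differs from the truth."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def ranking_loss(y_true, scores):
    """Fraction of (relevant, irrelevant) label pairs that are ordered wrongly.

    A pair counts as an error when the relevant label does not score strictly
    higher than the irrelevant one (ties count as errors in this sketch).
    """
    relevant = [s for t, s in zip(y_true, scores) if t == 1]
    irrelevant = [s for t, s in zip(y_true, scores) if t == 0]
    if not relevant or not irrelevant:
        return 0.0  # no pairs to compare
    wrong = sum(r <= i for r in relevant for i in irrelevant)
    return wrong / (len(relevant) * len(irrelevant))


if __name__ == "__main__":
    y_true = [1, 0, 1, 0]
    print(hamming_loss(y_true, [1, 1, 0, 0]))          # 0.5
    print(ranking_loss(y_true, [0.9, 0.8, 0.3, 0.1]))  # 0.25
```

Hamming Loss needs thresholded predictions while Ranking Loss only needs scores, which is why a ranking-based method requires a separate thresholding step before it can output label sets.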
GlobalTrack: A Simple and Strong Baseline for Long-term Tracking
Title | GlobalTrack: A Simple and Strong Baseline for Long-term Tracking |
Authors | Lianghua Huang, Xin Zhao, Kaiqi Huang |
Abstract | A key capability of a long-term tracker is to search for targets in very large areas (typically the entire image) to handle possible target absences or tracking failures. However, currently there is a lack of such a strong baseline for global instance search. In this work, we aim to bridge this gap. Specifically, we propose GlobalTrack, a pure global instance search based tracker that makes no assumption on the temporal consistency of the target’s positions and scales. GlobalTrack is developed based on two-stage object detectors, and it is able to perform full-image and multi-scale search of arbitrary instances with only a single query as the guide. We further propose a cross-query loss to improve the robustness of our approach against distractors. With no online learning, no punishment on position or scale changes, no scale smoothing and no trajectory refinement, our pure global instance search based tracker achieves comparable, sometimes much better performance on four large-scale tracking benchmarks (i.e., 52.1% AUC on LaSOT, 63.8% success rate on TLP, 60.3% MaxGM on OxUvA and 75.4% normalized precision on TrackingNet), compared to state-of-the-art approaches that typically require complex post-processing. More importantly, our tracker runs without cumulative errors, i.e., any type of temporary tracking failures will not affect its performance on future frames, making it ideal for long-term tracking. We hope this work will be a strong baseline for long-term tracking and will stimulate future works in this area. Code is available at https://github.com/huanglianghua/GlobalTrack. |
Tasks | Instance Search |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.08531v1 |
PDF | https://arxiv.org/pdf/1912.08531v1.pdf |
PWC | https://paperswithcode.com/paper/globaltrack-a-simple-and-strong-baseline-for |
Repo | https://github.com/huanglianghua/GlobalTrack |
Framework | pytorch |
Single-Forward-Step Projective Splitting: Exploiting Cocoercivity
Title | Single-Forward-Step Projective Splitting: Exploiting Cocoercivity |
Authors | Patrick R. Johnstone, Jonathan Eckstein |
Abstract | This work describes a new variant of projective splitting for monotone inclusions, in which cocoercive operators can be processed with a single forward step per iteration. This result establishes a symmetry between projective splitting algorithms, the classical forward-backward splitting method (FB), and Tseng’s forward-backward-forward method (FBF). Another symmetry is that the new procedure allows for larger stepsizes for cocoercive operators: the stepsize bound is $2\beta$ for a $\beta$-cocoercive operator, which is the same as for FB. To complete the connection, we show that FB corresponds to an unattainable boundary case of the parameters in the new procedure. Unlike FB, the new method allows for a backtracking procedure when the cocoercivity constant is unknown. Proving convergence of the algorithm requires some departures from the usual proof framework for projective splitting. We close with some computational tests establishing competitive performance for the method. |
Tasks | |
Published | 2019-02-24 |
URL | https://arxiv.org/abs/1902.09025v2 |
PDF | https://arxiv.org/pdf/1902.09025v2.pdf |
PWC | https://paperswithcode.com/paper/single-forward-step-projective-splitting |
Repo | https://github.com/projective-splitting/coco |
Framework | none |
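For orientation, the classical forward-backward (FB) method that the abstract above compares against can be sketched in one dimension. This is FB itself, not the paper's projective splitting variant. The toy problem minimizes 0.5(x - 3)^2 + |x|: the gradient x - 3 is 1-cocoercive, so the stepsize must stay below 2β = 2, and the proximal (backward) step for |x| is soft-thresholding. The problem and all names are illustrative assumptions.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |x|, used as the backward step."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def forward_backward(grad_f, prox_g, x0, step, beta, iters=100):
    """Iterate x <- prox_g(x - step * grad_f(x), step) with step < 2 * beta."""
    if not 0.0 < step < 2.0 * beta:
        raise ValueError("stepsize must lie in (0, 2 * beta)")
    x = x0
    for _ in range(iters):
        # Forward (explicit gradient) step, then backward (proximal) step.
        x = prox_g(x - step * grad_f(x), step)
    return x


if __name__ == "__main__":
    # grad of 0.5 * (x - 3)^2 is x - 3, which is beta-cocoercive with beta = 1.
    x_star = forward_backward(lambda x: x - 3.0, soft_threshold, 0.0, step=1.5, beta=1.0)
    print(x_star)  # close to 2.0, the minimizer of 0.5 * (x - 3)^2 + |x|
```

The stepsize 1.5 exceeds 1/L = 1 but still respects the 2β bound, which is the range the abstract says the new projective splitting procedure also permits for cocoercive operators.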
CASTER: Predicting Drug Interactions with Chemical Substructure Representation
Title | CASTER: Predicting Drug Interactions with Chemical Substructure Representation |
Authors | Kexin Huang, Cao Xiao, Trong Nghia Hoang, Lucas M. Glass, Jimeng Sun |
Abstract | Adverse drug-drug interactions (DDIs) remain a leading cause of morbidity and mortality. Identifying potential DDIs during the drug design process is critical for patients and society. Although several computational models have been proposed for DDI prediction, there are still limitations: (1) specialized design of drug representation for DDI predictions is lacking; (2) predictions are based on limited labelled data and do not generalize well to unseen drugs or DDIs; and (3) models are characterized by a large number of parameters, thus are hard to interpret. In this work, we develop a ChemicAl SubstrucTurE Representation (CASTER) framework that predicts DDIs given chemical structures of drugs. CASTER aims to mitigate these limitations via (1) a sequential pattern mining module rooted in the DDI mechanism to efficiently characterize functional sub-structures of drugs; (2) an auto-encoding module that leverages both labelled and unlabelled chemical structure data to improve predictive accuracy and generalizability; and (3) a dictionary learning module that explains the prediction via a small set of coefficients which measure the relevance of each input sub-structure to the DDI outcome. We evaluated CASTER on two real-world DDI datasets and showed that it performed better than state-of-the-art baselines and provided interpretable predictions. |
Tasks | Dictionary Learning, Sequential Pattern Mining |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06446v2 |
PDF | https://arxiv.org/pdf/1911.06446v2.pdf |
PWC | https://paperswithcode.com/paper/caster-predicting-drug-interactions-with |
Repo | https://github.com/kexinhuang12345/CASTER |
Framework | none |
BERT for Coreference Resolution: Baselines and Analysis
Title | BERT for Coreference Resolution: Baselines and Analysis |
Authors | Mandar Joshi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer |
Abstract | We apply BERT to coreference resolution, achieving strong improvements on the OntoNotes (+3.9 F1) and GAP (+11.5 F1) benchmarks. A qualitative analysis of model predictions indicates that, compared to ELMo and BERT-base, BERT-large is particularly better at distinguishing between related but distinct entities (e.g., President and CEO). However, there is still room for improvement in modeling document-level context, conversations, and mention paraphrasing. Our code and models are publicly available. |
Tasks | Coreference Resolution |
Published | 2019-08-24 |
URL | https://arxiv.org/abs/1908.09091v4 |
PDF | https://arxiv.org/pdf/1908.09091v4.pdf |
PWC | https://paperswithcode.com/paper/bert-for-coreference-resolution-baselines-and |
Repo | https://github.com/mandarjoshi90/coref |
Framework | tf |
Regularizing Activation Distribution for Training Binarized Deep Networks
Title | Regularizing Activation Distribution for Training Binarized Deep Networks |
Authors | Ruizhou Ding, Ting-Wu Chin, Zeye Liu, Diana Marculescu |
Abstract | Binarized Neural Networks (BNNs) can significantly reduce the inference latency and energy consumption in resource-constrained devices due to their pure-logical computation and fewer memory accesses. However, training BNNs is difficult since the activation flow encounters degeneration, saturation, and gradient mismatch problems. Prior work alleviates these issues by increasing activation bits and adding floating-point scaling factors, thereby sacrificing BNN’s energy efficiency. In this paper, we propose to use distribution loss to explicitly regularize the activation flow, and develop a framework to systematically formulate the loss. Our experiments show that the distribution loss can consistently improve the accuracy of BNNs without losing their energy benefits. Moreover, equipped with the proposed regularization, BNN training is shown to be robust to the selection of hyper-parameters including optimizer and learning rate. |
Tasks | |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02823v1 |
PDF | http://arxiv.org/pdf/1904.02823v1.pdf |
PWC | https://paperswithcode.com/paper/regularizing-activation-distribution-for |
Repo | https://github.com/ruizhoud/DistributionLoss |
Framework | pytorch |
Bengali Handwritten Character Classification using Transfer Learning on Deep Convolutional Neural Network
Title | Bengali Handwritten Character Classification using Transfer Learning on Deep Convolutional Neural Network |
Authors | Swagato Chatterjee, Rwik Kumar Dutta, Debayan Ganguly, Kingshuk Chatterjee, Sudipta Roy |
Abstract | In this paper, we propose a solution which uses state-of-the-art techniques in Deep Learning to tackle the problem of Bengali Handwritten Character Recognition (HCR). Our method uses fewer iterations to train than most other comparable methods. We employ Transfer Learning on ResNet 50, a state-of-the-art deep Convolutional Neural Network Model, pretrained on the ImageNet dataset. We also use other techniques, such as a modified version of One Cycle Policy and varying the input image sizes, to ensure that our training is fast. We use the BanglaLekha-Isolated Dataset for evaluation of our technique, which consists of 84 classes (50 Basic, 10 Numerals and 24 Compound Characters). We are able to achieve 96.12% accuracy in just 47 epochs on the BanglaLekha-Isolated dataset. When comparing our method with that of other researchers, considering number of classes and without using Ensemble Learning, the proposed solution achieves a state-of-the-art result for Handwritten Bengali Character Recognition. Code and weight files are available at https://github.com/swagato-c/bangla-hwcr-present. |
Tasks | Transfer Learning |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.11133v1 |
PDF | http://arxiv.org/pdf/1902.11133v1.pdf |
PWC | https://paperswithcode.com/paper/bengali-handwritten-character-classification |
Repo | https://github.com/swagato-c/bangla-hwcr-present |
Framework | pytorch |
Modeling Combinatorial Evolution in Time Series Prediction
Title | Modeling Combinatorial Evolution in Time Series Prediction |
Authors | Wenjie Hu, Yang Yang, Zilong You, Zongtao Liu, Xiang Ren |
Abstract | Time series modeling aims to capture the intrinsic factors underpinning observed data and its evolution. However, most existing studies ignore the evolutionary relations among these factors, which are what cause the combinatorial evolution of a given time series. In this paper, we propose to represent time-varying relations among intrinsic factors of time series data by means of an evolutionary state graph structure. Accordingly, we propose the Evolutionary Graph Recurrent Networks (EGRN) to learn representations of these factors, along with the given time series, using a graph neural network framework. The learned representations can then be applied to time series classification tasks. From our experiment results, based on six real-world datasets, it can be seen that our approach clearly outperforms ten state-of-the-art baseline methods (e.g. +5% in terms of accuracy, and +15% in terms of F1 on average). In addition, we demonstrate that due to the graph structure’s improved interpretability, our method is also able to explain the logical causes of the predicted events. |
Tasks | Time Series, Time Series Classification, Time Series Prediction |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.05006v2 |
PDF | https://arxiv.org/pdf/1905.05006v2.pdf |
PWC | https://paperswithcode.com/paper/190505006 |
Repo | https://github.com/VachelHU/ESGRN |
Framework | tf |