PLoS Comput Biol. 2021 Jun 23;17(6):e1009121. doi: 10.1371/journal.pcbi.1009121. eCollection 2021 Jun.
Identification of those at greatest risk of death due to the substantial threat of COVID-19 can benefit from novel approaches to epidemiology that leverage large datasets and complex machine-learning models, provide data-driven intelligence, and guide decisions such as intensive-care unit admission (ICUA). The objective of this study is two-fold, one substantive and one methodological: substantively to evaluate the association of demographic and health records with two related, yet different, outcomes of severe COVID-19 (viz., death and ICUA); methodologically to compare interpretations based on logistic regression and on gradient-boosted decision tree (GBDT) predictions interpreted by means of the Shapley impacts of covariates. Very different association of some factors, e.g., obesity and chronic respiratory diseases, with death and ICUA may guide review of practice. Shapley explanation of GBDTs identified varying effects of some factors among patients, thus emphasising the importance of individual patient assessment. The results of this study are also relevant for the evaluation of complex automated clinical decision systems, which should optimise prediction scores whilst remaining interpretable to clinicians and mitigating potential biases.