The most important variables associated with death due to COVID-19 disease, based on three data mining models Decision Tree, AdaBoost, and Support Vector Machine: A cross-sectional study.

Gharehhasani BS, Rezaei M, Naghipour A, Sayad N, Mostafaei S, Alimohammadi E

Health Sci Rep 7 (7) e2266 [2024-07-00; online 2024-07-25] Open Access

Death due to COVID-19 is one of the biggest health challenges worldwide, and many models have been proposed to predict it. This study aimed to fit and compare Decision Tree (DT), Support Vector Machine (SVM), and AdaBoost models for predicting death due to COVID-19. Variables were described using mean (SD) and frequency (%). The relationship between each variable and death caused by COVID-19 was assessed with the chi-square test at a significance level of 0.05. The DT, SVM, and AdaBoost models were compared in terms of sensitivity, specificity, accuracy, and the area under the receiver operating characteristic (ROC) curve in R, using the psych, caTools, ROSE (Random Over-Sampling Examples), rpart, and rpart.plot packages. Of the 23,054 patients studied, 10,935 (46.5%) were women and 12,569 (53.5%) were men, and the mean age of the patients was 54.9 ± 21.0 years. Gender, fever, cough, muscle pain, loss of smell and taste, abdominal pain, nausea and vomiting, diarrhea, anorexia, dizziness, chest pain, intubation, cancer, diabetes, chronic blood disease, immunodeficiency, pregnancy, dialysis, and chronic lung disease showed a statistically significant relationship with death in COVID-19 patients (p < 0.05). The sensitivity, specificity, accuracy, and area under the ROC curve were, respectively, 0.60, 0.68, 0.71, and 0.75 for the DT model; 0.54, 0.62, 0.63, and 0.71 for the SVM model; and 0.59, 0.65, 0.69, and 0.74 for the AdaBoost model. DT therefore showed higher predictive power than the other two data mining models, and researchers in different fields are encouraged to consider DT for predicting the studied variables. Future studies could also use other approaches, such as random forest or XGBoost, to improve accuracy.
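The four comparison metrics above have standard definitions for a binary classifier. The sketch below illustrates them in plain Python, treating death as the positive class (1) and survival as the negative class (0). It is not the authors' R pipeline: the labels, scores, the 0.5 decision threshold, and the function names (`confusion_counts`, `binary_metrics`, `roc_auc`) are hypothetical and chosen only for illustration. The AUC uses the rank (Mann-Whitney) formulation: the probability that a randomly chosen death receives a higher score than a randomly chosen survivor.

```python
from itertools import product


def confusion_counts(y_true, y_pred):
    """Return (TP, FN, TN, FP), treating 1 = death as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from hard 0/1 predictions."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)   # share of deaths correctly flagged
    specificity = tn / (tn + fp)   # share of survivors correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 4 deaths and 6 survivors with made-up model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.3, 0.8, 0.35, 0.2, 0.1, 0.05, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

sens, spec, acc = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f} auc={auc:.2f}")
# prints: sensitivity=0.50 specificity=0.67 accuracy=0.60 auc=0.75
```

A useful sanity check when reading such results: accuracy equals prevalence × sensitivity + (1 − prevalence) × specificity, so when all three are computed on the same test set at the same threshold, accuracy lies between sensitivity and specificity (here 0.4 × 0.50 + 0.6 × 0.67 ≈ 0.60). AUC, by contrast, is threshold-free and depends only on how the scores rank deaths against survivors.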

Category: Other

Type: Journal article

PubMed 39055612

DOI 10.1002/hsr2.2266

Crossref 10.1002/hsr2.2266

pmc: PMC11269761
pii: HSR22266