BMC Infect Dis. 2025 Sep 1;25(1):1088. doi: 10.1186/s12879-025-11502-4.
Development of an early prediction model for risk of influenza A and influenza B based on complete blood count examination
Xiefei Hu 1 , Chunmei Duan 2 , Huajian Chen 1 , Xun Li 1 , Qianyu Jing 1 , Qin Ma 1 , Shunli Cai 1 , Haiping Fan 3 , Shenshen Zhi 4 , Wei Li 5
Background: Influenza A (IAV) and B (IBV) viruses are the primary etiologic agents driving seasonal influenza epidemics and global pandemics. Early prediction plays a crucial role in epidemic control and reducing mortality rates. Complete blood count (CBC), a widely used clinical tool, provides rapid and non-invasive hematological biomarkers that offer diagnostic value during the pre-pathogen confirmation phase. This study proposes a machine-learning (ML) algorithm leveraging CBC parameters to distinguish IAV and IBV from other infections. This approach may complement nucleic acid tests and antigen assays, enabling timely interventions and reducing diagnostic delays.
Methods: This study retrospectively collected CBC data from patients presenting with influenza-like symptoms at Chongqing Emergency Medical Center, Chongqing, China. Patient records meeting the inclusion criteria between January 1, 2023, and December 31, 2023, were compiled into a model development dataset, which was then partitioned into training and internal validation subsets at an 8:2 ratio. An independent external validation cohort was collected from January 1, 2024, to February 29, 2024. We trained several ML-based models on 25 features to predict influenza A and B infection and calculated SHapley Additive exPlanations (SHAP) values to interpret them.
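The 8:2 partition described above can be illustrated with a minimal sketch. The abstract does not say how the split was performed (random seed, stratification by label, or tooling), so the plain shuffled split below is an assumption, and `split_8_2` and the placeholder record IDs are hypothetical names used only for illustration.

```python
import random


def split_8_2(records, seed=0):
    """Shuffle records and split them 8:2 into (training, internal_validation).

    Assumption: a simple random split. The abstract does not state the
    actual procedure used by the authors.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the boundary.
    n_train = len(shuffled) * 8 // 10
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder IDs standing in for the 2,925 development-set records.
train, internal_val = split_8_2(range(2925))
print(len(train), len(internal_val))  # 2340 585
```

With 2,925 records, this reproduces the subset sizes reported in the Results (2,340 and 585).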
Results: The study cohort comprised 3,106 patients (453 influenza-positive cases, 14.6%; 2,653 negative controls, 85.4%). From this population, 2,925 eligible cases were allocated to the model development dataset and divided into training (n = 2,340) and internal validation (n = 585) subsets through an 8:2 split. The remaining 181 patients formed the independent external validation cohort. In the external validation, the voting ensemble combining adaptive boosting (ADB) and extreme gradient boosting (XGB) achieved an area under the receiver operating characteristic curve (AUROC) of 0.810. SHAP analysis identified the five hematologic parameters with the greatest predictive influence in the random forest (RF) model: MON%, LYM, WBC, RBC, and NEU/MON.
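To make the ensemble and the AUROC metric concrete, here is a minimal sketch in plain Python. It assumes soft voting (averaging the predicted probabilities of the two models); the abstract does not say whether hard or soft voting was used. The function names and the toy labels and probabilities are made up for illustration and are not study data.

```python
def soft_vote(*prob_lists):
    """Average per-patient positive-class probabilities across models."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up toy values: 1 = influenza-positive, 0 = negative control.
labels = [0, 0, 1, 1, 0, 1]
adb_probs = [0.2, 0.6, 0.4, 0.9, 0.1, 0.7]  # hypothetical ADB outputs
xgb_probs = [0.1, 0.6, 0.7, 0.8, 0.3, 0.6]  # hypothetical XGB outputs

ensemble = soft_vote(adb_probs, xgb_probs)
print(round(auroc(labels, ensemble), 3))
```

An AUROC of 0.810 therefore means that, for a randomly chosen positive and negative patient from the external cohort, the ensemble assigns the higher score to the positive patient about 81% of the time.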
Conclusions: This analysis establishes the RF and ADB-XGB models as the optimal CBC-based machine learning frameworks for discriminating influenza A and B infections from other infections. The models' operational simplicity enables rapid triage in resource-constrained emergency departments, which is particularly valuable when molecular confirmation (RT-PCR) is unavailable.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12879-025-11502-4.
Keywords: Complete blood count; Early prediction; Influenza; Machine learning.
- PMID: 40890653
- PMCID: PMC12403889
- DOI: 10.1186/s12879-025-11502-4