PLoS Comput Biol. 2025 Nov 24;21(11):e1013695. doi: 10.1371/journal.pcbi.1013695. eCollection 2025 Nov.
Federated learning for COVID-19 mortality prediction in a multicentric sample of 21 hospitals
Roberta Moreira Wichmann, Murilo Afonso Robiati Bigoto, Alexandre Dias Porto Chiavegatto Filho
We evaluated Federated Learning (FL) strategies for predicting COVID-19 mortality using a multicenter sample of 17,022 patients from 21 diverse Brazilian hospitals. We tested horizontal FL architectures employing Logistic Regression (LR) and a Multi-Layer Perceptron (MLP) via parameter aggregation, alongside a novel Federated Random Forest (RF) using ensemble aggregation. Performance gain (ΔAUC, calculated as the AUC of the federated model minus the AUC of the locally trained model) was quantified using bootstrap analysis to determine 95% confidence intervals. FL models demonstrated a beneficial collaborative effect. The average ΔAUC across the network was +0.0018 for LR, +0.0599 for MLP, and +0.0528 for RF. Crucially, the gain's magnitude and statistical significance showed a strong inverse correlation with local patient volume (N). Substantial and statistically significant gains were concentrated in data-limited institutions (N < 500). For example, the smallest hospital (N = 86) achieved a ΔAUC of 0.3682 (95% CI [0.0908, 0.6307]) with the RF model. However, interpreting these benefits requires caution because the 95% CIs for ΔAUC crossed zero for the majority of hospitals, suggesting that the collaborative model's statistical advantage is not certain at every site. This trade-off was particularly evident with the MLP model, which, despite achieving the highest average ΔAUC, was the most volatile algorithm, registering the largest performance degradation in the network (ΔAUC = -0.0884, 95% CI [-0.1527, -0.0273]) due to its high sensitivity to disparities in local data distributions (non-IID data). This study validates FL as an equity-enabling mechanism that enhances predictive capacity where local data scarcity is highest. Our findings underscore that realizing the most statistically certain benefits of FL requires continuous monitoring and local validation for successful clinical deployment across diverse settings.
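The ΔAUC comparison with a bootstrap confidence interval can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only that bootstrap analysis was used, so the paired percentile bootstrap, the number of resamples, the reading of ΔAUC as federated AUC minus local AUC, and the names `auc` and `bootstrap_delta_auc` are all assumptions made for illustration.

```python
import random


def auc(y_true, scores):
    """Rank-based AUC (Mann-Whitney U); tied scores share the average rank."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_delta_auc(y_true, fl_scores, local_scores, n_boot=1000, seed=0):
    """Point estimate and 95% percentile CI for AUC(FL) - AUC(local).

    Assumption: both models are scored on the same local test patients, so
    each resample draws patients once and evaluates both models on that draw.
    """
    point = auc(y_true, fl_scores) - auc(y_true, local_scores)
    rng = random.Random(seed)
    n = len(y_true)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [y_true[i] for i in idx]
        if sum(y) in (0, n):
            continue  # resample contains a single class; draw again
        fl = [fl_scores[i] for i in idx]
        loc = [local_scores[i] for i in idx]
        deltas.append(auc(y, fl) - auc(y, loc))
    deltas.sort()
    lower = deltas[int(0.025 * (n_boot - 1))]
    upper = deltas[int(0.975 * (n_boot - 1))]
    return point, lower, upper
```

Under this reading, a site counts as showing a statistically significant gain when `lower > 0` and a significant degradation when `upper < 0`. An interval that spans zero corresponds to the "crossed zero" case reported for most hospitals. Small test sets produce wide intervals, which is consistent with the wide CI reported for the N = 86 hospital.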
- PMID: 41284740
- DOI: 10.1371/journal.pcbi.1013695