The compatibility of Federated Learning (FL) models with unseen Out-Of-Federation (OOF) centers remains a critical yet underexplored challenge, particularly when dealing with heterogeneous data. To address this gap, this study proposes a data-driven approach to assess the feasibility of applying an FL model to OOF centers. The case study is the prediction of diabetic retinopathy from multiple real-world, highly heterogeneous electronic health records. An FL XGBoost model (FL-XGB) is trained across five in-federation (IF) centers, achieving an average test Area Under the ROC Curve (AUC) of 75.27%. A novel metric, the OOF Applicability (OFA) predictor, is introduced to estimate whether FL-XGB can be safely applied to 15 OOF centers. OFA combines statistical and learnable features from both IF and OOF centers and serves as the predictor in a regression model that estimates the performance of FL-XGB (in terms of AUC) on OOF datasets. The regression model achieved a confidence of 76% in predicting AUC values, with a statistically significant p-value (≪ 0.001). The average discrepancy between the predicted and observed AUC values was 6%. Overall, FL-XGB shows robust performance on IF centers, and the OFA predictor plays a crucial role in assessing its applicability to unseen OOF centers. By providing statistically significant estimates, OFA identifies OOF centers whose characteristics diverge too far from what the FL model can handle. Our code is available at https://github.com/geronimaw/OFA4FL.
Federated Learning Towards the Unknown: A Deep Dive Into Diabetic Retinopathy Prediction from Real-World EHR Structured Data on Unseen Diabetic Centers
Bernardini, Michele
2025-01-01


