J Clin Invest. 2026 Feb 16;136(4):e195228. doi: 10.1172/JCI195228. eCollection 2026 Feb 16.
ABSTRACT
BACKGROUND: Chronic graft-versus-host disease (cGVHD) is a major contributor to nonrelapse mortality (NRM) following hematopoietic cell transplantation (HCT). Whether machine-learning (ML) models that incorporate biomarkers improve accuracy in predicting future cGVHD and NRM has not been established.
METHODS: We developed BIOPREVENT (BIOmarkers PREVENTion), an ML algorithm using data from 1,310 HCT recipients, incorporating 7 plasma proteins measured at Day 90/100 post-HCT and 9 clinical variables. Patients were divided into training and validation datasets. ML models, including CoxXGBoost, Group SCAD, Adaptive Group Lasso, Random Survival Forests, and Bayesian Additive Regression Trees (BART), were compared using the time-varying area under the ROC curve (AUCt) at Days 180, 270, 360, and 540. Deep learning models were also evaluated.
RESULTS: ML models with biomarkers outperformed clinical-only models for predicting cGVHD, with BART and CoxXGBoost achieving AUCt greater than 0.65 at 1 year. For NRM, models with biomarkers achieved AUCt values ranging from 0.75 to 0.91. Deep learning did not outperform the other ML approaches. BART consistently demonstrated high predictive accuracy and was selected for the final BIOPREVENT model. Calibration curves showed close agreement between predicted and observed values. Variable importance analysis identified MMP3 and CXCL9 as key predictors of cGVHD, and IL1RL1 and sCD163 as key predictors of NRM. Cumulative incidences of cGVHD and NRM differed significantly between risk groups defined by BIOPREVENT cutpoints.
CONCLUSION: BIOPREVENT accurately predicts individual risk of future cGVHD and NRM using biomarkers measured 3 months after HCT. A publicly available R Shiny web application supports its clinical use. Further studies are needed to explore its role in guiding preemptive therapy.
TRIAL REGISTRATION: BMTCTN 0201, BMTCTN 1202, and NCT02194439.
FUNDING: R01CA264921, U10HL069294, U24HL138660, R01HD074587, and P01HL158505.
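The time-varying AUC (AUCt) used to compare the models can be illustrated with a minimal sketch. It assumes a naive cumulative/dynamic definition: patients whose outcome of interest occurred by the horizon t are cases, patients still under observation beyond t are controls, and patients censored or experiencing a competing event on or before t are simply dropped. The abstract does not state the study's estimator, which likely handles censoring and competing risks more carefully (for example, with inverse-probability-of-censoring weights), so the helper `time_dependent_auc` and the toy data below are hypothetical illustrations, not the BIOPREVENT implementation.

```python
from typing import Sequence


def time_dependent_auc(
    times: Sequence[float],
    events: Sequence[int],
    scores: Sequence[float],
    t: float,
) -> float:
    """Naive cumulative/dynamic AUC at horizon t (hypothetical helper).

    times  : observed follow-up time for each patient (e.g., days post-HCT)
    events : 1 if the outcome of interest occurred at `times`, else 0
             (censored or a competing event)
    scores : predicted risk; higher means higher predicted risk
    """
    # Cases: outcome of interest observed on or before the horizon.
    cases = [s for ti, e, s in zip(times, events, scores) if e == 1 and ti <= t]
    # Controls: still event-free and under observation beyond the horizon.
    controls = [s for ti, s in zip(times, scores) if ti > t]
    # Simplification (assumption): patients censored or with a competing
    # event on or before t are neither cases nor controls and are dropped.
    if not cases or not controls:
        raise ValueError("need at least one case and one control at horizon t")

    # AUC(t) = share of case-control pairs ranked correctly; ties count half.
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy numbers for illustration only (not study data); horizon = Day 360.
auc = time_dependent_auc(
    times=[120, 200, 300, 400, 500, 600],
    events=[1, 1, 0, 1, 0, 0],
    scores=[0.9, 0.7, 0.4, 0.8, 0.2, 0.3],
    t=360,
)
print(round(auc, 3))  # 0.833
```

Evaluating the same function at t = 180, 270, 360, and 540 would trace an AUCt curve over the horizons named in the abstract; the pairwise loop is quadratic but keeps the definition explicit.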
PMID:41697751 | DOI:10.1172/JCI195228
