Dr Mosquera Orgueira on the Development of a Machine Learning–Based Prognostic Model for Risk Stratification in Myelofibrosis

May 29, 2025

“The study was conducted to explore the feasibility and impact of machine learning tools in risk stratification for [patients with] myelofibrosis before allogeneic stem cell transplantation. This is a very complex decision-making process, and we wanted to develop a tool that could enhance the capacities of clinicians through data-driven approaches.”

Adrián Mosquera Orgueira, MD, PhD, a specialist in hematology and hemotherapy, and lead investigator of the Computational and Genomic Hematology Group at the Health Research Institute of Santiago de Compostela, discussed the development and clinical relevance of a machine learning–based prognostic model designed to support risk stratification for patients with myelofibrosis undergoing allogeneic hematopoietic stem cell transplantation (allo-HSCT). The model was built using data from the European Society for Blood and Marrow Transplantation (EBMT) registry, encompassing more than 5000 patients with myelofibrosis who subsequently underwent allo-HSCT.

The model employed a random survival forest (RSF) algorithm to estimate overall survival and relapse-free survival outcomes, Mosquera Orgueira explained. Ten routinely collected baseline variables were used as inputs, with the goal of producing a tool that could be broadly implemented across transplant centers. Compared with traditional statistical models, such as Cox regression, and alternative machine learning methods, including XGBoost and deep learning architectures, the RSF model offered modest improvements in predictive accuracy.
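
For readers unfamiliar with how survival models are compared, the sketch below shows one widely used measure of predictive accuracy, Harrell's concordance index. It is illustrative only: the interview does not specify which metric the investigators used, and the function name, toy follow-up times, and risk scores are all hypothetical rather than drawn from the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs ranked correctly.

    times       -- follow-up time for each patient
    events      -- 1 if the event (e.g., death) was observed, 0 if censored
    risk_scores -- model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # ties count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: 5 patients, 2 of them censored.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
model_a = [0.9, 0.7, 0.5, 0.3, 0.1]  # ranks every comparable pair correctly
model_b = [0.9, 0.3, 0.5, 0.7, 0.1]  # misranks two pairs

c_a = concordance_index(times, events, model_a)
c_b = concordance_index(times, events, model_b)
print(c_a, c_b)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a modest improvement of one model over another would appear as a small difference between two such values.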

A key advantage of the RSF approach was its flexibility in modeling nonlinear interactions and handling high-dimensional data without relying on proportional hazards assumptions, Mosquera Orgueira noted. Internal validation using cross-validation techniques demonstrated the model’s capacity to stratify patients by post-transplant outcomes without compromising generalizability.
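
The general idea behind cross-validation can be sketched as follows: the cohort is split into several folds, and each fold is held out once for evaluation while the model is fit on the rest. This is a minimal illustration under stated assumptions. The number of folds, the seed, and the function name are hypothetical, and the interview does not describe the investigators' exact validation scheme.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for held_out in range(k):
        test = folds[held_out]
        # Train on every fold except the one held out for evaluation.
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, test


# Hypothetical example: 10 patients split into 5 folds.
splits = list(k_fold_indices(10, 5))
for train, test in splits:
    # A model would be fit on `train` and scored on `test` here.
    print(sorted(test))
```

Because every patient appears in exactly one held-out fold, each prediction used for evaluation comes from a model that did not see that patient during fitting.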

However, the model’s predictive power remains incomplete, with a notable proportion of post-transplant events—particularly non-relapse mortality—left unexplained, he acknowledged. This limitation may reflect biologic heterogeneity and the absence of molecular or treatment-specific variables in the dataset. Future iterations of the model may benefit from the integration of genomic biomarkers, patient-reported outcomes, and dynamic clinical variables, Mosquera Orgueira suggested.

Although exploratory analyses examined the potential use of the model in guiding transplant-related decisions, such as conditioning intensity or graft-vs-host disease prophylaxis, no actionable clinical patterns were identified, he stated. As such, the model should not be used to determine specific therapeutic interventions, he emphasized.