Multimodal AI Model Prognostic for Long-Term Recurrence Following Treatment for Early Breast Cancer

The multimodal ICM+ model was prognostic for long-term recurrence following treatment in early breast cancer.

An artificial intelligence (AI) model trained on data from whole slide images (I), clinical features (C), and enhanced molecular analysis (M+) from the phase 3 TAILORx study (NCT00310180) showed better prognostic value for recurrence compared with the Oncotype DX recurrence scores alone, according to a presentation at the 2025 San Antonio Breast Cancer Symposium.1

The AI model, labeled ICM+ after the multimodal training set used, was found to significantly improve prognostic performance for overall and late distant recurrence in both a training and validation set of data from TAILORx. Other multimodal models were also examined, with the addition of pathologic data from whole slide images making a key difference for late recurrence. The findings offer the potential to better tailor treatments for early breast cancer based on risk assessment.

“ICM+ risk stratification identified clinically relevant absolute differences in distant recurrence risk in all but the low recurrence score arm, ranging from about 13% for recurrence score 11 to 25 to as high as 56% if the recurrence score was 26 to 100,” lead investigator Joseph A. Sparano, MD, from the Icahn School of Medicine at Mount Sinai, Tisch Cancer Institute, in New York, said during a presentation of the data. “The ICM+ model also provided statistically significant and clinically relevant prognostic stratification in the Oncotype low and high genomic risk groups.”

TAILORx Details and Initial Findings

TAILORx included 10,273 patients with hormone receptor (HR)-positive, HER2-negative, axillary node-negative breast cancer. Patients were stratified by risk using the Oncotype DX assay, with 9719 having follow-up information available for analysis. Overall, 69% (n = 6711) had an intermediate recurrence score of 11 to 25; 17% (n = 1619) had a score of 10 or lower; and 14% (n = 1389) had a score of 26 or higher.

Those in the low-risk group received endocrine therapy alone, while patients in the high-risk group were treated with the combination of chemotherapy and endocrine therapy. Patients in the intermediate group were randomized to receive either endocrine therapy alone (n = 3399) or chemotherapy plus endocrine therapy (n = 3312). Endocrine therapy most commonly consisted of an aromatase inhibitor for postmenopausal women and tamoxifen alone or with an aromatase inhibitor for premenopausal women.

In initial findings reported in the New England Journal of Medicine,2 endocrine therapy alone was noninferior to chemotherapy plus endocrine therapy for invasive disease-free survival (DFS) in the intermediate-risk group (HR, 1.08; 95% CI, 0.94-1.24; P = .26). Within this analysis, some benefit was seen with the addition of chemotherapy for women 50 years of age or younger with a recurrence score of 16 to 25. Other end points were also noninferior, including distant recurrence-free survival (HR, 1.10; P = .48), local and distant recurrence-free survival (HR, 1.11; P = .33), and overall survival (HR, 0.99; P = .89).

Design and Training Set for AI Models

Digitized hematoxylin and eosin (H&E)-stained whole slide images, scanned at 40x with a Pramana scanner, were used along with whole transcriptome sequences to train the AI model. These were available from 4462 primary tumor samples from the TAILORx study. Sequencing of the samples was completed with the Caris MI Tumor Seek-Hybrid. Of the samples, 2808 were used to train the AI model through a 5-fold nested cross-validation process with 60, 20, 20 splits, a technique that is commonly used for machine learning models. The remaining 1621 samples were used as a validation set.
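The presentation did not detail how the 5 folds and the 60, 20, 20 proportions were arranged. As a rough illustration only, the Python sketch below shows one common way to obtain such proportions: the samples are dealt into 5 folds, and on each rotation 1 fold is held out for testing, 1 for tuning, and the remaining 3 are used for fitting. The function name `rotating_splits`, the seed, and the rotation scheme are assumptions for illustration, not the investigators' pipeline.

```python
import random


def rotating_splits(n_samples, n_folds=5, seed=0):
    """Yield (train, val, test) index lists; each fold is the test fold once.

    Hypothetical helper: with 5 folds this gives roughly 60/20/20 splits.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val_k = (k + 1) % n_folds  # the next fold is used for tuning
        test = folds[k]
        val = folds[val_k]
        train = [idx for j, fold in enumerate(folds)
                 if j not in (k, val_k) for idx in fold]
        yield train, val, test


if __name__ == "__main__":
    # 2808 is the number of training samples reported in the presentation.
    for train, val, test in rotating_splits(2808):
        print(len(train), len(val), len(test))
```

Each sample lands in the test fold exactly once across the 5 rotations, so every training sample contributes to an out-of-fold performance estimate.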

The training and validation sets were not significantly different in their composition for most features, nor were they significantly different from the whole TAILORx population. In the training and validation sets, the median age of patients was 56 years, with two-thirds being postmenopausal (66.2%). The tumor size was 2 cm or smaller for 72.4% of patients and 90.1% were PR-positive in addition to ER-positive. Low clinical risk was present for 66.9% of patients.

There were slight but statistically significant differences between the sets in the proportions of high-grade and low-grade tumors, Sparano noted. The training set had 20.7% of patients with high-grade tumors compared with 19.7% for the validation set (P = .004). For low-grade tumors, the training set contained 24.2% of patients compared with 28.4% for the validation set.

The expanded molecular analysis examined genes across 5 commercially available gene signatures, namely Oncotype DX, MammaPrint, Prosigna, EndoPredict, and Breast Cancer Index (BCI). Fifty-seven other high-variance genes were also examined, 6 of which overlapped with the tests. After exploring these various tests, the final enhanced molecular model (M+) contained only the EndoPredict, BCI, and Oncotype DX gene signatures (42 genes in total).

The ultimate objective behind the design of the AI model was to provide better prognostic information than the Oncotype DX test alone, particularly for late (>5 years) distant recurrences. The truncated concordance index (C-index) was used to assess the prognostic performance of the model; a value of 1 represents perfect discrimination and a value near 0.5 represents discrimination no better than chance. A value of 0.7 is generally accepted as indicating good discrimination, Sparano noted. The best-performing model in the training portion of the study was then evaluated in the validation set.
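To make the C-index concrete, the Python sketch below computes a simple Harrell-style concordance index truncated at a time horizon. The exact estimator used in the study was not described, so the pairing rule, the tie handling, and the name `truncated_c_index` are assumptions for illustration, and the patient data are invented.

```python
def truncated_c_index(times, events, risks, tau):
    """Harrell-style C-index restricted to events observed by time tau.

    times:  follow-up time for each patient
    events: 1 if distant recurrence was observed, 0 if censored
    risks:  model risk score (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Patient i anchors a pair only if an event was observed by tau.
        if not events[i] or times[i] > tau:
            continue
        for j in range(n):
            # Patient j must still be recurrence-free after i's event.
            if times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event had the higher score
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs before tau")
    return concordant / comparable


if __name__ == "__main__":
    # Invented toy data: 5 patients, times in years.
    times = [2, 4, 6, 8, 10]
    events = [1, 1, 0, 1, 0]
    risks = [0.9, 0.3, 0.5, 0.6, 0.1]
    print(truncated_c_index(times, events, risks, tau=15))  # 0.75
```

In this toy example, 6 of the 8 comparable pairs are ordered correctly, giving 0.75. A model that assigns every patient the same score gets 0.5, which matches the "no better than chance" reading of the index.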

Study Findings for Prognostic AI Models

Prognostic performance was first tested using the established methods. The Oncotype DX recurrence score alone produced a C-index score of 0.617 for all distant recurrences and a C-index score of 0.738 for early (<5 years) distant recurrences. There was no prognostic value for late distant recurrence seen with the test (C-index, 0.518). Clinical features alone were also utilized as a single-modality training set. For all distant recurrences, the C-index was 0.634 and for early distant recurrences it was 0.686. For late distant recurrences, the C-index was 0.590 for clinical features alone.

For the ICM+ model, the C-index for overall distant recurrence was 0.705, which significantly outperformed the Oncotype DX test (P < .001). For late distant recurrence specifically, the C-index was 0.656 (P < .001 vs Oncotype DX). For early distant recurrence, the C-index was 0.765, which was not superior to Oncotype DX (P = .398). In a multivariate analysis, the ICM+ model demonstrated significant risk discrimination for overall, early, and late distant recurrence.

By most measures, the CM+ model, which combined clinical features and the enhanced molecular analysis without whole slide images, was less effective than the ICM+ model but still showed improvements over traditional methods. The C-index for overall distant recurrence was 0.674, outperforming Oncotype DX (P = .003). For late distant recurrence, the C-index was 0.589, which was superior to Oncotype DX (P = .010). For early distant recurrence, the C-index was 0.776 and was not superior to Oncotype DX (P = .120).

"For late distant recurrence, the ICM+ model, which included pathomic imaging, was significantly better than the clinical model, recurrence score, and CM+ model, none of which had pathomic imaging," said Sparano.

Findings in the Validation Dataset

The models were also applied to the validation set. In this group, the ICM+ model outperformed Oncotype DX in terms of prognostic value for overall distant recurrences at 15 years (P = .00049). The C-index with the AI model was 0.733 compared with 0.631 for Oncotype DX. The ICM+ model was also superior to Oncotype DX for late recurrences after 5 years (P = .000031). The C-indices for late recurrence were 0.705 and 0.527, respectively.

“AI-based pathomic tools that rely on evaluation of tissue sample slides routinely generated from clinical practice can be captured with scanners or even widely available smartphones, uploaded electronically, and analyzed centrally with minimal cost,” Sparano said.

For distant recurrence at 15 years, the ICM+ model classified 7.2% of patients with a standard recurrence score of 0 to 10 as high risk. For those with a score of 11 to 25 who received endocrine therapy alone, the AI model classified 13.8% as high risk. For those with a score of 11 to 25 who received chemotherapy and endocrine therapy, the model classified 12.6% as high risk.

"The ICM+ provides additional risk stratification for each recurrence score group," said Sparano. "The ICM+ risk stratification identified clinically relevant absolute differences in distant recurrence risk across arms ranging from about 12% to 25%, including 16% if the recurrence score was 11 to 25."

ICM+ Model Performance in Validation Set

In the validation set of patients, the ICM+ model and the CM+ model continued to perform well for risk evaluation. In this group, the C-index was 0.733 for the ICM+ model and 0.739 for the CM+ model for overall distant recurrence. These models were both superior to recurrence score alone (P < .001). For late distant recurrences, the C-index was 0.707 and 0.705 for the CM+ and ICM+ models, respectively, compared with 0.527 for the recurrence score (P < .001). For early distant recurrence, both models were numerically, but not statistically significantly, superior to the recurrence score. The CM+ and ICM+ models were not significantly different from each other.

The hazard ratios for 15-year distant recurrence between the high-risk and low-risk groups were 5.554 and 4.242 for the CM+ and ICM+ models, respectively. For the recurrence score, the hazard ratio was 3.074, suggesting better prognostic stratification for the ICM+ and CM+ models compared with the recurrence score and clinical features alone, Sparano noted. The multivariate analysis confirmed the prognostic risk discrimination for the ICM+ model for overall, early, and late distant recurrence.

For distant recurrence at 15 years, the ICM+ model classified 2.3% of patients with a recurrence score of 0 to 10 as high risk. For those with a score of 11 to 25 who received endocrine therapy alone, the AI model classified 9.3% as high risk. For those with a score of 11 to 25 who received chemotherapy and endocrine therapy, the model classified 10.8% as high risk.

"ICM+ risk stratification identified clinically relevant absolute differences in distant recurrence risk in all but the low RS arm, ranging from about 13% for RS 11 to 25 to as high as 56% if the RS was 26 to 100," said Sparano.

References

  1. Sparano JA, Wang V, Gray RJ, et al. Multimodal artificial intelligence (AI) models integrating image, clinical, and molecular data for predicting early and late breast cancer recurrence in TAILORx. Presented at: San Antonio Breast Cancer Symposium; December 9-12, 2025; San Antonio, TX. Abstract GS1-09.
  2. Sparano JA, Gray RJ, Makower DF, et al. Adjuvant chemotherapy guided by a 21-gene expression assay in breast cancer. N Engl J Med. 2018;379:111-121.