© 2025 MJH Life Sciences™ and OncLive - Clinical Oncology News, Cancer Expert Insights. All rights reserved.
Eddy Saad, MD, discusses how AI-generated synthetic cohorts derived from real-world data may accelerate trial design and collaboration in metastatic breast cancer.
“We tried to look into artificial intelligence models that could use synthetic and real-world data to create a twin data set that matches the same statistical patterns.”
Eddy Saad, MD, a postdoctoral research fellow at Dana-Farber Cancer Institute, discussed how the growing need for timely, high-quality evidence in metastatic breast cancer informed the rationale for exploring AI-generated synthetic cohorts as a complementary tool for clinical trial design and multi-stakeholder collaboration.
The oncology field is experiencing a rapid increase in accelerated FDA approvals, creating substantial pressure to generate robust data to guide therapeutic decision-making. Although traditional clinical trials remain essential, Saad noted that they require significant time, financial investment, and operational resources. Real-world data (RWD) has emerged as an alternative source of evidence; however, its utility is often limited by privacy protections, regulatory constraints, and restricted data-sharing agreements across institutions.
To address these barriers, Saad and colleagues evaluated the use of artificial intelligence models capable of generating synthetic datasets derived from a large RWD repository of approximately 19,000 patients with metastatic breast cancer. These models create statistically representative “twin” datasets that reflect the distributions and clinical relationships present in real-world populations without containing identifiable patient information. This approach enables privacy-preserving data sharing while maintaining the analytical value of the underlying data.
Based on these capabilities, synthetic cohorts may support more efficient trial development, Saad explained. Investigators can simulate eligibility criteria, event rates, and control arms, allowing refinement of study design before trial activation. These datasets may also be shared among academic groups, industry sponsors, and regulatory collaborators without the constraints typically associated with protected health information, facilitating alignment on feasibility assumptions and methodological planning.
Saad emphasized that although synthetic datasets cannot replace prospective studies, they offer a practical and scalable solution to support early-phase trial design and accelerate cross-institutional collaboration. Importantly, the breadth of the underlying metastatic breast cancer dataset allows these synthetic models to capture clinically relevant heterogeneity, improving simulation accuracy for diverse subpopulations.
Thus, the development of synthetic cohorts aims to streamline evidence generation, enhance collaborative trial planning, and help investigators more efficiently evaluate emerging therapeutic strategies in metastatic breast cancer.