Natural Language Processing Streamlines Clinical Encounter Note Collection in Indolent Systemic Mastocytosis

Oncology Live®, Vol. 26, No. 7

Natural language processing models could help streamline processes within clinical settings.

Thanai Pongdee, MD

In an era of rapid technological advances, manual reviews of health records will soon be history, as large language models (LLMs), a type of artificial intelligence program, emerge in clinical settings. To address this evolution, investigators used natural language processing (NLP) to filter through electronic medical records (EMRs) and quickly provide deeper insights into symptoms and the related burden in patients with indolent systemic mastocytosis (ISM).1

“There are millions of clinical records that we have in our own institutional EMR. To manually mine those records takes a longer amount of time while trying to go beyond just using ICD [International Classification of Diseases] codes, which can be relatively limiting, and you may lose a lot of information,” Thanai Pongdee, MD, codirector of the Population Health Research Program at Mayo Clinic Health System (MCHS) in Rochester, Minnesota, said in an interview with OncologyLive. “To bridge that gap between ICD codes or more structured [information] that may be somewhat searchable, [as we] capture all of the data that are available, [we used] an NLP model to assess all of the records in an automated fashion. The model is constructed in a way to [identify] patients with ISM without having to manually [search through] thousands upon thousands of records.”

Following the 2025 American Academy of Allergy, Asthma & Immunology/World Allergy Organization Joint Congress, OncologyLive spoke with Pongdee to discuss the implications of using their NLP model, its role in streamlining processes within the clinical setting, and the data from the study of the model that were presented during the meeting.

Utilizing the NLP Approach

Traditional statistical methods could still be effective in collecting patient-reported data; however, the use of LLMs could replace manual efforts, notably because of their generalizability.2 The adoption of LLMs could further revolutionize telemedicine, particularly by streamlining provider workflows without missing important patient-reported data, such as symptoms.

In a matched, retrospective cohort study, Pongdee and his coinvestigators obtained deidentified data from patients with ISM between January 1, 2005, and June 30, 2023, from the Mayo Clinic EMR database, which covers all of its US sites, including those in Minnesota, Arizona, and Florida, as well as other MCHS sites in Minnesota, Iowa, and Wisconsin.1 Patient demographics, diagnostic workup for ISM, symptoms, comorbidities, health care resource use, and medication use were assessed.

Data from 203 patients in the final ISM cohort were included, along with 2030 control patients who were propensity score matched 10 to 1 on characteristics including race, ethnicity, sex, age at index, Quan-Charlson Comorbidity Index score, body mass index at index, and smoking status. Furthermore, the investigators curated the structured EMR data and used the NLP model to review unstructured clinical notes.
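For readers unfamiliar with 10-to-1 matching, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report does not say which matching algorithm the investigators used, so greedy nearest-neighbor matching without replacement is assumed here, and the function name, patient IDs, and scores are all hypothetical. The propensity scores are taken as already estimated (for example, from a regression on the covariates listed above).

```python
from typing import Dict, List


def match_controls(
    case_scores: Dict[str, float],
    control_scores: Dict[str, float],
    ratio: int = 10,
) -> Dict[str, List[str]]:
    """Greedy nearest-neighbor matching on a precomputed propensity score.

    Assumption: matching is done without replacement, so each control
    patient is assigned to at most one case. The study's actual
    procedure is not described in the article.
    """
    available = dict(control_scores)  # copy so the caller's data is untouched
    matches: Dict[str, List[str]] = {}
    for case_id, score in case_scores.items():
        # Rank the remaining controls by distance to this case's score;
        # the ID is a tie-breaker so results are deterministic.
        ranked = sorted(
            available,
            key=lambda cid: (abs(available[cid] - score), cid),
        )
        nearest = ranked[:ratio]
        if len(nearest) < ratio:
            raise ValueError(f"not enough controls left for {case_id}")
        matches[case_id] = nearest
        for cid in nearest:
            del available[cid]  # remove matched controls from the pool
    return matches


# Hypothetical toy data: 2 cases and 40 candidate controls.
cases = {"ism_1": 0.30, "ism_2": 0.70}
controls = {f"ctl_{i:02d}": i / 40 for i in range(40)}

matched = match_controls(cases, controls, ratio=10)
for case_id, control_ids in matched.items():
    print(case_id, len(control_ids))
```

With 203 cases and a ratio of 10, a procedure like this yields the 2030 controls reported in the study, provided the pool of candidate controls is large enough.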

“We looked at all records available since 2005, when the EMR [went] online throughout the Mayo Clinic, and based on codes, and using the NLP, we narrowed that funnel until we identified 931 patients with ISM,” Pongdee explained. “We wanted to capture individual patients who had been seen [recently] so we could more carefully look at the disease burden.”

Among patients in the final ISM cohort, the mean age was 51.4 years, the majority were female (66.5%), and the mean follow-up time was 4.4 years.1 Race/ethnicity included White (93%); Black, Asian, other, or unknown (7%); and Hispanic or Latino (5%). Sites of diagnosis included Minnesota (41%), Arizona (29%), Florida (12%), and other MCHS sites (6%). Index diagnosis years included 2005-2016 (20%), 2017 (8%), 2018 (19%), 2019 (16%), 2020 (10%), 2021 (17%), and 2022 (10%). Notably, the mean number of distinct symptoms reported was 10.6 at baseline and 13.3 at follow-up. The most prevalent symptoms reported during the 6-month baseline period were dermatologic symptoms (66%), including angioedema, flushing, cutaneous mastocytosis, pruritus, and urticaria; allergic reaction (58%); lymphadenopathy (57%); diarrhea (54%); dyspnea (53%); nausea (53%); and fatigue (52%).

Among patients with a real-world diagnosis of ISM, 66% received a biopsy of any type, with 40%, 41%, 50%, 5%, and 4% of patients receiving skin, bone marrow, gastrointestinal, lymph node, and liver biopsies, respectively. Of note, 68% of patients received KIT testing, of whom 54% had KIT detection. The percentages of patients with 0, 1, 2 to 3, 4 to 5, and 6 to 8 additional comorbidities were 3%, 10%, 51%, 27%, and 9%, respectively. Investigators noted that although the study combined structured EMR data with NLP, specific information regarding testing and diagnostic workup that occurred outside of the Mayo Clinic system may not be included in clinical notes.

NLP Efficiently Collects Varied Data Elements

Data from the study demonstrated that the machine learning technology efficiently collected more varied data elements compared with the traditional manual approach, as it pulled additional data and context. The use of NLP also helped create comprehensive clinical encounter notes with data labels.

Data associated with specified comorbid conditions in patients from the ISM and control cohorts, determined with the help of NLP, revealed 9 common conditions, including osteoporosis/osteopenia (ISM, 67%; control, 34%), diabetes (50%; 36%), obesity (50%; 38%), irritable bowel syndrome (32%; 12%), high cholesterol (24%; 12%), cutaneous mastocytosis (24%; 2%), high blood pressure (23%; 13%), heart attack (17%; 8%), and coronary artery disease (6%; 3%). The proportion of patients from both cohorts with specified allergies included food (24%; 7%), environmental (9%; 5%), drug allergies (8%; 4%), stinging insect (7%; 1%), latex (7%; 3%), radiocontrast (5%; 1%), dander/pets (3%; 1%), and venom (<1%; <1%). Investigators noted that patients with ISM are more likely to have food and stinging insect allergies.

Regarding health care resources and medication use, investigators found that patients with ISM had significantly higher rates of both health care resource and medication use compared with those in the control cohort. Specifically, the use of health care resources, measured by the mean number of visits to a health care center per year, included outpatient (ISM, 7.5; control, 2.1), emergency department (1.0; 0.5), inpatient (2.5; 1.3), and critical care (0.8; 0.5). Moreover, all-time medication use in the respective cohorts included H1 antihistamines (96%; 57%), corticosteroids (89%; 61%), H2 antihistamines (83%; 26%), epinephrine (78%; 19%), leukotriene modulators (70%; 10%), aspirin (66%; 41%), proton pump inhibitors (63%; 42%), anticoagulants (56%; 39%), and cromolyn (45%; 0%).

“The model performed quite well. It was much more time efficient to query all of the symptoms that it could see. It took a global view and evaluated all the symptoms that were reported in the EMR, and was able to compile that [information] well,” Pongdee said. “Going through that individually would have taken quite some time, along with looking at lab results, procedures, several outpatient visits, hospitalizations, emergency department visits, and comorbid factors. It did this in an automated fashion and was a powerful way to look at the health records.”

References

  1. Pongdee T, Powell D, Weis T, et al. Assessing real-world natural history of indolent systemic mastocytosis: retrospective matched cohort study from Mayo Clinic Electronic Health Records. J Allergy Clin Immunol. 2025;155(suppl 2):AB312. doi:10.1016/j.jaci.2024.12.960
  2. Clay B, Bergman HI, Salim S, Pergola G, Shalhoub J, Davies AH. Natural language processing techniques applied to the electronic health record in clinical research and practice – an introduction to methodologies. Comput Biol Med. 2025;188:109808. doi:10.1016/j.compbiomed.2025.109808
