Artificial intelligence (AI) is changing how medical data are collected and analyzed. A recent study by a multidisciplinary team at UT Southwestern Medical Center demonstrates an AI-enabled pipeline that efficiently extracts key information from complex, free-text medical records.
Published in npj Digital Medicine, the study describes the team's approach to producing analysis-ready research data in a fraction of the time traditionally required. David Hein, M.S., the study's first author and a Data Scientist in the Lyda Hill Department of Bioinformatics at UT Southwestern, explained the significance of the work.
Hein stated, “Constructing highly detailed, accurate datasets from free-text medical records is extremely time-consuming, often requiring extensive manual chart review. Our study demonstrates one approach for creating AI-powered large language models (LLMs) that simplify the process of collecting and organizing medical data for analysis. By automating both data extraction and standardization through AI, we can make large-scale clinical research more efficient.”
The researchers used an AI-powered LLM to analyze more than 2,200 kidney cancer pathology reports, evaluating the model's ability to identify and categorize different tumor types. Working with AI scientists, pathologists, clinicians, and statisticians, the team refined the workflow over multiple rounds of testing to improve its handling of complex medical information. The team then validated the results against existing electronic medical record (EMR) data: the pipeline identified tumor types with 99% accuracy and detected metastasis with 97% accuracy.
Study co-leader Payal Kapur, M.D., Professor of Pathology and Urology, noted that training AI to extract data from narrative reports is difficult because clinicians use many different terms to describe their findings. With proper input and oversight, however, an AI model can review and categorize large numbers of records quickly and accurately.
The team also tested the pipeline on a broader set of more than 3,500 internal kidney cancer pathology reports and obtained consistent results. James Brugarolas, M.D., Ph.D., Director of the Kidney Cancer Program at UT Southwestern, emphasized that teamwork was essential to refining the instructions given to the AI so that it produced accurate results.
Although the study focused on kidney cancer, the researchers believe the approach could be applied to other tumor types. Andrew Jamieson, Ph.D., Assistant Professor and Principal Investigator in the Lyda Hill Department of Bioinformatics, expressed enthusiasm about the potential of AI-powered LLMs to advance research across medical specialties.
The study, titled "Iterative refinement and goal articulation to optimize large language models for clinical information extraction," offers a roadmap for researchers who want to apply LLMs to information extraction in their own fields. It illustrates how AI can streamline data extraction and analysis in medical research.