The evaluation and management of first-time seizure-like events in children pose a significant challenge in clinical practice. Such an event may be an epileptic seizure or a non-epileptic condition that mimics one, so accurate diagnosis and prognosis are critical. A recent study set out to use machine learning models trained on real-world data to predict seizure recurrence after an initial seizure-like event.
This retrospective cohort study drew on electronic medical records (EMRs) from Boston Children's Hospital and de-identified, patient-level administrative claims data from the IBM MarketScan research database, covering the period from 2010 to 2020. The study population comprised patients diagnosed with epilepsy or convulsions before the age of 21, identified through International Classification of Diseases, Clinical Modification (ICD-CM) codes. The study compared the predictive performance of several machine learning approaches. Logistic regression and XGBoost models were trained on structured data, and natural language processing models based on large language models were applied to clinical notes.
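Cohort selection of this kind can be sketched in a few lines. The following minimal Python sketch is not the authors' pipeline. The `Diagnosis` record, the helper functions, the code prefixes (ICD-9-CM 345 and 780.3x, ICD-10-CM G40 and R56) and the exact window of 1 January 2010 to 31 December 2020 are all illustrative assumptions. The sketch keeps patients whose first epilepsy or convulsion code falls inside the window before their 21st birthday.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical code prefixes for illustration only; the study's exact ICD-CM
# code lists are not reproduced here. Codes are compared without the dot.
SEIZURE_CODE_PREFIXES = (
    "345",   # ICD-9-CM: epilepsy and recurrent seizures
    "7803",  # ICD-9-CM: convulsions (780.3x)
    "G40",   # ICD-10-CM: epilepsy and recurrent seizures
    "R56",   # ICD-10-CM: convulsions, not elsewhere classified
)


@dataclass
class Diagnosis:
    """One coded diagnosis from an EMR or claims record (hypothetical schema)."""
    patient_id: str
    icd_code: str
    diagnosis_date: date


def age_in_years(birth: date, on: date) -> int:
    """Completed years of age on a given date."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


def select_cohort(diagnoses, birth_dates,
                  start=date(2010, 1, 1), end=date(2020, 12, 31), max_age=21):
    """Return IDs of patients whose first seizure-related code falls in
    [start, end] before age max_age (window and age cut-off are assumptions)."""
    first_seen = {}
    for dx in diagnoses:
        code = dx.icd_code.replace(".", "").upper()
        if code.startswith(SEIZURE_CODE_PREFIXES):
            previous = first_seen.get(dx.patient_id)
            if previous is None or dx.diagnosis_date < previous:
                first_seen[dx.patient_id] = dx.diagnosis_date
    # A patient whose first code predates the window did not have an initial
    # event inside the study period, so they are excluded.
    return {
        pid for pid, first in first_seen.items()
        if start <= first <= end and age_in_years(birth_dates[pid], first) < max_age
    }


if __name__ == "__main__":
    births = {"p1": date(2008, 5, 1), "p2": date(1995, 3, 2), "p3": date(2012, 7, 9)}
    records = [
        Diagnosis("p1", "R56.9", date(2016, 6, 1)),    # convulsion at age 8: kept
        Diagnosis("p2", "G40.909", date(2018, 1, 1)),  # age 22: excluded
        Diagnosis("p3", "J45.909", date(2019, 1, 1)),  # asthma code: excluded
    ]
    print(sorted(select_cohort(records, births)))  # ['p1']
```

Taking the earliest qualifying code per patient is one simple way to approximate an "initial" event. A real implementation would also have to handle the 2015 switch from ICD-9-CM to ICD-10-CM and any exclusion criteria the study applied.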
The primary cohort from Boston Children's Hospital comprised 14,021 patients who experienced an initial seizure-like event, while the comparison cohort from the IBM MarketScan database comprised 15,062 patients. Seizure recurrence, defined using a composite expert-derived definition, occurred in 57% of patients in the Boston Children's Hospital cohort and 63% of patients in the IBM MarketScan cohort.
The large language models showed the strongest predictive ability, particularly when pre-trained with additional domain-specific and location-specific data from patients excluded from the study. These pre-trained models achieved the best performance, with an F1-score of 0.826 and an area under the receiver operating characteristic curve (AUC) of 0.897. Even the base large language model, without additional pre-training, outperformed the models trained solely on structured data.
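The following minimal pure-Python sketch shows how an F1-score and an AUC are computed. The labels, scores and the 0.5 decision threshold are made-up toy values, not data from the study. F1 depends on the threshold used to turn scores into yes/no predictions. AUC summarizes how well the scores rank recurrence cases above non-recurrence cases across all thresholds.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = recurrence)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a randomly chosen positive case is scored higher than a
    randomly chosen negative case; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 1 = seizure recurred, 0 = no recurrence.
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.7, 0.6, 0.3]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
    print(round(f1_score(y_true, y_pred), 3))  # 0.667
    print(round(roc_auc(y_true, scores), 3))   # 0.8
```

The pairwise AUC loop is quadratic in the number of patients. For cohorts of thousands, a sort-based implementation such as scikit-learn's `roc_auc_score` is the practical choice.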
Among the models trained on structured data, XGBoost outperformed logistic regression. Both logistic regression and XGBoost performed similarly whether trained on Boston Children's Hospital data or on IBM MarketScan data, suggesting that the performance of the structured-data models was consistent across the two data sources.
These findings have practical implications. They show that physicians' clinical notes written after an initial seizure-like event contain substantial predictive information. This suggests that clinicians already capture and document data elements that are useful for predicting seizure recurrence in children.
Although this study is an important step in applying machine learning to specialized clinical tasks, it is limited by its retrospective design and its reliance on previously recorded data. Prospective studies could provide more conclusive evidence. Such studies would collect real-time physician input at the onset of seizure-like events and then follow patients thoroughly, using a composite definition of seizure recurrence.
The success of fine-tuned large language models on this specialized task also suggests potential applications in clinical decision support systems. Putting such models into clinical use, however, requires careful consideration of deployment strategies and thorough validation through implementation science.
Overall, this study underscores the potential of machine learning models, particularly large language models, to support clinical decision-making about seizure recurrence in children. It also suggests that such models could help researchers investigate the etiology of seizures and adherence to clinical guidelines, opening avenues for further work in pediatric neurological care.
Reference: Beaulieu-Jones, B. K., Villamar, M. F., Scordis, P., Bartmann, A. P., Ali, W., Wissel, B. D., ... & Kohane, I. (2023). Predicting seizure recurrence after an initial seizure-like episode from routine clinical notes using large language models: a retrospective cohort study. The Lancet Digital Health, 5(12), e882-e894.