Benchmarking the Reliability of AI Platforms in Synthesizing Prospective and Retrospective Evidence on Brachytherapy for Primary Vaginal Cancer


Abstract

Background

Artificial intelligence (AI) platforms such as ChatGPT, Claude, Gemini, and Perplexity are increasingly used to interpret and summarize oncologic evidence. However, their reliability in synthesizing trial-level data for rare malignancies remains uncertain. Primary vaginal cancer, characterized by limited prospective studies and reliance on retrospective data, provides a valuable model to evaluate AI performance.

Methods

A multi-phase study was conducted. First, a gold-standard dataset was developed from prospective and retrospective studies on brachytherapy for primary vaginal cancer, with manual extraction of study design, interventions, outcomes, and limitations. AI platforms (ChatGPT-4 Turbo, Claude Sonnet 4.0, Perplexity, and Meta AI) were then evaluated using a structured scoring rubric assessing accuracy, completeness, and consistency. Future analysis will assess reproducibility and hallucination rates using repeated outputs and natural language processing–based methods.
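The abstract does not report the rubric's scale, its weights, or how composite scores and hallucination rates are computed. The following is a purely illustrative sketch, not the authors' method. It assumes a hypothetical 0 to 2 scale for each of the three rubric dimensions (accuracy, completeness, consistency), equal weights, recall of gold-standard studies as a completeness proxy, and the fraction of reviewer-unverifiable claims as a hallucination proxy. All names (`RubricScore`, `composite_score`, `study_recall`, `hallucination_rate`, the study labels) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical rubric: each dimension scored on an assumed 0-2 scale
# (0 = absent/incorrect, 1 = partial, 2 = fully correct).
# The study's actual scale and weights are not reported in the abstract.
MAX_SCORE = 2
WEIGHTS = {"accuracy": 1.0, "completeness": 1.0, "consistency": 1.0}


@dataclass
class RubricScore:
    accuracy: int
    completeness: int
    consistency: int


def composite_score(score: RubricScore) -> float:
    """Weighted mean of the rubric dimensions, normalized to the range 0-1."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[name] * getattr(score, name) for name in WEIGHTS)
    return weighted / (total_weight * MAX_SCORE)


def study_recall(identified: set[str], gold_standard: set[str]) -> float:
    """Fraction of gold-standard studies that an AI platform identified."""
    if not gold_standard:
        raise ValueError("gold standard must not be empty")
    return len(identified & gold_standard) / len(gold_standard)


def hallucination_rate(claims_verified: list[bool]) -> float:
    """Fraction of AI-generated claims that reviewers could NOT verify
    against the source studies (a simple proxy for hallucination)."""
    if not claims_verified:
        raise ValueError("no claims to assess")
    unverified = sum(1 for ok in claims_verified if not ok)
    return unverified / len(claims_verified)


if __name__ == "__main__":
    # Hypothetical example data, not results from the study.
    gold = {"study_A", "study_B", "study_C", "study_D"}
    found = {"study_A", "study_C", "study_X"}
    print(study_recall(found, gold))                          # 0.5
    print(round(composite_score(RubricScore(2, 1, 1)), 3))    # 0.667
    print(hallucination_rate([True, True, False, True]))      # 0.25
```

Keeping recall and hallucination rate separate from the rubric composite is one way to show the pattern described in the Results, where a platform can earn partial rubric credit while still missing relevant studies or introducing unverifiable claims.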

Results

AI platform performance varied, with measurable differences in composite scores across models. All platforms demonstrated incomplete identification of relevant studies and frequent hallucinations when synthesizing evidence. Limitations were more pronounced when summarizing heterogeneous retrospective datasets, highlighting challenges in accurately integrating fragmented clinical evidence.

Conclusion

AI platforms demonstrate variable and currently limited reliability in synthesizing oncologic evidence for rare cancers. Although they may improve the accessibility of complex data, significant concerns about accuracy and completeness persist. Human oversight remains essential before AI-generated outputs are integrated into clinical decision-making or research workflows.

Poster
non-peer-reviewed



Author Information

Kristal De La Cruz Quezada

Research, Orlando College of Osteopathic Medicine, Winter Garden, USA

Aishwarya Kalluri

Research, Orlando College of Osteopathic Medicine, Winter Garden, USA

Nadiya A. Persaud (Corresponding Author)

College of Public Health, University of South Florida, Tampa, USA

Justin Rineer

Oncology, Orlando Health, Orlando, USA

Tomas Dvorak

Oncology, Orlando Health, Orlando, USA

