Phase II Quantitative Validation of AI Platforms for Clinical Evidence Retrieval: Gold-Standard Comparison and Scoring Rubric Analysis of LDRT Trials in Osteoarthritis


Abstract

Background

Artificial intelligence (AI) platforms such as ChatGPT, Claude, Gemini, and Perplexity are increasingly used by patients and clinicians to interpret treatment-related information. However, their reliability in synthesizing complex clinical evidence remains uncertain. Low-dose radiotherapy (LDRT) has emerged as a therapeutic option for osteoarthritis, offering anti-inflammatory effects and pain relief, making it an ideal model for evaluating AI-driven evidence synthesis.

Methods

A multi-phase validation study was conducted. In Phase 1, a gold-standard dataset was created from randomized controlled trials and prospective studies on LDRT for osteoarthritis, with manual extraction of study design, interventions, comparators, outcomes, and limitations. In Phase 2 (in progress), leading AI platforms (ChatGPT-4 Turbo, Claude Sonnet 4.0, Gemini Flash 2.5, Perplexity AI, and Meta AI) are benchmarked against this dataset using a structured scoring rubric converted into JSON fields to assess accuracy, completeness, and consistency. Phase 3 (planned) will evaluate reproducibility and hallucination rates through repeated outputs and natural language processing-based reliability analysis.
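The rubric-to-JSON conversion described above can be sketched in code. This is a hypothetical illustration only: the field names (accuracy, completeness, consistency) come from the abstract, but the 0-5 rating scale, the equal weighting, and the function name are illustrative assumptions, not the study's actual rubric.

```python
import json

# Rubric dimensions named in the Methods; the 0-5 scale and equal
# weighting below are illustrative assumptions, not the study protocol.
RUBRIC_FIELDS = ["accuracy", "completeness", "consistency"]


def score_platform_output(ratings: dict) -> dict:
    """Validate rubric ratings (assumed 0-5 scale) and compute a mean score."""
    for field in RUBRIC_FIELDS:
        value = ratings.get(field)
        if value is None or not 0 <= value <= 5:
            raise ValueError(f"{field} must be rated on a 0-5 scale")
    mean = sum(ratings[f] for f in RUBRIC_FIELDS) / len(RUBRIC_FIELDS)
    # Emit a JSON-serializable record, mirroring the rubric-as-JSON-fields design.
    return {"ratings": ratings, "mean_score": round(mean, 2)}


record = score_platform_output({"accuracy": 4, "completeness": 3, "consistency": 5})
print(json.dumps(record))
```

Structuring each rating as a JSON field in this way would let repeated outputs from the same platform be compared programmatically, which is the kind of consistency check Phase 3 envisions.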

Results

Preliminary findings indicate variability in AI platform performance when extracting and synthesizing LDRT clinical evidence. While some platforms demonstrated partial alignment with the reference data, common limitations included incomplete study identification, inconsistencies in reported outcomes, and occasional hallucinations. These discrepancies were most pronounced when platforms interpreted heterogeneous trial designs and comparative effectiveness data.

Conclusion

AI platforms demonstrate variable reliability in summarizing clinical evidence for LDRT in osteoarthritis. Although promising for improving accessibility of complex data, current limitations in accuracy and completeness necessitate careful human oversight before clinical or research application.

Poster (non-peer-reviewed)


Author Information

Aishwarya Kalluri

Research, Orlando College of Osteopathic Medicine, Winter Garden, USA

Kristal De La Cruz Quezada

Research, Orlando College of Osteopathic Medicine, Winter Garden, USA

Nadiya A. Persaud (Corresponding Author)

College of Public Health, University of South Florida, Tampa, USA

Justin Rineer

Oncology, Orlando Health, Orlando, USA

Tomas Dvorak

Oncology, Orlando Health, Orlando, USA

