Artificial Intelligence-Based Application Provides Accurate Medical Triage Advice When Compared to Consensus Decisions of Healthcare Providers

Accurate medical triage is essential for improving patient outcomes and efficient healthcare delivery. Patients increasingly rely on artificial intelligence (AI)-based applications to access healthcare information, including medical triage advice. We assessed the accuracy of triage decisions provided by an AI-based application. We presented 50 clinical vignettes to the AI-based application, seven emergency medicine providers, and five internal medicine physicians. We compared the triage decisions of the AI-based application to those of the individual providers as well as their consensus decisions. When compared to the human clinicians’ consensus triage decisions, the AI-based application performed equal or better than individual human clinicians.


Introduction
Timely and accurate triage of various medical conditions is vital to improving patient outcomes and the efficiency of healthcare delivery systems [1,2]. Inaccurate triage may lead to process delays and low-value care. As an example, up to two-thirds of emergency department visits are unnecessary and avoidable, resulting in $32 billion in excess national healthcare expenditure [3].
Artificial intelligence (AI) has tremendous potential to improve healthcare, including medical triage optimization [4][5][6]. Machine learning algorithms utilized in the emergency department have already been shown to better predict clinical outcomes than conventional methods [7]. However, a significant proportion of triage occurs even prior to the emergency room visit and is determined by the patient. Patients currently utilize search engines, chatbots, and AI-based personal health assistant applications to access medical information that may aid in initial triage decisions, underscoring the importance of these services' accuracy.
Prior research indicates that online symptom checkers often provide inaccurate or incomplete diagnostic or triage advice for users [8,9]. One audit study found that appropriate triage advice was given only in 57% of standardized patient evaluations [10]. More recent research demonstrated that an AI-based virtual assistant appropriately triaged 90% of clinical vignettes presented to it [11]. However, this determination of accuracy was made by one human clinician, significantly limiting the strength of this finding.
We sought to measure the accuracy of triage decisions provided by another AI-based application, MayaMD, by directly comparing its performance to physicians and physician assistants of multiple medical specialties practicing in various clinical settings.

Materials And Methods
MayaMD is an AI-based application that patients may utilize to help determine where they should seek care for any medical condition. Using this application, patients input symptoms and answer subsequent questions, and are then provided with likely diagnoses as well as whether to continue with self-care or seek primary, urgent, or emergency care services.
MayaMD uses a combination of Bayesian statistics and pattern recognition. Layers of supervised and unsupervised machine learning sit on top of a core algorithm to recognize new patterns, typically resulting from changing geographic or demographic data. MayaMD's library and core algorithm are built on accepted evidence-based clinical knowledge and include over 7,000 diagnoses, 8,500 initial inputs (symptoms, physical signs, and labs), 40,000 inferences, and 2,200 medications and interactions. Similar AI-based applications have also been previously described [11][12][13].
MayaMD guides the user to answer various questions that ultimately serve as inputs for the algorithm to recommend optimal triage. The individual inputs symptoms and their quality, duration, and alleviating and exacerbating factors. MayaMD also asks the user about various risk factors associated with the input symptoms. For example, if the user states they are having chest pain, MayaMD will ask if the user has ever smoked or had a blot clot in the legs. Figure 1 demonstrates an example of a user interacting with the MayaMD application.

FIGURE 1: Example of a user interacting with MayaMD.
The MayaMD algorithm utilizes these inputs to determine a differential diagnosis that also guides the optimal triage decision. Any life-threatening condition (based on symptoms, risk factors, and differential diagnosis) is triaged to the emergency room. Any case that requires immediate intervention, such as antibiotic or nebulized breathing treatment, without pre-determined "red flags" that may indicate a lifethreatening condition is triaged to urgent care. Chronic symptoms that MayaMD does not associate with any life-threatening differential diagnoses are triaged to outpatient services.
We sought to measure the accuracy of triage decisions provided by MayaMD. We compared triage decisions made by the AI-based application with those made by healthcare providers for various medical conditions. We simulated patients presenting with various medical conditions by developing 50 unique clinical vignettes, similar to prior studies [9,11]. The vignettes were selected from a list of the most common clinical presentations to the emergency room, urgent care, and primary care services. Vignettes included the patient's age, sex, medical history, and presenting symptoms with a variable amount of pertinent positive and negative symptoms. Vignettes were not associated with a definite diagnosis or expected standard of care. Vignettes were not associated with an available physical examination. An example of one of the vignettes utilized in the study is as follows: "80-year-old man with a history of coronary artery disease presented with shortness of breath for 2-3 weeks. He complains of waking up in the middle of the night, gasping for air. He also feels uncomfortable sleeping flat, without pillows. All the symptoms started a few weeks ago." The full list of clinical vignettes is presented as supplementary material (see Appendices).
Providers were recruited in three phases. In the first phase, the Department of Bioinformatics at the University of Utah assisted in recruiting emergency room (ER) providers who participated in the study. In all three phases, providers were asked to read each vignette and then choose one of the following options for each vignette: (1) "Go to ER or call 911 (life-threatening injuries or symptoms that need treatment immediately)"; (2) "Go to urgent care within 24 hours (non-life-threatening, but need treatment the same day)"; (3) "Go to primary care physician (PCP) within three days (not immediately life-threatening that can wait three days before being seen by a primary care physician or specialist)"; or (4) "Self-care, remain at home, and only report to primary care or urgent care if the condition worsens." MayaMD provides one of these four triage decisions for patients based on the information they input into the application.
In the first phase of the study, we compared triage decisions between an ER physician, ER physician assistant, and MayaMD for 50 unique clinical vignettes. We recorded MayaMD and each provider's triage decision for each vignette.
In the second phase of the study, we selected only the vignettes from the first phase of the study in which the two providers and MayaMD did not provide the same triage decision. These vignettes were then presented to six ER physicians practicing at the University of Michigan St. Joseph Mercy Hospital. Again, we recorded triage decisions by each physician. All six physicians were then asked to discuss each vignette and develop a consensus triage decision for each clinical vignette.
In the third phase of the study, we presented all 50 clinical vignettes to five internal medicine physicians practicing hospitalist medicine at UCLA Health. Each physician provided a triage decision, and then the group discussed the cases to develop a consensus triage decision for each vignette. In the second and third phases of the study, individual provider's and MayaMD's triage decisions were then compared with consensus triage decisions.

Results
In the first phase of the study, we compared triage decisions between an ER physician, ER physician assistant, and MayaMD for 50 clinical vignettes. The ER physician and MayaMD agreed on triage decisions in 37 (74%) cases, while the ER physician assistant and MayaMD agreed on 30 (60%) cases. There was significant variability in the ER physician and ER physician assistant's responses, who agreed on triage decisions for only 28 (56%) cases. The ER physician, ER physician assistant, and MayaMD all agreed on triage decisions in only 24 cases (48%).
For the second phase of the study, we selected 26 vignettes from the first phase of the study that did not result in a unanimous triage decision and presented these vignettes to six other ER physicians practicing at a different health system. The six physicians agreed on the same triage decision for nine (35%) cases. The physicians were then asked to come to a consensus triage decision for all 26 cases, hereby referred to as "Consensus A." MayaMD made the same triage decision as the physicians in Consensus A for 22 (85%) cases. The individual ER physicians had the same triage decision as the decision in Consensus A for an average of 21 (82%) cases, with a range from 18 to 24 (69% to 92%) cases. Only one of the physicians matched Consensus A decisions more times than MayaMD matched Consensus A decisions.
Utilizing Consensus A decisions and the 24 unanimous decisions from phase one of the study, we developed a second consensus of triage decisions, hereby referred to as "Consensus B." When compared to Consensus B triage decisions, the ER physician from phase one of the study, ER physician assistant from phase one of the study, and MayaMD made the same decision for 39 (78%), 30 (60%), and 46 (92%) clinical vignettes, respectively.
In the third phase of the study, we again presented all 50 clinical vignettes to six other providers, and then asked them to reach a consensus for each vignette, hereby referred to as "Consensus C." On average, an individual physician's initial triage decisions matched Consensus C decisions for 40 cases (80%), with a range from 31 to 45 (62% to 90%) cases. MayaMD came to the same triage decision as the Consensus C decision for 44 (88%) cases.
The triage decisions of MayaMD and Consensuses A, B, and C for all 50 vignettes are presented in Table 1

Discussion
Our results demonstrate that an AI-based application, MayaMD, performs at par or better than individual clinicians when determining a triage decision for a clinical vignette ( Figure 2). In our study, the AI-based application's triage decisions accurately matched the triage decisions of two consensuses of six practicing human physicians. The AI-based application's decisions were consistent with the consensus decisions at a similar rate to the individual human physicians' decisions consistency with the consensus decisions. Among the 50 clinical vignettes, there was a total of nine cases in which the AI-based application came to a different decision than at least one of the consensus groups. There were four cases (16,25,41,48) where MayaMD had a different decision than Consensus B and six cases (7, 26, 35, 37, 42, 48) where MayaMD had a different decision than Consensus C. In six of these cases (7, 25, 26, 35, 42, 48), the discrepancy was due to triage to urgent versus emergency care services. There were similar differences in the decisions of individual providers in these cases, demonstrating that either triage may be appropriate. This discrepancy may also be due to individual biases based on practice patterns at various urgent care facilities, as there is substantial variability in what medical conditions are managed at urgent care facilities throughout the United States [14]. For Case 16, a 45-year-old man with a history of heavy smoking presenting with leg pain, the AI-based application recommended primary care services, whereas Consensus B recommended emergency care services. Consensus B likely came to this decision to account for the differential diagnosis of critical limb ischemia; however, based on the other details of the case, both the AI-based application and Consensus C recommended primary care services. Similar discrepancies were found for Cases 37 and 41. Based on these cases, it is clear that minimizing variability in the triage decision relies on detailed input from the user to help narrow the differential diagnosis associated with the user's symptoms, especially to reduce the likelihood of diagnoses that are potentially life-threatening. Interestingly, there was only one case (48) in which both consensus groups came to a different decision than the AI-based application, and in this case, the discrepancy was between urgent versus emergency care services. Other than this one case, the AI-based application's triage decision was consistent with at least one of the consensus groups' decisions, demonstrating the validity of the AI-based application as a medical triage tool.
Our study builds upon previous research that has demonstrated that AI-based applications can determine appropriate diagnoses and appropriately triage for various clinical presentations. While earlier research on the accuracy and safety of various symptom checkers provided mixed and unfavorable results [8][9][10], more recent research examining AI-based applications for this purpose has been more positive. For example, Barriga et al. demonstrated an AI-based application was able to accurately predict discharge diagnosis from patient's input of symptoms upon arriving at the emergency room [15]. Another study indicated that an AIbased virtual assistant appropriately triaged 90% of clinical vignettes presented to it, and Gilbert et al. demonstrated that three available AI-based applications provided safe triage advice when compared to general practitioners for 200 clinical vignettes [11,16].
Our study is the first to our knowledge to compare AI triage decisions with those of a consensus of a group of providers from multiple specialties working at various practice locations, including both academic and non-academic settings. Our findings are significantly strengthened by this utilization of a consensus group of diverse, actively practicing clinicians. Our study is also the first to our knowledge that demonstrates the ability of AI to discriminate between ER and urgent care triage, as previous studies only assessed the AIbased application's ability to determine urgent versus non-urgent triage. Appropriate triage discrimination between urgent and emergency care services has significant implications for cost savings and efficient healthcare delivery for health insurers and health systems.
Throughout the various phases of the study, there was high variability among the individual providers' decisions, similar to previous research [12]. In our study, ER providers generally agreed on clinical presentations that required emergency care but had more variability in their triage decisions for presentations that were more appropriate for urgent or outpatient care. Among the internal medicine providers, the majority of the variability in their triage decisions stemmed from select providers ascribing to a generally more conservative triage approach. The variability in triage decisions between the physician assistant and the physicians was largely because the physician assistant did not develop the same rigorous differential diagnoses as the physicians. Such variability among individual providers may reflect biases and cognitive errors, mimic current real-life medical practice, and demonstrate the potential benefit of relying on AI for triage decisions. The variability in decisions made by individuals also demonstrates the limitation in using one human individual to determine an AI-based application's decision accuracy, as done in previous research [11]. A strength of the current study is the utilization of a consensus decision from multiple physicians to compare and determine the accuracy of the AI-based application's decisions.
Our findings have significant implications. The utilization of AI-based applications that improve the appropriateness and safety of medical triage has the potential to improve patient outcomes and experience as well as the efficiency of healthcare delivery. Payors, providers, and patients may benefit from cost savings and higher-value care. AI-based applications may also be able to provide triage assistance in more rural or underserved areas where access to traditional triage nursing services may be limited.
This study has limitations. We utilized clinical vignettes that were developed by a team of clinicians; therefore, limiting the real-world implications of our study. Real patients' interaction with the AI-based application may not generate the same triage decision as for a clinical vignette simulating the same presentation of symptoms. Future research in which the AI application faces real patients is needed to ensure the accurate triage decisions seen in the present study translate to a real-world setting.
Further research with well-designed randomized trials should also evaluate the potential real-world outcomes of the utilization of AI-based triage decision support. Outcomes such as emergency department visit rates, patient morbidity and mortality, and costs may be compared for groups of patients with and without access to AI-based applications.

Conclusions
Our study demonstrates that an AI-based application may provide accurate medical triage decision support for patients. The utilization of AI-based triage decision support may potentially improve patient outcomes and reduce healthcare costs, and these are claims that still need to be evaluated with future research.

Appendices
The full list of clinical vignettes is listed below.
1. An 80-year-old man with a history of coronary artery disease presented with shortness of breath for two to three weeks. He complains of waking up in the middle of the night, gasping for air. He also feels uncomfortable sleeping flat, without pillows. All the symptoms started a few weeks ago.

2.
A 28-year-old female with a mouth ulcer for three days. No other symptoms and no risk factors for oral cancer.
3. A 70-year-old woman with dysuria and urinary frequency for three days. She also had fevers at home for two days.

4.
A 40-year-old man with weight loss and generalized itching for two months. On further questioning, he noted to have night sweats for two months as well.

5.
A 30-year-old female presents with dysmenorrhea and menorrhagia for six to seven months. She also complains of feeling quite tired for about three months. 6. A 42-year-old female with hand pain and stiffness for five months. She denied any other symptoms. 7. A 28-year-old woman with a history of asthma presented with worsening of shortness of breath for a week. She had tried rescue inhalers, without much improvement. She says she has been coughing a lot for the last five days. No fever or chills.

8.
A 60-year-old man who underwent hip surgery recently came with blood in phlegm for two days. He also feels shortness of breath for the last three days. His one leg appears to be a bit more swollen than the other for three days as well. 9. A 20-year-old man with sneezing and cough for two days. No fever, neck pain, or sputum production. 10. A 68-year-old man with the difficulty of swallowing for five months. He feels that the food is getting stuck at the level of the stomach and the symptom is worse with solid food. He also complained of weight loss for four to five months.

11.
A 30-year-old woman presents with joint pain and a rash on the face for two months. The rash appears on both cheeks and on top of the nose. She feels very tired as well for the same duration.

12.
A 35-year-old man complains of gaining weight and constipation for nine months. He also said yes to feeling tired for the last several months. 13. A 24-year-old girl who was prescribed a new pill for acne presented with shortness of breath, lip and tongue swelling, and wheezing for a few hours.
14. A 72-year-old man with fever and chills for five days. He has had a productive cough and fatigue for a week. He has a history of myocardial infarction (MI) with stents placed a few years ago.

44.
A 29-year-old man with a history of migraine presented with severe disabling headache and photophobia for three days. The episode feels like his migraine attack. Denied fever or neck pain.

45.
A 78-year-old woman with jaundice and fatigue for four months. She also complained of an abdominal lump and weight loss for three months.
46. An 84-year-old woman presented with rectal bleeding for four days. She has abdominal pain, located in the right lower region and low-grade fever at home for one week. She denied any other symptoms.

47.
A 38-year-old man with palpitations and chronic diarrhea for four months. On further questioning, he said yes to weight loss for four months as well.

48.
A 76-year-old woman with a history of smoking presented with a worsening of wheezing for four days. She has a history of chronic obstructive pulmonary disease and gets admitted with a change in weather.

49.
A 30-year-old man with diarrhea for six months. It is mixed with mucus at times. He also complained of joint pain in multiple joints for four months.

50.
A 46-year-old man with a history of uncontrolled blood presented with sudden onset of severe chest pain for two hours. Upon questioning, he felt a tearing sensation in his chest. He is very uncomfortable at rest.
The following table presents the decisions of individual providers for all 50 vignettes.