The Role of Alvarado Score in Predicting Acute Appendicitis and Its Severity in Correlation to Histopathology: A Retrospective Study in a Qatar Population

Background/objective Acute appendicitis (AA) is one of the most common surgical emergencies, and accurate diagnosis is required to avoid the adverse outcomes of a missed or delayed diagnosis. Our study aims to assess the diagnostic power of the Alvarado score and its ability to predict the severity of AA, in correlation with intraoperative findings and the final histopathology (HP) result. Methods This retrospective study included 1,303 patients with clinically proven AA and available HP results. We correlated the Alvarado score with the gold standard HP and with intraoperative findings. We selected Alvarado cutoff points of 5 and 7, as these are the cutoff values most frequently mentioned in the literature and were supported by the ROC curve in this study, to assess sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Results The mean age of the study cohort was 33.3 ± 9.5 years, with a male predominance (75.8%). The negative appendectomy (NA) rate was 4%. The operative complication rate was 1.2%, and there was one death (0.1%). HP confirmed AA in 95.9% of cases. An Alvarado score ≥ 7 had a sensitivity and specificity of 66.4% and 69.8%, respectively, with a PPV of 98.1%, an NPV of 8.1%, and an accuracy of 66.5%. For an Alvarado score ≥ 5, the sensitivity was 91.2%, specificity 22.6%, PPV 96.5%, NPV 9.8%, and accuracy 88.4%. In addition, Alvarado risk stratification was significantly associated with both HP and intraoperative grades (p = 0.001 each). Conclusion Because of its unsatisfactory sensitivity and specificity, the Alvarado scoring system alone is not sufficient to diagnose AA. However, it is a good indicator of AA severity that can be used to prioritize patients waiting for surgery.


Introduction
Acute appendicitis (AA) is inflammation of the vermiform appendix, and many hypotheses have been proposed to explain its etiology. One of the main mechanisms is obstruction of the appendicular lumen by stool, vegetable seeds, a foreign body, or a neoplasm, which raises intraluminal pressure, impairs the blood supply to the appendicular wall, and leads to inflammation and perforation [1]. Other hypotheses invoke racial and genetic factors, or environmental and socioeconomic factors [2,3]. Most emergency departments worldwide report AA as the most common surgical emergency in their daily practice [4]. The lifetime risk of AA is about 6%-9%, and it mainly affects young people in the second and third decades of life [5,6]. Clinical presentations range from mild abdominal pain to life-threatening peritonitis and septic shock; early diagnosis is therefore the main determinant of a good outcome [1,7]. Management can be tailored to the presentation, from nonoperative treatment with antibiotics to exploratory laparotomy in delayed and neglected cases. Early diagnosis is nevertheless challenging, because symptoms and pain tolerance vary with sex and race and the differential diagnosis is broad, especially in females [8]. AA can be diagnosed through clinical assessment, laboratory tests, and radiological imaging. Many scientists and medical schools have devised diagnostic scoring systems.
The radiological assessment of AA with ultrasound (US) or computed tomography (CT) was reserved for patients with a borderline Alvarado score or a confusing clinical presentation, depending on gender and clinical suspicion [16,17].
Many centers worldwide rely on CT as their main diagnostic tool to avoid false-positive diagnoses and to keep the negative appendectomy rate below 5% of operated cases, but at the cost of radiation exposure with its long-term risks, delayed management, and added expense [18][19][20].
We aimed to assess the diagnostic power of the Alvarado score in the current era, in which advanced radiological imaging is readily available and emergency patient flow is increasing. This would establish whether the score is valid for use in our emergency department and could help avoid the radiation hazards of routine CT scans. In addition, we assessed the negative appendectomy (NA) rate in this cohort.

Materials And Methods
This study is a retrospective analysis of prospectively collected data at Hamad General Hospital, Qatar's largest tertiary care facility, in Doha. The study used the database from January 1, 2018, to January 31, 2019, and was approved by the Medical Research Center Institutional Review Board (IRB) of Hamad Medical Corporation, Doha, Qatar (protocol number MRC-01-19-454). The study included 1,303 patients with clinically proven AA, based on the physician's judgment of clinical signs and symptoms and confirmed by radiological scans. The inclusion criteria were as follows: (1) age ≥ 14 years; (2) a clinical diagnosis of AA, with admission and appendectomy; and (3) an available postoperative histopathology report.
We retrieved study data from our electronic medical record (EMR) database, Cerner. The collected data comprised three sets of preoperative and postoperative variables. The first set included demographics, history, and clinical characteristics (age, gender, nationality, presenting symptoms of abdominal pain and its duration, fever, anorexia, nausea, vomiting, change of bowel habits, smoking or alcohol consumption, BMI, and comorbidities such as diabetes mellitus (DM), hypertension (HTN), coronary artery disease (CAD), atrial fibrillation (AF), and chronic kidney disease (CKD)); signs of tenderness and rebound tenderness; vital signs (systolic blood pressure (SBP), diastolic blood pressure (DBP), pulse rate, body temperature, and oxygen (O2) saturation); and intensive care unit (ICU) admission and length of ICU stay. The second set comprised laboratory results (white blood cell (WBC) count, neutrophil count, lymphocyte count, platelets, hemoglobin (Hb), international normalized ratio (INR), creatinine, blood urea nitrogen (BUN), pH, base excess, serum C-reactive protein (CRP), serum lactate, serum albumin, and serum glucose) and CT scan findings. Finally, the third set included surgical details (evaluation-to-surgery time (EST), date and type of surgery, operative grade of appendicitis, conversion to open surgery, intraoperative complications, postoperative drain insertion, date of drain removal, postoperative antibiotics (AB), duration of the AB course, postoperative imaging, if any, reoperation and its cause (collection or bleeding), readmission and its cause (pulmonary embolism (PE), collection, stump appendicitis, fever, or abdominal pain), and death) and histopathology (HP) grade. The results of the study are presented in Table 1. The Alvarado score was calculated retrospectively from the available data, after the surgical management of AA and the final HP results.
The patients were stratified accordingly into three groups: group I, low probability of AA (score 1 to 4); group II, intermediate probability (score 5 to 6); and group III, high probability (score 7 to 10) (Table 1) [9].
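Because the score was reconstructed retrospectively from chart data, a minimal Python sketch of the point allocation and the three-group stratification may help readers reproduce it. It assumes the commonly cited eight-item (MANTRELS) version of the Alvarado score; the field names, the fever threshold (≥ 37.3 °C), and the left-shift threshold (neutrophils > 75%) are assumptions, since the text does not state which thresholds were used. The leukocytosis threshold (> 10,000 per µL) matches the one reported in the Results.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    # Hypothetical field names for the eight Alvarado (MANTRELS) components.
    migratory_pain: bool      # pain migrating to the right iliac fossa
    anorexia: bool
    nausea_vomiting: bool
    rif_tenderness: bool      # right iliac fossa tenderness
    rebound: bool             # rebound tenderness
    temperature_c: float      # body temperature in degrees Celsius
    wbc_per_ul: float         # white blood cell count per microliter
    neutrophil_pct: float     # neutrophils as a percentage of WBCs


def alvarado_score(p: Presentation) -> int:
    """Return the Alvarado score (0-10) using assumed standard thresholds."""
    score = 0
    score += 1 if p.migratory_pain else 0
    score += 1 if p.anorexia else 0
    score += 1 if p.nausea_vomiting else 0
    score += 2 if p.rif_tenderness else 0
    score += 1 if p.rebound else 0
    score += 1 if p.temperature_c >= 37.3 else 0   # assumed fever threshold
    score += 2 if p.wbc_per_ul > 10_000 else 0     # leukocytosis, as in the text
    score += 1 if p.neutrophil_pct > 75 else 0     # assumed left-shift threshold
    return score


def risk_group(score: int) -> str:
    """Map a score to the study's groups: I (<= 4), II (5-6), III (7-10)."""
    if score <= 4:
        return "I"
    if score <= 6:
        return "II"
    return "III"


# A hypothetical patient, not taken from the study data.
example = Presentation(
    migratory_pain=True, anorexia=True, nausea_vomiting=False,
    rif_tenderness=True, rebound=True, temperature_c=37.8,
    wbc_per_ul=13_900, neutrophil_pct=80,
)
s = alvarado_score(example)
print(s, risk_group(s))
```

Scores of 0 are folded into group I here, because every score below 5 is treated as low probability.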
We used the American Association for the Surgery of Trauma (AAST) grading for AA [21] as the closest match to our recorded findings. Microscopic HP findings were classified as grade 0 for a normal appendix or no evidence of AA, grade I for mild AA, grade II for gangrenous/perforated AA, and grade III for AA with an incidental neoplastic finding. Operative findings were graded as grade 0 for an appendix of normal appearance, grade I for nonperforated AA, grade II for gangrenous AA or impending perforation, grade III for perforated AA with fluid collection, grade IV for mass-forming AA, and grade V for any of the above with generalized peritoneal contamination. Grades I and II were considered uncomplicated AA, and grades III, IV, and V complicated AA. We correlated the Alvarado score with the gold standard HP findings and with intraoperative findings to assess whether the score reflects disease severity. We chose Alvarado cutoff points of 5 and 7 to assess sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
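Sensitivity, specificity, PPV, NPV, and accuracy at a given cutoff all come from one 2 × 2 table that cross-tabulates "score ≥ cutoff" against the HP result. The Python sketch below shows this bookkeeping under the assumption that HP grade 0 counts as "no AA" and HP grades I to III count as "AA". The patient records are hypothetical and only illustrate the calculation.

```python
from typing import Iterable, Tuple


def diagnostic_metrics(records: Iterable[Tuple[int, bool]], cutoff: int) -> dict:
    """Compute 2x2 diagnostic metrics for 'score >= cutoff' versus histopathology.

    records: (alvarado_score, hp_confirms_aa) pairs.
    """
    tp = fp = tn = fn = 0
    for score, has_aa in records:
        test_positive = score >= cutoff
        if test_positive and has_aa:
            tp += 1
        elif test_positive:
            fp += 1
        elif has_aa:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),   # detected AA among HP-confirmed AA
        "specificity": tn / (tn + fp),   # score-negative among normal appendices
        "ppv": tp / (tp + fp),           # HP-confirmed AA among score-positive
        "npv": tn / (tn + fn),           # normal appendix among score-negative
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical records: (Alvarado score, HP-confirmed AA?)
records = [(8, True), (7, True), (6, True), (4, True), (9, True),
           (5, False), (3, False), (7, False), (2, False), (6, True)]
for cutoff in (5, 7):
    print(cutoff, diagnostic_metrics(records, cutoff))
```

In this toy example, raising the cutoff from 5 to 7 lowers sensitivity (from about 0.83 to 0.50) and raises specificity (from 0.50 to 0.75), the same direction of trade-off the study reports. The function assumes every cell denominator is nonzero.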

Statistical methods
Descriptive statistics, in the form of mean and standard deviation (SD) for interval variables and frequency with percentage for categorical variables, were calculated according to the Alvarado and HP gradings. Chi-square tests were applied to test the association between HP and Alvarado gradings. One-way ANOVAs were performed to compare the means of all interval variables across the HP and Alvarado gradings. The ROC curve and the concordance (c) statistic were used to assess the ability of the Alvarado score to diagnose AA at different cutoff values. A p-value of less than 0.05 was considered statistically significant. The Statistical Package for the Social Sciences (SPSS) version 28.0 (IBM Corp., Armonk, NY, USA) was used for data analysis.
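The chi-square test of association compares each observed count with the count expected if Alvarado group and HP grade were independent, E = (row total × column total) / N, and sums (O − E)² / E over all cells. The pure-Python sketch below computes the Pearson statistic and its degrees of freedom. The counts are hypothetical and chosen so that every expected count is at least 5, the usual rule of thumb for the approximation. For p-values, SciPy's `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom, and expected counts; it applies Yates' continuity correction only when there is one degree of freedom.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: list of rows of observed counts (e.g., Alvarado group x HP result).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = Alvarado groups I-III, columns = normal vs AA on HP.
observed = [[20, 80],
            [20, 180],
            [20, 480]]
stat, dof = chi_square_independence(observed)
print(round(stat, 2), dof)
```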

Results
A total of 1,303 patients who underwent appendectomy for the diagnosis of AA were enrolled in the study.
The mean age of the study cohort was 33.3 ± 9.5 years. The male-to-female ratio was 3.1:1, reflecting a male predominance (75.8%). Regarding nationality, most patients were from Asia (80.9%), followed by Africa (18.1%) and other continents (1%). The mean time from the onset of AA symptoms to diagnosis was 1.8 ± 1.5 days, and the mean time from diagnosis to surgery was 26.6 ± 11 hours. The most frequent Alvarado score components were right iliac fossa tenderness (99.3%), rebound tenderness (79.9%), and leukocytosis > 10,000 per µL (76.9%). The mean Alvarado score for the whole cohort was 7.0 ± 1.7. According to risk stratification, 121 patients were in the low-probability group (group I), 336 in the intermediate-probability group (group II), and 846 in the high-probability group (group III).
Regarding laboratory findings (Table 2 and Table 3), the mean serum CRP concentration was 49 ± 69 mg/L and the mean WBC count was 13.9 ± 4.3 × 10³/µL; 43.8% of patients had a WBC count between 10,000 and 15,000/µL, and 33.2% had a WBC count > 15,000/µL. Laparoscopic appendectomy was the main surgical procedure (89.9% of cases), while open appendectomy accounted for 10.1%. The conversion rate from laparoscopic to open surgery was 0.6%. Fifty-three (4.1%) patients required intraoperative drain insertion. The mean duration of perioperative antibiotics was 6.0 ± 4.0 days. Intraoperative findings were grade 0 in 1.5%, grade I in 76.9%, grade II in 14%, grade III in 0.4%, and grade IV in 7.2% of patients. Fifty (3.8%) patients required postoperative imaging for clinical findings related or unrelated to the surgery. Reoperation was required in three (0.2%) patients: one for drainage and washout of an abdominal collection and two for control of postoperative bleeding. The readmission rate was 1.5% (20 patients): one patient was readmitted with pulmonary embolism, 14 with an abdominal collection, one with stump appendicitis, one with fever, and three with significant abdominal pain without definite pathology. The operative complication rate was 1.2%, the mean length of hospital stay was two days, and there was one death (0.1%). On HP, AA was confirmed in 95.9% of cases. Fifty-two patients (25 females and 27 males) had a normal appendix with no evidence of inflammation, giving a negative appendectomy rate of 4%.
CT was performed in 1,095 (84%) patients and ultrasound (US) in 217 (16.7%), and 136 (10.4%) patients had both before surgery. HP findings were grade 0 in 52 patients, grade I in 1,198, grade II in 41, and grade III (AA with incidental neoplastic findings) in 11. All details are presented in Table 2 and Table 3. To evaluate the Alvarado score, we computed the sensitivity, specificity, PPV, and NPV at candidate cutoff values. The Alvarado score was significantly associated with diagnostic confirmation at cutoffs of ≥ 5 and ≥ 7 (p = 0.003 and p ≤ 0.001, respectively), indicating a greater likelihood of AA at or above these scores. The ROC curve had an area under the curve of 0.696 and identified a cutoff of > 5.5 as optimal (Figure 1).

FIGURE 1: ROC curve.
Diagonal segments are produced by ties.
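The area under the ROC curve equals the concordance (c) statistic: the probability that a randomly chosen patient with HP-confirmed AA has a higher score than a randomly chosen patient with a normal appendix, with tied pairs counted as one half. Because the Alvarado score takes only integer values, ties are common, which is why the curve contains diagonal segments. The Python sketch below computes the c statistic and, as one common way to pick an "optimal" cutoff, Youden's J (sensitivity + specificity − 1). The study does not state which criterion produced its 5.5 cutoff, so Youden's J is an assumption here, and the scores are hypothetical.

```python
def c_statistic(scores_aa, scores_normal):
    """Concordance (c) statistic = ROC AUC, counting tied pairs as one half."""
    concordant = 0.0
    for a in scores_aa:
        for n in scores_normal:
            if a > n:
                concordant += 1.0
            elif a == n:
                concordant += 0.5   # ties give the diagonal ROC segments
    return concordant / (len(scores_aa) * len(scores_normal))


def youden_cutoff(scores_aa, scores_normal):
    """Return (cutoff, J) maximizing Youden's J for the rule 'score >= cutoff'."""
    best = None
    for cutoff in sorted(set(scores_aa) | set(scores_normal)):
        sens = sum(s >= cutoff for s in scores_aa) / len(scores_aa)
        spec = sum(s < cutoff for s in scores_normal) / len(scores_normal)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


# Hypothetical Alvarado scores for HP-confirmed AA and for normal appendices.
aa = [8, 7, 6, 9, 6, 7, 7, 8, 5, 10]
normal = [5, 3, 6, 2, 6]
print(round(c_statistic(aa, normal), 3), youden_cutoff(aa, normal))
```

ROC software typically reports cutpoints as midpoints between adjacent observed values, so on an integer score a cutoff of > 5.5 is the same rule as ≥ 6.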
Scores of 7 or higher had a sensitivity and specificity of 66.4% and 69.8%, respectively, with a PPV of 98.1%, an NPV of 8.1%, and an accuracy of 66.5%. For scores of 5 or higher, the sensitivity was 91.2%, specificity 22.6%, PPV 96.5%, NPV 9.8%, and accuracy 88.4%.
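The combination of a very high PPV with a very low NPV follows largely from the case mix: 95.9% of the operated patients had histologically confirmed AA. The Python sketch below applies Bayes' theorem to the reported sensitivity and specificity at that prevalence. It is only a consistency check on the published figures, not a re-analysis of the study data, and it agrees with the reported PPV, NPV, and accuracy to within about one decimal place.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV, NPV, and accuracy from sensitivity, specificity, and prevalence (Bayes)."""
    tp = sensitivity * prevalence              # expected true-positive fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    return tp / (tp + fp), tn / (tn + fn), tp + tn


prevalence = 0.959  # proportion of operated patients with HP-confirmed AA
for label, sens, spec in (("score >= 7", 0.664, 0.698), ("score >= 5", 0.912, 0.226)):
    ppv, npv, acc = predictive_values(sens, spec, prevalence)
    print(f"{label}: PPV={ppv:.3f} NPV={npv:.3f} accuracy={acc:.3f}")
```

Because prevalence enters both predictive values directly, the PPV and NPV from an operated cohort would likely not carry over to an emergency population with undifferentiated abdominal pain, where the prevalence of AA is lower; sensitivity and specificity are less dependent on prevalence.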

Correlation between the Alvarado score stratification and the study variables
Of the 1,303 patients, 121 (9.3%) belonged to group I (low probability of AA), including 89 (73.6%) males; 336 (25.8%) belonged to group II, including 250 (74.4%) males; and 846 (64.9%) belonged to group III, including 649 (76.7%) males. Thus, according to the Alvarado score, AA was more common in males, and most patients fell into group III. However, this observation may not reflect a true sex difference, as males make up a high proportion (75%) of Qatar's population owing to the influx of mainly male laborers [22]. Among vital signs, pulse rate and body temperature differed significantly among the Alvarado groups (p = 0.003 and p = 0.001, respectively). Among the Alvarado score components, migratory abdominal pain, anorexia, nausea, and vomiting differed significantly among the groups (p = 0.001 each). Regarding laboratory results, WBC count differed significantly among the three groups (p = 0.001) and was highest in high-probability group III; the same was true for neutrophil count (p = 0.001). Lymphocyte count, by contrast, was higher in group I than in groups II and III (p = 0.001). WBC brackets also differed significantly: a WBC count > 15,000/µL was most frequent in group III (p = 0.001), whereas a WBC count between 10,000 and 15,000/µL was most frequent in group I, followed by group II and then group III (p = 0.001). Serum CRP, lactate, and glucose levels increased significantly with the probability of AA (p = 0.03, p = 0.001, and p = 0.001, respectively). On CT, the diameter of the inflamed appendix was significantly larger in group III than in the other groups (p = 0.001).
The surgical approach also differed significantly among the Alvarado risk groups, with open appendectomy used more often in high-risk group III than in groups I and II (p = 0.003). The remaining variables were nonsignificant (Table 2 and Table 3).

Correlation between the histopathological findings and the study variables
Of the 1,303 patients in this study, grade 0 HP findings were found in 52 patients (51.9% male), grade I in 1,198 (76.8% male), grade II in 41 (83.3% male), and grade III in 11 (63.6% male), indicating a significantly higher proportion of males among the more severe forms of appendicitis (p = 0.001). As noted earlier, this observation may reflect the male predominance in Qatar's population. Among vital signs, pulse rate, body temperature, and oxygen saturation were significantly associated with HP grade (p = 0.006, p = 0.001, and p = 0.002, respectively). Symptom duration was longer in patients with less severe, uncomplicated HP findings (p = 0.001). Postoperative antibiotic requirement, postoperative imaging, drain insertion, and length of hospital stay (LOS) were significantly greater in complicated AA (HP grade II) than in other grades (p = 0.001 each).
Elevated WBC and neutrophil counts were significantly associated with higher HP grades of appendicitis (p = 0.001 each), whereas lymphocyte counts were lower in higher grades (p = 0.001). Elevated CRP, INR, and lactate levels were significantly associated with HP grade severity (p = 0.001, p = 0.001, and p = 0.002, respectively). Among the Alvarado score components, migratory right iliac fossa pain (p = 0.01), abnormal WBC count (p = 0.001), and raised temperature (p = 0.001) differed significantly across HP grades and were least frequent in grade 0 (no evidence of AA). On CT, appendicular lumen diameter was significantly associated with HP severity (p = 0.001). The mean Alvarado score was 7.1 ± 1.7, 7.6 ± 1.5, and 6.6 ± 2.2 for HP grades I, II, and III, respectively, and 5.8 ± 1.7 for patients with normal histology (grade 0) (p = 0.001), indicating that higher Alvarado scores were associated with more severe AA on histopathological examination. The remaining variables were neither statistically nor clinically significant across HP grades (Table 2 and Table 3).
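The comparison of mean Alvarado scores across HP grades is a one-way ANOVA: the F statistic is the between-group mean square divided by the within-group mean square. The pure-Python sketch below computes F and its degrees of freedom. The scores are hypothetical and are not the study's patient data. For a p-value, SciPy's `scipy.stats.f_oneway` takes the groups as separate arguments and returns the F statistic and p-value.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical Alvarado scores grouped by HP grade (0 = normal, I, II).
scores_by_grade = [
    [4, 5, 6, 6, 8],   # grade 0
    [6, 7, 7, 8, 7],   # grade I
    [7, 8, 8, 9, 8],   # grade II
]
print(one_way_anova(scores_by_grade))
```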

Correlation of the HP findings/operative findings with Alvarado risk stratification groups
Alvarado risk groups were significantly associated with HP grades (p = 0.001). The 52 patients without histological evidence of AA made up a larger proportion of group I than of groups II and III (9.9%, 7.4%, and 1.8%, respectively). Alvarado risk stratification was therefore strongly associated with the HP findings (the gold standard) in diagnosing AA: higher Alvarado scores were associated with lower negative appendectomy rates and with higher HP grades, that is, more severe inflammation (Table 4). Operative findings were also significantly associated with the Alvarado risk groups (p = 0.001), with more severe operative findings in the high-risk group (Table 5).

Discussion
AA is one of the most common surgical diseases in the emergency department, and it often mimics other intra-abdominal surgical and medical conditions. Because it is an acute disease that carries a risk of complications and may be life-threatening without prompt diagnosis, surgeons tend to offer surgery rather than observation when AA is suspected [7,8].
Negative appendectomies impose additional burdens on surgeons, patients, and healthcare facilities through unnecessary procedures, and accurate diagnosis of AA is required to avoid them. Many diagnostic scores have therefore been proposed to aid diagnosis and reduce negative appendectomy rates [20]. The Alvarado score was one of the earliest scoring systems introduced for AA, and it has helped reduce unnecessary CT scans, with their radiation hazards, cost, and limited availability.
This study included 1,303 patients who underwent surgery for AA and had available postoperative HP reports. We retrospectively analyzed the available data to validate the Alvarado score against the gold standard HP findings. In our sample, AA was more common in males (75.8%) than in females, similar to many reports in the literature [23,24], although other studies have shown an equal distribution between males and females [25,26]. AA mainly affects young people between the second and fourth decades of life, and our cohort, with a mean age of 33.3 years, matches this pattern, whereas a recently published article reported a younger cohort with a mean age below 30 years [17,27].
The negative appendectomy (NA) rate on HP was 4% in this study, which compares favorably with rates reported in the literature [28][29][30][31]. NA cases were almost evenly divided between females and males (48.1% and 51.9%, respectively). Because females made up only about a quarter of the cohort, however, this split corresponds to a proportionally higher NA rate in females, in line with other studies reporting that NA is more prevalent in females [32][33][34]. The most frequent Alvarado score component was right iliac fossa tenderness (99.3%), followed by rebound tenderness and leukocytosis, which is comparable to other studies [35,36]. Our findings differ, however, from those of Swami et al. [37], who reported a lower incidence of leukocytosis, and Rodrigues et al. [38], who found elevated temperature to be the most frequent component. In our view, these differences in frequency reflect the patients' status at the time of examination and differences in the study populations and gender distribution.
Most of the cohort (91%) had an Alvarado score of 5 or higher, and about 65% had a score of 7 or higher. Other studies have similarly found that most patients present with a score above 5 [33,36,39,40]. We examined the sensitivity and specificity of the Alvarado score at cutoff points of 5 and 7. At a cutoff of ≥ 5, sensitivity, specificity, PPV, and NPV were 91.2%, 22.6%, 96.5%, and 9.8%, respectively; at a cutoff of ≥ 7, they were 66.4%, 69.8%, 98.1%, and 8.1%. As the cutoff increased, sensitivity decreased and specificity increased. Another study that validated the Alvarado score at a cutoff value of ≥ 7 reported a lower sensitivity of 54%, a specificity of 75%, a PPV of 90%, and an NPV of 29%, and concluded that the Alvarado score was not helpful in diagnosing AA [41].
In this study, HP findings were significantly associated with the Alvarado risk stratification groups (p = 0.001) (Table 4). Most cases of each HP grade fell within Alvarado group III, and false-positive appendicitis (negative appendectomy on HP) was more frequent in the lower Alvarado groups than in the higher groups, similar to other studies [36,42-44]. Operative findings were also significantly associated with the Alvarado score, with the proportion of AA, especially of advanced operative grades, increasing across the Alvarado risk groups. Other studies have reported the same association between operative findings and the Alvarado score [35], whereas others did not find such a correlation (Table 5) [45].
The mean evaluation-to-surgery time was 26.6 ± 11 hours, shorter than the mean of 32.4 ± 5.4 hours reported by Sousa-Rodrigues et al. [45]. In this study, evaluation-to-surgery time did not differ significantly among the Alvarado groups (p = 0.15), consistent with the literature [36,46], although other studies have linked longer evaluation time to higher complication rates [42]. Our complication rate of 1.2% is acceptable, and, as reported in other studies [36,47], complications occurred mainly in patients with suppurative appendicitis on final HP.
This study has limitations. Retrospective reviews of medical records have inherent limitations in data quality. Most patients underwent radiological assessment during emergency evaluation, which affected the accuracy of the negative appendectomy rate. The operating surgeons differed in grade and experience, which may have affected the recorded intraoperative findings. Future research should address these limitations. Nevertheless, the study has several strengths. Its large sample size allows more precise estimation of the relationships between variables. To our knowledge, it is the first study in Qatar to appraise the correlation between the Alvarado score and HP findings, using a wide range of interrelated variables in such a large sample.

Conclusions
Our results showed unsatisfactory sensitivity and specificity of the Alvarado score, and we conclude that it should not be used as the sole tool for diagnosing AA, which has many differential diagnoses, especially in females. However, the Alvarado score proved to be a good indicator of appendicitis severity that can be relied on when prioritizing patients waiting for surgery.

Additional Information Disclosures
Human subjects: Consent was obtained or waived by all participants in this study. The Medical Research Center of Hamad Medical Corporation issued approval MRC-01-19-454. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue. Conflicts of interest: In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.