Relative United States Medical Licensing Examination (USMLE) Performance by Specialty Is Not a Predictor of Board Exam Pass Rate: The Case of Diagnostic Radiology

Introduction: In 2010, the American Board of Radiology (ABR) changed the board certification process for diagnostic radiology (DR) residents with the new Core exam. However, there is no standardized way to evaluate DR residency graduates, and with no specific target pass rate advertised for the exam, the "appropriate" pass rate has remained a debated topic in the field. In this paper, the board certification exam passage rates of DR are compared with those of other medical specialties to assess the ABR's standardization method and to serve as a basis for other specialties considering changes to their board exam structure.

Methods: Performance on the United States Medical Licensing Examination (USMLE) was obtained from the National Resident Matching Program (NRMP) and the San Francisco Match. Board passage rates were analyzed using data from the American Board of Medical Specialties. USMLE and board exam passage rates were averaged and ranked, and statistical analysis was conducted using Stata (College Station, TX).

Results: Since 2005, DR performance on USMLE Step 1 has increased at the lowest rate (0.563 points/year) and anesthesiology performance at the greatest rate (1.313 points/year). Residents matching from US allopathic medical schools in the 2010 and 2012 match years, cohorts that took the DR oral board exams, had USMLE 1 averages of 232 and 235, respectively. The first-time pass rate for the first Core exam was 87%, and the overall pass rate since the first Core exam has been 88.54%. The Spearman rho coefficient for specialty ranks of board passage rate and USMLE 1 was 0.0679 (p = 0.8101); for board passage rate and USMLE 2 CK, 0.1430 (p = 0.6257); and for USMLE 1 and USMLE 2 CK, 0.8317 (p = 0.0002).

Conclusions: Specialty board pass rates have not increased in concert with improved trainee performance on the USMLE. While USMLE performance among those matching in diagnostic radiology has increased, the ABR board exam passage rate has decreased. These data suggest that the ABR sets passing thresholds according to the relative performance of examinees rather than using a criterion-referenced Angoff standard.


Introduction
Diagnostic radiology (DR) has had competitive applicants over the past 10 years [1], and the United States Medical Licensing Examination (USMLE) Step 1 average among matched US graduates held steady at 241 in 2020 [2]. With that said, the fill rate has fluctuated from 99% in 2009 [3] to 92% in 2012 [4] followed by an upward trend to 99% in 2018 [5].
Following a 2006 nationwide survey of practicing radiologists, the American Board of Radiology (ABR) revamped the board certification process for residents beginning radiology residency training in 2010 [6]. The new Core exam would be standardized using the Angoff method, whereby subspecialty experts would determine the minimum competency for each section [7]. As such, no specific target pass rate for the exam was advertised; it would therefore be theoretically possible for all or none of the residents taking the exam to pass. Questions were raised as to how ABR experts would determine this competency level for residents with more than a quarter of their four-year residency training remaining, in contrast to the former oral exam given at the end of training, and some worried that smaller programs would be at a disadvantage in the new system [8].
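The Angoff procedure referenced above can be illustrated with a minimal sketch. The ratings below are invented for illustration and are not ABR data: each panelist estimates the probability that a minimally competent examinee would answer each item correctly, and the cut score is the sum of the per-item mean estimates.

```python
# Illustrative Angoff standard-setting sketch (hypothetical ratings, not ABR data).
# Rows = judges, columns = items; each entry is a judge's estimate of the
# probability that a borderline (minimally competent) examinee answers the
# item correctly.
from statistics import mean

ratings = [
    [0.70, 0.55, 0.80, 0.60],
    [0.65, 0.60, 0.75, 0.65],
    [0.75, 0.50, 0.85, 0.55],
]

# Average each item's estimates across judges.
item_means = [mean(col) for col in zip(*ratings)]

# The Angoff cut score is the expected raw score of a borderline candidate.
cut_score = sum(item_means)
pass_fraction_needed = cut_score / len(item_means)

print(f"cut score: {cut_score:.2f} of {len(item_means)} items "
      f"({pass_fraction_needed:.1%} correct)")
```

Note that nothing in this procedure fixes a fail rate in advance: the cut score depends only on the panel's judgments, so in principle any fraction of examinees could pass.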
Due to the increase in competition for DR residency slots in 2009, the first group of residents to take the then-new ABR Core exam in 2013 were among the highest achieving cohort of residents that DR had ever trained. Nonetheless, the ABR Core exam pass rate mirrored the pass rates of prior ABR board certification examinations [9]. Despite the promise of the ABR that this new exam would not place smaller programs at a disadvantage [7], chief resident-derived data obtained after the first two administrations of the Core exam suggested that small program size was indeed a risk factor for failing the Core exam [10].
The "appropriate" pass rate has remained a debated topic among trainees and program officials. This paper seeks to compare the board certification exam passage rates of DR to other medical specialties to assess the standardization method of the ABR and to serve as a basis for additional specialties considering changes to their board exam structure.
USMLE and board exam passage rates were averaged and ranked; these rankings are shown in Table 4.
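The paper's rank correlations were computed in Stata; as a sketch, the Spearman step can be reproduced as follows. The specialty scores and pass rates below are hypothetical placeholders, not the study's data.

```python
# Minimal sketch of the rank-correlation step (hypothetical data, not the
# study's). For tie-free data, Spearman's rho has the closed form
# rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference.

def spearman_rho(x, y):
    """Spearman rank correlation for data with no tied values."""
    rank = lambda v: {val: i + 1 for i, val in enumerate(sorted(v))}
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((rx[a] - ry[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical mean USMLE Step 1 scores and board pass rates by specialty.
step1 = [232, 245, 240, 228, 236]
pass_rate = [0.88, 0.94, 0.90, 0.91, 0.86]

rho = spearman_rho(step1, pass_rate)
print(f"Spearman rho = {rho:.4f}")
```

A rho near zero, as the study reports for board passage rate versus USMLE scores, indicates that a specialty's rank on one measure carries essentially no information about its rank on the other.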

Discussion
As specialty boards seek to respond to advances in testing technology, greater exam preparation resources, and the need for improved standardization, they face challenges if they change the format of time-tested examination methods. Much attention has been paid to changes in the Initial Certification process in Diagnostic Radiology within the greater climate of ACGME and ABMS evolution, as it switched from an oral examination to the computerized Core and Certifying exams starting with the graduating class of 2014.
The first question in any transition is whether a novel test or process retains the same content validity as its predecessor and whether criterion validity has been sacrificed for simplicity. In describing the new board examination process in 2013 [7], the ABR addressed the rumor that "10% of Board examinees must fail exams" by explaining that a panel of experts determines the level of competency commensurate with safe practice, regardless of how many examinees fail as a result. The oral system, in contrast, utilized a panel of experts who assessed the candidate on a face-to-face basis. The evidence basis for why the new system is more effective than the former system remains a source of debate. In fact, the 2014/2015 program directors' survey revealed that "91% felt that the ABR Oral Examination was superior to the Core Examination in testing readiness for clinical practice" [13].
Several conclusions can be drawn from these data. First, while there is no "standard" acceptable fail rate in any specialty, the 10% suggested by Becker et al. is not far from reality [7]. Second, there is no correlation between board exam pass rates and USMLE 1 performance, even though there are clear disparities in USMLE 1 performance between specialties. If one takes USMLE 1 as a relative measure of one's ability to perform well on medical multiple-choice exams among a pool of exceptional learners, this suggests that each specialty sets a standard within the pool of physicians it has, not the total population of medical graduates. Third, as performance on the medical licensing exams has shown an upward trend, board passage rates have not followed in concert. Relevant to this discussion is the fact that the USMLE has not changed its scoring system as examinees improve. USMLE score inflation is a clear example of exceptional test takers availing themselves of improved exam resources over time, and the question of how well standard psychometric testing procedures will continue to apply to examinees of remarkable intellect, in an era of ever-expanding resource material, will persist.
If the core values of a board certifying body include public trust, it may be reasonable to admit that not all takers should pass, lest the credibility of the board be at risk. A system in which a 100 percent pass rate is typical would suggest that the responsibility to verify the acumen of the specialty's practitioners lies with the training programs and not the board. This would be counter to the board's mission. The fact that the first Core exam pass rate mirrored the average pass rate across all specialties and the rates of prior ABR exams adds to the perception of validity. Regardless, using any psychometric process to exclude items in an attempt to discriminate between those who pass and those who fail assumes that there will be candidates who fail.
The usage of recalled examination items by takers of the prior clinical exam was discussed by Berlin in 2012 [14] and Ruchman et al. in 2008 [15]. Though the practice of sharing ABR examination content is now more clearly forbidden, and the current Core exam has reportedly reduced the number of reused items, a valid concern regarding reliability remains. If the reuse of exam items improves the ability to equate prior administrations with current administrations, as described by the ABR, it becomes impossible to know whether a given examinee answers a reused question correctly because of comprehensive preparation or because he or she was told the item would be tested. If a passer is defined as one who performs well on validated discriminatory reused questions, an obvious bias emerges in favor of the utilizer of contraband recalled items.
The decision to require examinees to take the Core exam at the end of PGY-4 was likely intended to deemphasize any possible "recall" advantage generated by examinees who violate ABR policy, but it also raises the question of whether there is a benefit to forcing a resident to take this high-stakes exam before he or she feels adequately prepared. Self-reported data from the first two administrations of the Core exam [10] suggested that easing of clinical duties near the exam was a negative predictor of success. It seems reasonable that a program may wish to hold select residents back several months if their clinical experience seems insufficient.
As for the autumn administration of the Core exam, the candidate pool as it now stands must almost entirely consist of alternate certification applicants, many of whom have completed a residency outside of the United States in addition to multiple fellowships, and PGY-5 residents who have failed the exam at least once. From an onlooker's perspective, it seems nearly impossible to compare the results of such a sample to the traditional candidate pool taking the exam in the spring administration. Allowing first-time traditional examinees into this pool may improve the ability to ensure that the exam is uniform between both administrations.

Conclusions
Specialty board pass rates have not increased in concert with improved trainee performance on the USMLE. Specialty ranks according to USMLE 1 and USMLE 2 CK are statistically similar; however, neither USMLE 1 nor USMLE 2 CK ranks correlate with board passage rate. While USMLE performance among those matching in diagnostic radiology has increased, the ABR board exam passage rate has declined. The data presented here suggest that the ABR sets passing thresholds according to the relative performance of examinees rather than using a criterion-referenced Angoff standard.

Additional Information

Disclosures
Human subjects: All authors have confirmed that this study did not involve human participants or tissue. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue.

Conflicts of interest:
In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.