Analysis of Milestone-based End-of-rotation Evaluations for Ten Residents Completing a Three-year Anesthesiology Residency

Introduction: Faculty are required to assess the development of residents using educational milestones. This descriptive study examined the end-of-rotation milestone-based evaluations of anesthesiology residents completed by rotation faculty directors. The goals were to measure: (1) how many of the 25 Accreditation Council for Graduate Medical Education (ACGME) anesthesiology subcompetency milestones were included in each of the residency's rotation evaluations, (2) the percentage of evaluations sent to the rotation director that were actually completed by the director, (3) the length of time between the end of the residents' rotations and completion of the evaluations, (4) the frequency of straight line scoring, defined as the resident receiving the same milestone level score for all subcompetencies on the evaluation, and (5) how often a resident received a score below a Level 4 in at least one subcompetency in the three months prior to graduating.

Methods: In 2013, the directors of each of the 24 anesthesia rotations in the Stanford University School of Medicine Anesthesiology Residency Program created new milestone-based evaluations to be used at the end of rotations to evaluate residents. The directors selected the subcompetencies from the list released by the ACGME that were most appropriate for their rotation. End-of-rotation evaluations for post-graduate year (PGY)-2 to PGY-4 from July 1, 2014 to June 30, 2017 were retrospectively analyzed for a sample of 10 residents randomly selected from the 22 residents in the graduating class.

Results: The mean number of subcompetencies evaluated by each of the 24 rotations was 17.88 (standard deviation (SD): 3.39, range 10–24, median 18.5) out of the 25 possible subcompetencies. Three subcompetencies (medical knowledge, communication with patients and families, and coordination of patient care within the healthcare system) were included in the evaluation instruments of all 24 rotations. The three least frequently listed subcompetencies were "acute, chronic, and cancer-related pain consultation/management" (25% of rotations had this on the end-of-rotation evaluation), "triage and management of critically ill patient in non-operative setting" (33%), and "education of patient, families, students, residents, and others" (38%). Overall, 418 end-of-rotation evaluations were issued and 341 (82%) completed, with 63% completed within one month, 22% between months one and two, and 15% after two months. The frequency of straight line scoring varied, from never occurring (0%) in three rotations to always occurring (100%) in two rotations, with an overall average of 51% (SD: 33%). Sixty-one percent of straight line scoring corresponded to the resident's postgraduate year, whereby, for example, a post-graduate year two resident received an ACGME Level 2 proficiency for all subcompetencies. Thirty-one percent of straight line scoring was higher than the resident's year of training (e.g., a PGY-2 received Level 3 or higher for all subcompetencies), and the remaining 7% was below the expected level for the year of training. Three of seven residents had at least one subcompetency rated below a Level 4 on one of the evaluations during the three months prior to finishing residency.

Conclusion: Formal analysis of a residency program's end-of-rotation milestone evaluations may uncover opportunities to improve competency-based evaluations.


Introduction
The Accreditation Council for Graduate Medical Education (ACGME) introduced the Next Accreditation System in 2013 [1]. This outcomes-based model of residency relies on trainees demonstrating competency in a variety of skills, attitudes, and knowledge to track their progress, with the ultimate goal of a smooth transition to independent practice. Assessments in graduate medical education rely heavily on workplace-based observations by faculty [2,3].
The Next Accreditation System incorporated the six ACGME core competencies that apply to all graduate medical education: Patient Care, Medical Knowledge, Professionalism, Interpersonal and Communications Skills, Practice-based Learning and Improvement, and Systems-based Practice. Each specialty then independently created a specific set of subcompetencies for each of the six core competencies that would be used by that specialty's residency programs to develop curriculum and assessment.
For anesthesiology, a total of 25 subcompetencies exist, ranging from one subcompetency for the Medical Knowledge core competency to 10 subcompetencies for the Patient Care competency [4]. For example, the Interpersonal and Communications Skills core competency for anesthesiology has three subcompetencies: communication with patients and families, communication with other professionals, and team and leadership skills. Residents are meant to be evaluated based on their performance in the individual subcompetencies.
Each subcompetency then has a set of milestones, from less to more advanced, which define the target behaviors for performance. The milestones are arranged into numbered proficiency levels, from Level 1, whereby the resident "demonstrates milestones expected of a resident who has completed one postgraduate residency year," to Level 5, whereby the resident "has advanced beyond performance targets defined for residency, and is demonstrating 'aspirational' goals such as the performance of someone who has already been in independent practice for several years. Only a few exceptional residents should reach this Level 5 for any of the subcompetencies." Anesthesiology milestones, as assessed by the faculty, have a positive linear relationship with post-graduate year level [5].
This study aimed to examine the end-of-rotation milestone-based evaluations of residents as completed by faculty for academic years 2014-2017. The motivation for the study was to better understand the assessment data being collected to make improvements in the evaluation of residents. More specifically, some residents reported they could gain more insight about their professional development if the end-of-rotation evaluations had more useful feedback information. Also, the department's Clinical Competence Committee uses the end-of-rotation evaluation data to make decisions on milestone levels submitted by the residency to the ACGME via the online Accreditation Data System account.
The goals were to measure: (1) how many of the 25 ACGME anesthesiology subcompetencies were included in each of the residency's rotation evaluations, (2) the percentage of evaluations sent to the rotation director that were actually completed by the director, (3) the length of time between the end of the resident's rotation and completion of the evaluation (because delays may be associated with a decreased ability to detect differences in resident performance [6]), (4) the frequency of straight line scoring, defined as the resident receiving the same milestone level score for all subcompetencies on the evaluation, and (5) how often a resident received a score below a Level 4 in at least one subcompetency in the three months prior to graduating.

Materials And Methods
After approval from the Stanford University Human Subjects Committee, data from end-of-rotation milestone-based evaluations for post-graduate year (PGY)-2/Clinical Anesthesia 1 to PGY-4/Clinical Anesthesia 3 from July 1, 2014 to June 30, 2017 were retrospectively analyzed for a sample of 10 residents randomly selected (via a random number generator) from the 22 graduating residents.
After the ACGME published the 25 anesthesiology subcompetencies in 2013, and after all the rotation directors in the Stanford Anesthesiology residency received an hour-long, small group education session with the residency program director on the rationale for the milestones, each rotation director created new end-of-rotation evaluation instruments by selecting the subcompetencies that were most appropriate for their rotation.
These newly created evaluations were launched for all rotations in the residency on July 1, 2014.
After the resident completed a rotation, the end-of-rotation evaluation form was sent to the faculty rotation director via the web-based Residency Management System (MedHub, Minneapolis, MN). If the evaluation was not completed within a month, a reminder was sent, followed by another reminder a month later. There was no specific repercussion to the faculty for delay or failure to submit.
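The study's timeliness measure (completion within one month, between months one and two, or after two months) can be sketched as follows. This is a minimal illustration, not MedHub's actual data layout; the `delay_bucket` function is hypothetical, and a month is approximated as 30 days.

```python
from datetime import date

def delay_bucket(rotation_end: date, completed: date) -> str:
    """Classify how long after the rotation ended the evaluation was
    completed, using the study's three bins; assumes a 30-day month."""
    days = (completed - rotation_end).days
    if days <= 30:
        return "within 1 month"
    if days <= 60:
        return "between months 1 and 2"
    return "after 2 months"

# Hypothetical dates for illustration
print(delay_bucket(date(2015, 7, 31), date(2015, 8, 20)))   # within 1 month
print(delay_bucket(date(2015, 7, 31), date(2015, 10, 15)))  # after 2 months
```

Applying such a function across all submitted evaluations yields the percentage falling into each bin.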
Each subcompetency on the end-of-rotation evaluation was rated on a 10-point milestone scale: 0.5 (has not yet achieved Level 1), 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, or 5, with half-point scores indicating performance between the adjacent levels.
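Straight line scoring (identical level scores across all subcompetencies on an evaluation) and its classification relative to the training year can be sketched as follows. This is a minimal illustration under the assumption, used in the study's examples, that the expected level equals the postgraduate year (e.g., Level 2 for a PGY-2); the `classify_sls` function is hypothetical.

```python
from typing import List, Optional

def classify_sls(scores: List[float], pgy: int) -> Optional[str]:
    """Detect straight line scoring (SLS) on one evaluation and, when
    present, compare the common score to the expected level for the
    resident's training year."""
    if len(set(scores)) != 1:
        return None  # scores differ across subcompetencies: no SLS
    level = scores[0]
    if level == pgy:
        return "at expected level"
    return "above expected level" if level > pgy else "below expected level"

print(classify_sls([2.0, 2.0, 2.0], pgy=2))  # at expected level
print(classify_sls([3.5, 3.5, 3.5], pgy=2))  # above expected level
print(classify_sls([2.0, 3.0, 2.5], pgy=2))  # None (no SLS)
```

Tallying the non-`None` results over all evaluations gives the frequencies of at-, above-, and below-level straight line scoring reported in the Results.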
Rotation evaluation forms included space for general qualitative comments, but these were excluded from this study.

Results
Data from the 10 randomly selected residents were analyzed, with a total of 418 end-of-rotation milestone-based evaluations issued between 7/1/2014 and 6/30/2017. Three of the residents had medical leaves of absence during residency, which extended their finish dates past July 1, 2018 and reduced the number of available rotation evaluations.
1. How many of the 25 anesthesiology subcompetencies were included in each of the end-of-rotation evaluations?
Three subcompetencies (medical knowledge, communication with patients and families, and coordination of patient care within the healthcare system) were assessed in all 24 rotations (Table 1).

The three least frequently listed subcompetencies were "acute, chronic, and cancer-related pain consultation/management" (25% of rotations had this listed on the end-of-rotation evaluation form), "triage and management of critically ill patient in non-operative setting" (33%), and "education of patient, families, students, residents, and others" (33%).
2. The percentage of evaluations sent to the rotation director that were actually completed by the director.
Overall, 82% of end-of-rotation evaluations issued to faculty were completed (Table 2).

3. The length of time between the end of the resident's rotation and completion of the evaluation.

Of the 341 milestone-based evaluations submitted for the 10 residents, 63% were completed within one month, 22% between months one and two, and 15% after two months (Table 3).

4. How often was the resident assigned the same milestone level for all subcompetencies listed on the end-of-rotation evaluation?

Sixty-one percent of all evaluations had the same level score for all the subcompetencies on the end-of-rotation evaluation (Table 4).

The frequency of straight line scoring (SLS) varied by rotation, from 0% in three rotations to 100% in two rotations, with an overall mean of 52% (SD: 33%) (Table 5). Sixty-two percent of SLS corresponded to the resident's postgraduate year, 31% of the time the SLS was higher than the resident's year of training, and the remaining 7% was below the expected level for the year of training.

5. How commonly did a senior resident receive a score below a Level 4 in at least one subcompetency in the three months before graduating?
Three of the seven residents who finished within the study period (the other three finished later due to leaves of absence) had at least one subcompetency rated below a Level 4 on a rotation evaluation during the three months prior to completing the residency program.

Discussion
Little is known about how anesthesiology residents are evaluated using milestones during rotations in a residency. This study found that the faculty directors of 24 different clinical rotations at a large anesthesiology residency chose to include an average of 18 of the 25 available ACGME-defined subcompetencies in their end-of-rotation evaluations. This indicates that most subcompetencies can be evaluated several times during residency.
Whereas three subcompetencies (medical knowledge, communication with patients and families, and coordination of patient care within the healthcare system) were assessed in all 24 rotations, "acute, chronic, and cancer-related pain consultation/management", "triage and management of critically ill patient in non-operative setting", and "education of patient, families, students, residents, and others" were evaluated in less than a third of the rotations. Different rotations involve different resident duties and activities, which define what can be assessed during any given rotation. For example, it may not be feasible to evaluate "acute, chronic, and cancer-related pain consultation/management" during some rotations.
The frequency of straight line scoring, whereby the same score on the 10-point milestone scale is given to the resident for all of the subcompetencies, varied by rotation, with an overall average of 51% (SD: 33%). This ranged from 0% in three rotations (no resident on the rotation ever received straight line scoring) to 100% in two rotations (every resident on the rotation received straight line scoring). When straight line scoring did occur, 61% corresponded directly to the resident's postgraduate year. The milestone-based end-of-rotation evaluations may therefore not be measuring the knowledge or ability of the particular resident, but rather indicating where along the training sequence the resident falls. In contrast, 32% of the straight line scoring was higher than the resident's year of training, and the remaining 7% was below the expected level for the year of training.
Although milestone-based evaluations assume that each subcompetency is assessed independently, with individualized assessments for each resident, a national study of milestones submitted to the ACGME by Family Medicine residencies also found that residents are rated in a stable manner as they progress through residency [7]. A separate study of all Emergency Medicine programs found that approximately 5% of the more than 6,000 residents had straight line scoring as reported to the ACGME. Both studies differ from the present one in that they analyzed data submitted to the ACGME every six months rather than individual end-of-rotation evaluations.
In general, there are two domains of threats to the validity of assessments. The first is construct underrepresentation, whereby there are not enough observations by the attending supervising the resident. The second is construct-irrelevant variance, primarily due to systematic rater error. Raters are the major source of measurement error in these types of observational assessments, and systematic rater errors include severity or leniency error, central tendency error (rating at the center of the rating scale), and restriction of range (failure to use all the points on the rating scale and to discriminate among competencies). Straight line scoring occurs when the rater ignores the traits to be rated and treats all traits as if they were one; the resulting repetitious ratings inflate estimates of reliability. Better training of faculty tasked with the assessments may help reduce some of these undesirable rater effects [8].
Rotation directors typically work directly with a resident and also obtain input from other faculty when completing the end-of-rotation evaluations. However, straight line scoring indicates that individualized assessment may not be occurring. This could be due to several reasons, including a lack of faculty expertise in assessment, a lack of valid and reliable methods for assessing specific milestones, and a lack of protected time to complete assessments during busy clinical rotations. Overcoming these barriers is important to optimize graduate medical education. In addition, the ACGME Review Committees plan to examine milestone performance data for each program's residents as one element in the Next Accreditation System.
Using the results of this study to enhance the resident evaluation process suggests several potential interventions. One possibility is to reduce the number of subcompetencies evaluated in each rotation from the current average of 18 to a lower, more manageable number best addressed by the nature of that specific rotation. This would ideally allow the rotation director to focus on properly evaluating a few subcompetencies, at the trade-off of fewer repeat evaluations of each subcompetency as the resident progresses. Reducing the number of evaluations with straight line scoring would indicate more robust resident assessment.
Another mechanism to improve the evaluation process is to give faculty more extensive instruction on the assessment of residents, the meaning of the subcompetencies, and the scaling of the milestones. For this study, training of rotation directors occurred only once; based on these results, training will now occur annually. Ideally, multiple assessments (e.g., test scores, conference attendance, direct observations) are combined to inform decisions on different milestones [9]. Because there is turnover among the faculty, new rotation directors need to be educated and mentored. This study also found that 37% of evaluations were entered into the system for the resident to see more than one month after the rotation ended, a delay that likely reduces the resident's ability to modify any behavior. It may therefore be worthwhile to examine different types of information to provide to evaluators (e.g., a real-time dashboard with summary data on the timeliness of their submitted evaluations and their frequency of straight line scoring) that might help change behavior and improve the information residents receive.
Another consideration, anecdotally raised by faculty and residents, is to change the evaluation scale so that the milestone level does not directly correspond with the postgraduate year. It may be too easy to fall into the mindset that postgraduate year and milestone level are the same regardless of a resident's abilities in the specific subcompetencies. If faculty are thinking of the resident's postgraduate year when completing the evaluation, they are in effect merely confirming that the resident is at the expected level.
Multiple studies exist comparing faculty-based milestone evaluations with residents' self-assessments on those same milestones [10-12]. For example, in a comparison of 20 general surgery residents' self-evaluations with the Clinical Competence Committee's evaluation of their performance, most self-assessments were within half a level of the committee's report [10]. In a study of all US Internal Medicine residents, milestone-based ratings by faculty correlated with the prior non-developmentally based (unsatisfactory, satisfactory, superior) resident evaluation rating scale [13].
A few of the residents on some rotations had subcompetencies rated below a Level 4 within the last three months of training. A national study of all United States Internal Medicine residents similarly found a fraction of graduating residents with competency ratings below Level 4 [14]. In a pediatric residency, variation in milestone scores decreased over time, and by graduation not all residents had a Level 4 score or greater in all subcompetencies [15].
Several limitations to this study exist because descriptive research focuses on the details or characteristics of a population (here, a small sample) without making inferences beyond those who are studied [16]. For example, there may be confounding issues related to rotation evaluation scores, such as faculty being reluctant to assign low scores for fear that the resident may retaliate by judging the faculty poorly on the resident's evaluation of the attending. Raters may also avoid assessing a resident as performing poorly because they do not want to delay the resident's advancement or face the repercussions of adding extra residency time [17], and other rater biases, including gender bias, pose additional risks [18,19]. These dimensions, and further investigation of why straight line scoring occurs (which could be explored qualitatively, for example by interviewing the faculty rotation directors), were not undertaken in this study.
Further studies could examine which interventions best provide meaningful data for Clinical Competency Committees, support self-directed assessment, facilitate resident feedback, and allow specialty Review Committees to monitor and help programs improve.

Conclusions
The milestones provide a framework for the assessment of the development of the resident from novice to expert. Descriptive analysis of a residency program's end-of-rotation milestone evaluations may yield data to target improvement efforts so that the competency-based evaluation can attain its intended goals.