The evaluation of educational programs has become an expected part of medical education. At some point, all medical educators will need to critically evaluate the programs that they deliver. However, the evaluation of educational programs requires a very different skillset than teaching. In this article, we aim to identify and summarize key papers that would be helpful for faculty members interested in exploring program evaluation.
In November of 2016, the 2016-2017 Academic Life in Emergency Medicine (ALiEM) Faculty Incubator program highlighted key papers in a discussion of program evaluation. This list of papers was augmented with suggestions by guest experts and by an open call on Twitter. This resulted in a list of 30 papers on program evaluation. Our authorship group then engaged in a process akin to a Delphi study to build consensus on the most important papers about program evaluation for medical education faculty.
We present our group’s top five most highly rated papers on program evaluation. We also summarize these papers with respect to their relevance to junior medical education faculty members and faculty developers.
Program evaluation is challenging. The described papers will be informative for junior faculty members as they aim to design literature-informed evaluations for their educational programs.
Introduction & Background
Medical educators spend much of their time developing and delivering educational programs. Programs can include didactic lectures, online modules, boot camps, and simulation sessions. Program evaluation is essential to determine the value of the teaching that is provided [1-2], whether or not it meets its intended objectives, and how it should be improved or modified in the future. However, rather than beginning at a program's conception, evaluation is often only considered late in the process or after the curriculum has been delivered.
Program evaluation can be mistaken for assessment or research, but these constructs are subtly different. Within medical education, assessment is generally understood to be the measurement of individual student performance. While student success can provide some information on the effectiveness of a program, program evaluation goes further to determine whether the program worked and how it can be improved. Program evaluation often overlaps and shares methods with research, but its primary goal is to improve or judge the evaluated program, rather than to create and disseminate new knowledge.
In 2016, the Faculty Incubator was created by the Academic Life in Emergency Medicine (ALiEM) team to create a virtual community of practice (CoP) [5-6] for early career educators. In this online forum, members of this CoP discussed and debated topics relevant to modern emergency medicine (EM) clinician educators. As part of this program, we created a one-month module focused on program evaluation.
This paper is a narrative review, which highlights the literature that was felt to be the most important for faculty developers and junior educators who wish to learn more about program evaluation.
During November 1-30, 2016, the junior faculty educators and mentors of the ALiEM Faculty Incubator discussed the topic of program evaluation in an online discussion forum. The Faculty Incubator involved 30 junior faculty members and 10 mentors. All junior faculty members were required to participate in the discussion, which was facilitated by the mentors; however, participation was not strictly monitored. The titles of papers that were cited, shared, and recommended were compiled into a list.
This list was expanded using two other methods: articles recommended during a YouTube Live discussion featuring mentors with significant experience in program evaluation (Drs. Lalena Yarris, George Mejicano, Chad Kessler, and Megan Boysen-Osborn) and a call for important program evaluation papers on Twitter. We ‘tweeted’ requests asking participants of the free open access meducation and medical education (#FOAMed and #MedEd) online virtual communities of practice to suggest important papers on the topic of program evaluation. Figure 1 shows an example tweet. Several papers were suggested via more than one modality.
The importance of these papers for program evaluation was evaluated through a three-round voting process inspired by the Delphi methodology [9-11]. All of this manuscript's authors read the 30 articles and participated in this process. In the first round, raters were asked to indicate the importance of each article on a seven-point Likert scale, anchored at one by the statement "unimportant for junior faculty" and at seven by the statement "essential for junior faculty." In the second round, raters were provided with a frequency histogram displaying how each article had been rated in the first round. They were then asked to indicate if each article "must be included in the top papers" or "should not be included in the top papers." In the third round, raters were provided with the results of the second round as a percentage of raters who indicated that each article must be included. They were then asked to select the five papers which should be included in the article because they are the most important.
Similar methods were used by the ALiEM Faculty Incubator in a previous series of papers published in the Western Journal of Emergency Medicine and Population Health [12-15]. Readers will note that this was not a traditional Delphi methodology because our raters included novices (i.e. junior faculty members, participants in the Faculty Incubator) as well as experienced medical educators (i.e. clinician educators, all of whom have published > 10 peer-reviewed publications, who serve as mentors and facilitators of the ALiEM Faculty Incubator). Rather than only including experts, we intentionally involved junior educators to ensure we selected papers that would be of use to a spectrum of educators throughout their careers.
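The three-round voting process described above can be illustrated with a minimal Python sketch. This is not the authors' actual analysis. The paper titles, ratings, and votes are invented placeholders, the function names are hypothetical, and ranking papers by how often they were selected in round three (with ties broken alphabetically) is an assumption, since the manuscript does not specify how the final rank order was computed.

```python
from collections import Counter


def round_one_histogram(ratings):
    """Round 1: tally each paper's 1-7 Likert ratings into a frequency histogram."""
    histograms = {}
    for paper, scores in ratings.items():
        counts = Counter(scores)
        # Index 0 holds the count of "1" ratings, index 6 the count of "7" ratings.
        histograms[paper] = [counts.get(point, 0) for point in range(1, 8)]
    return histograms


def round_two_percentages(votes):
    """Round 2: percentage of raters voting that a paper 'must be included'."""
    return {paper: 100.0 * sum(v) / len(v) for paper, v in votes.items()}


def round_three_ranking(selections, top_n=5):
    """Round 3: rank papers by how many raters selected them (assumed rule)."""
    tally = Counter(paper for picks in selections for paper in picks)
    # Sort by selection count (descending), then by title for a stable order.
    ranked = sorted(tally.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical data: three raters and three papers (not the study's results).
ratings = {"Paper A": [7, 6, 7], "Paper B": [4, 5, 4], "Paper C": [2, 3, 2]}
votes = {
    "Paper A": [True, True, True],
    "Paper B": [True, False, True],
    "Paper C": [False, False, False],
}
selections = [["Paper A", "Paper B"], ["Paper A"], ["Paper A", "Paper C"]]

histograms = round_one_histogram(ratings)
percentages = round_two_percentages(votes)
top = round_three_ranking(selections, top_n=2)
```

In this sketch, the output of each round is what would be fed back to raters before the next round: the histogram before round two, and the inclusion percentages before round three.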
The ALiEM Faculty Incubator discussions, expert recommendations, and social media requests yielded 30 articles. The paper evaluation process resulted in a rank-order listing of these papers in order of perceived relevance as indicated by the results of round three. The top five papers are expanded upon below. The ratings of all 30 papers and their full citations are listed in Table 1.
The following is the list of papers that our group has determined to be of interest and relevance to junior faculty members and faculty development officers. The accompanying commentaries are meant to explain the relevance of these papers to junior faculty members and also highlight considerations for senior faculty members when using these works for faculty development workshops or sessions.
1. The Association for Medical Education in Europe (AMEE) Education Guide no. 29: Evaluating Educational Programmes: This education guide, published in Medical Teacher, begins with a brief discussion of the history of program evaluation. It goes on to recommend a framework of evaluation for educators that focuses on the methodology of evaluation, the context of evaluation practice, and the challenge of modifying existing programs with the results of the evaluation. This overview includes detailed sets of questions for evaluators to ask about programs that they review. Perhaps the most salient piece of advice from this paper is that improvement, even when modest, is valuable.
Relevance to Junior Faculty Members
This is a high-yield read for the junior faculty educator because it provides a succinct and comprehensive overview of program evaluation through the presentation of a framework, which can be adapted by junior faculty educators. Each step within the framework is accompanied by an explanation to assist the reader in understanding the components.
Considerations for Faculty Developers
Faculty developers should be expected to understand program evaluation in the context of its history. This manuscript summarizes the historical program evaluation literature from within and beyond medical education in a way that contextualizes modern controversies and informs current approaches. Faculty developers should use this manuscript to center themselves within the literature. The framework provided may also guide their approach to evaluating the programs of their more junior faculty members.
2. Program Evaluation Models and Related Theories: AMEE Guide No. 67: This guide discusses the three main education theories that underlie various evaluation models (i.e. reductionist theory, system theory, and complexity theory). It begins by describing the purpose of program evaluation, clarifying the definition of program evaluation, and explaining why we evaluate educational programs. The authors conclude that the main purpose of any educational program is change, be it intended or unintended, and define program evaluation as the “systematic collection and analysis of information related to the design, implementation, and outcomes of a program for the purpose of monitoring and improving the quality and effectiveness of the program.” The guide ends with a description of four evaluation models informed by these education theories (i.e. experimental/quasi-experimental models, Kirkpatrick’s four-level model, logic models, and the context/input/process/product model).
Relevance to Junior Faculty Members
Change is the most important aspect of any educational program, so measuring change should be the focus of a program evaluation. It is important for junior educators to understand that evaluation should analyze both the intended and unintended change resulting from a program, rather than solely investigating the intended outcomes. By discussing several different evaluation models and their underlying educational theories, this guide will allow junior faculty educators to choose the evaluation model that is most relevant to their individual educational activity.
Considerations for Faculty Developers
This paper may enhance a faculty developer’s foundational knowledge of program evaluation by summarizing its underlying education theories and common models. It may also serve as a frequent reference for faculty developers as they select conceptual frameworks to inform the evaluation of educational programs.
3. Twelve Tips For Evaluating Educational Programs: The tips provided in this article can be summarized into three primary themes. Prior to beginning the evaluation, it is important to understand the program, be realistic in what is possible, define the stakeholders, determine the intended outcomes of the program, select an evaluation paradigm, and choose a measurement modality. As evaluation design begins, assemble a group of collaborators who will help to brainstorm, guide the methods used, and assist in the piloting of the evaluation. Finally, the author recommends avoiding common pitfalls such as confusing program evaluation with learner assessment, evaluating an outcome that is not consistent with the program’s goals, using an unreliable instrument or an instrument without context-specific validity evidence, and having unrealistic expectations.
Relevance to Junior Faculty Members
Planning for program evaluation must take place as part of the program design process and not as an afterthought. The 12 tips provide salient advice and a model that is thorough, yet easily achievable for junior faculty educators. While the format of this paper presents only an overview of several complex concepts (e.g. validity evidence), the author provides references for a more in-depth review of these topics.
Considerations for Faculty Developers
Faculty developers will find this concise and clear paper helpful both as a reference for mentees and to further their own understanding of program evaluation. In addition to foundational tips, the author summarizes advanced concepts that may apply to a faculty developer’s educational practice. Rather than simply presenting a formula for program evaluation, the inclusion of the strengths and weaknesses of various paradigms allows a more nuanced understanding of the gray areas in evaluating educational programs. Referencing the complexities of validity evidence and the potential drawbacks of a patient-related outcomes approach may spark dialogue in faculty development programs and collaborations. Finally, the references included are thoughtful and relevant and would be good additions to faculty developers’ personal libraries.
4. Rethinking Programme Evaluation in Health Professions Education: Beyond "Did it Work?": This article begins with a provocative analysis of the Kirkpatrick hierarchy, establishing the multiple problems that arise when program evaluations focus solely on outcomes. Beyond the outcome ("Did it work?"), it reinforces the importance of considering the educational theory ("Why will it work?"), the process ("How did it work?"), the context ("What context is the program operating in?"), and unexpected results within the evaluation of a program. In doing so, the authors open the discussion regarding which evaluation approaches might be better suited for different educational programs. More important than finding "the perfect" evaluation model is gaining a holistic view of a program that clarifies the relationship between interventions and their outcomes.
Relevance to Junior Faculty Members
The spirit of this paper is laudable: do not aim to find a single explanation or theory, but familiarize yourself with the literature and determine the best way to evaluate a program within your own context. It will guide junior faculty in their efforts to develop new educational programs within their educational contexts, focusing not only on whether a certain program works, but on why it should work, how it worked, and what else occurred. These questions will guide implementation processes and inform future approaches.
Considerations for Faculty Developers
Providing a historical and theoretical overview of program evaluation as a discipline, this article traces the roots of program evaluation. It highlights the importance of going beyond the Kirkpatrick hierarchy to develop a greater understanding of why a program might succeed or fail. The first figure clearly outlines essential elements that explain how theory intersects with implementation and evaluation. It is a must-read for those who are teaching program evaluation to their faculty members, as it can guide them towards richer methods for describing curricula or programs in their scholarly work. Notably, this advice was considered controversial and should be carefully considered.
5. Perspective: Reconsidering the Focus on "Outcomes Research" in Medical Education: a Cautionary Note: There is an increasing emphasis on higher-level outcomes (e.g. patient outcomes) in educational research, which presents challenges to researchers. After discussing the limitations of this approach, the authors offer salient advice for educational research: begin with a study question and proceed in a stepwise fashion to determine the intended outcome and measurement tool, rather than beginning with the measurement tool and working backward. They recommend beginning with Kirkpatrick level one outcomes (e.g. reaction) and sequentially progressing to higher levels (e.g. learning, behavior, and results) throughout a program of research, rather than always striving to find an impact on patient-level outcomes.
Relevance to Junior Faculty Members
There are several challenges and pitfalls associated with developing medical education studies and evaluating patient-level outcomes. While patient-level outcomes will have a role as educational research continues to evolve, they can be difficult to fund without large grants because multi-site involvement is required to obtain adequate power. Lower-level outcomes, such as student learning or behavior, remain important for assessments of novel interventions, as well as for isolating the most effective components of an intervention. This is important advice for junior faculty members who are already influenced by the focus on patient-level outcomes within medical research.
Considerations for Faculty Developers
Faculty developers must acknowledge the problems inherent to seeking patient-level outcomes in educational research and program evaluation. Junior faculty members may be inclined to “shoot for the moon” and seek an impact on patient outcomes before first establishing that their program is well received, leads to attitude and behavioral change, and is sustainable.
As with our previous papers [12-15], this study was not designed to be an exhaustive systematic literature review. We attempted to triangulate our naturally emergent list with more papers by utilizing expert consultation and an open social media call, which yielded some important recommended papers. Considering the depth and breadth of our final list, we feel that these adjunctive methods have resulted in an important, if not comprehensive, review of the literature.
We present five key papers addressing the topic of program evaluation with discussions and applications for junior faculty members and those leading faculty development initiatives. These papers provide a basis from which junior faculty members can design literature-informed program evaluations for their educational projects.
- Cook DA: Twelve tips for evaluating educational programs. Med Teach. 2010, 32:296–301. doi:10.3109/01421590903480121
- Fitzpatrick JL, Sanders JR, Worthen BR: Program Evaluation: Alternative Approaches and Practical Guidelines. Pearson Higher Ed, London, England; 2011.
- Haji F, Morin MP, Parker K: Rethinking programme evaluation in health professions education: Beyond “did it work?”. Med Educ. 2013, 47:342–351. doi:10.1111/medu.12091
- Goldie J: AMEE Education Guide no. 29: Evaluating educational programmes. Med Teach. 2006, 28:210–224. doi:10.1080/01421590500271282
- Armstrong EG, Barsion SJ: Using an outcomes-logic-model approach to evaluate a faculty development program for medical educators. Acad Med. 2006, 81:483–488. doi:10.1097/01.ACM.0000222259.62890.71
- Wenger E: Communities of practice: Learning, meaning, and identity. Cambridge University Press, Cambridge, England; 1998.
- ALiEM Faculty Incubator 2017-2018: Call for Applications. (2016). Accessed: March 3, 2017: https://www.aliem.com/faculty-incubator/.
- Thoma B, Paddock M, Purdy E, et al.: Leveraging a virtual community of practice to participate in a survey-based study: A description of the METRIQ study methodology. AEM Educ Train. 2017, 1:110–113. doi:10.1002/aet2.10013
- Hasson F, Keeney S, McKenna H: Research guidelines for the Delphi survey technique. J Adv Nurs. 2000, 32:1008–1015. doi:10.1046/j.1365-2648.2000.t01-1-01567.x
- Thoma B, Chan TM, Paterson QS, et al.: Emergency medicine and critical care blogs and podcasts: establishing an international consensus on quality. Ann Emerg Med. 2015, 66:396-402. doi:10.1016/j.annemergmed.2015.03.002
- Thoma B, Poitras J, Penciner R, et al.: Administration and leadership competencies: establishment of a national consensus for emergency medicine. CJEM. 2015, 17:107-114. doi:10.2310/8000.2013.131270
- Chan T, Gottlieb M, Fant A, et al.: Primer series: five key papers fostering educational scholarship in junior academic faculty. West J Emerg Med. 2016, 17:519–526. doi:10.5811/westjem.2016.7.31126
- Gottlieb M, Boysen-Osborn M, Chan TM, et al.: Academic primer series: Eight key papers about education theory. West J Emerg Med. 2017, 18:293-302. doi:10.5811/westjem.2016.11.32315
- Gottlieb M, Grossman C, Rose E, et al.: Primer series: five key papers about team collaboration relevant to emergency medicine. West J Emerg Med. 2017, 18:303–310. doi:10.5811/westjem.2016.11.31212
- Chan T, Gottlieb M, Quinn A, et al.: Primer series: five key papers for consulting clinician educators. West J Emerg Med. 2017, 18:311–317. doi:10.5811/westjem.2016.11.32613
- Frye AW, Hemmer PA: Program evaluation models and related theories: AMEE Guide No. 67. Med Teach. 2012, 34:288–299. doi:10.3109/0142159X.2012.668637
- Cook DA, West CP: Perspective: Reconsidering the focus on “outcomes research” in medical education: a cautionary note. Acad Med. 2013, 88:162–167. doi:10.1097/ACM.0b013e31827c3d78
- Musick DW: A conceptual model for program evaluation in graduate medical education. Acad Med. 2006, 8:1051-1056.
- Cook DA, Ellaway RH: Evaluating technology-enhanced learning: A comprehensive framework. Med Teach. 2015, 37:961-970. doi:10.3109/0142159X.2015.1009024
- Durning SJ, Hemmer P, Pangaro LN: The structure of program evaluation: an approach for evaluating a course, clerkship, or components of a residency or fellowship training program. Teach Learn Med. 2007, 19:308-318. doi:10.1080/10401330701366796
- Blanchard RD, Torbeck L, Blondeau W: AM last page: A snapshot of three common program evaluation approaches for medical education. Acad Med. 2013, 88:146. doi:10.1097/ACM.0b013e3182759419
- Moore DE, Green JS, Gallis HA: Achieving desired results and improved outcomes: integrating planning and assessment throughout learning activities. J Contin Educ Health Prof. 2009, 29:1-15.
- Abrahamson S: Diseases of the Curriculum. J Med Educ. 1978, 53:951-957. doi:10.1097/00001888-197812000-00001
- Reed DA: Nimble approaches to curriculum evaluation in graduate medical education. J Grad Med Educ. 2011, 3:264-266. doi:10.4300/JGME-D-11-00081.1
- van der Vleuten CP, Schuwirth LW, Driessen EW, et al.: Twelve Tips for programmatic assessment. Med Teach. 2015, 37:641-646. doi:10.3109/0142159X.2014.973388
- Haan CK, Edwards FH, Poole B, et al.: A model to begin to use clinical outcomes in medical education. Acad Med. 2008, 83:574-580. doi:10.1097/ACM.0b013e318172318d
- Uttl B, White CA, Gonzalez DW: Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Stud Educ Eval. 2016, doi:10.1016/j.stueduc.2016.08.007
- Wong BM, Holmboe ES: Transforming the academic faculty perspective in graduate medical education to better align educational and clinical outcomes. Acad Med. 2016, 91:473-479. doi:10.1097/ACM.0000000000001035
- Karpa K, Abendroth CS: How we conduct ongoing programmatic evaluation of our medical education curriculum. Med Teach. 2012, 34:783–786. doi:10.3109/0142159X.2012.699113
- Dobbie A, Rhodes M, Tysinger JW, et al.: Using a modified nominal group technique as a curriculum evaluation tool. Fam Med. 2004, 36:402-406.
- Dijkstra J, Van der Vleuten CP, Schuwirth LW: A new framework for designing programmes of assessment. Adv in Health Sci Educ. 2010, 15:379-393. doi:10.1007/s10459-009-9205-z
- Shershneva MB, Larrison C, Robertson S, et al.: Evaluation of a collaborative program on smoking cessation: translating outcomes framework into practice. Journal of Continuing Education in the Health Professions. 2011, 31:S28–S36. doi:10.1002/chp.20146
- Dauphinee WD: The role of theory-based outcome frameworks in program evaluation: Considering the case of contribution analysis. Med Teach. 2015, 37:979-982. doi:10.3109/0142159X.2015.1087484
- Andolsek KM, Nagler A, Weinerth JL: Use of an institutional template for annual program evaluation and improvement: benefits for program participation and performance. J Grad Med Educ. 2010, 2:160-164. doi:10.4300/JGME-D-10-00002.1
- Feldman KA: Instructional effectiveness of college teachers as judged by teachers themselves, current and former students, colleagues, administrators, and external (neutral) observers. Res High Educ. 1989, 30:137-194. doi:10.1007/BF00992716
- Boring A, Ottoboni K, Stark PB: Student evaluations of teaching (mostly) do not measure teaching effectiveness. Scienceopen. 2016, 1–11. doi:10.14293/S2199-1006.1.SOR-EDU.AETBZC.v1
- Morgan S, Henderson K, Tapley A, et al.: How we use patient encounter data for reflective learning in family medicine training. Med Teach. 2015, 37:897-900. doi:10.3109/0142159X.2014.970626
- Ambady N, Rosenthal R: Half a minute: Predicting teacher evaluations from thin slices of nonverbal behavior and physical attractiveness. J Pers Soc Psychol. 1993, 64:431–441. doi:10.1037/0022-3514.64.3.431
- Bordage G, Dawson B: Experimental study design and grant writing in eight steps and 28 questions. Med Educ. 2003, 37:376–385. doi:10.1046/j.1365-2923.2003.01468.x
- Oliphant R, Blackhall V, Moug S, et al.: Early experience of a virtual journal club. Clin Teach. 2015, 12:389-393. doi:10.1111/tct.12357
- Zendejas B, Wang AT, Brydges R, et al.: Cost: The missing outcome in simulation-based medical education research: A systematic review. Surgery. 2013, 153:160-176. doi:10.1016/j.surg.2012.06.025
- Wayne DB, Barsuk JH, McGaghie WC: Why medical educators should continue to focus on clinical outcomes. Acad Med. 2013, 88:1403. doi:10.1097/ACM.0b013e3182a368d5
- Kirkpatrick DL: Evaluating Training Programs. Berrett-Koehler Publishers, Inc, San Francisco; 1994.
Curated Collections for Educators: Five Key Papers about Program Evaluation
Ethics Statement and Conflict of Interest Disclosures
Conflicts of interest: In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: Drs. Michael Gottlieb, Megan Boysen-Osborn, and Teresa M Chan report receiving teaching honoraria from Academic Life in Emergency Medicine (ALiEM) during the conduct of the study for their participation as mentors for the 2016-17 ALiEM Faculty Incubator. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.
The authors would like to acknowledge Dr. Michelle Lin and the 2016-17 Academic Life in Emergency Medicine (ALiEM) Faculty Incubator participants and mentors for facilitating the drafting and submission of this manuscript.
Cite this article as:
Thoma B, Gottlieb M, Boysen-Osborn M, et al. (May 04, 2017) Curated Collections for Educators: Five Key Papers about Program Evaluation. Cureus 9(5): e1224. doi:10.7759/cureus.1224
Received by Cureus: March 25, 2017
Peer review began: April 18, 2017
Peer review concluded: May 03, 2017
Published: May 04, 2017
© Copyright 2017
Thoma et al. This is an open access article distributed under the terms of the Creative Commons Attribution License CC-BY 3.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.