Chapter 16
Performance and Workplace-Based Assessment
Kirsty Magnier and Matthew Pead
Royal Veterinary College, UK
Introduction
Students on courses that prepare them for the veterinary professions require a wider variety of assessments than most. They have a considerable span of learning outcomes to fulfill in the cognitive domain, but must also achieve a variety of psychomotor and affective outcomes that can be seen as measures of performance or competence in skills related to their professional life. Such skills and competencies are often learned in the workplace, and may be assessed there too. Thus, assessment on veterinary courses must provide feedback, motivation, and an understanding of achievement against outcome across a great variety of learning outcomes. In addition, assessment of the future veterinary professional must satisfy the demands of public accountability, government, and regulatory bodies, and ultimately must safeguard veterinary medicine as a profession with an assurance that graduates are “safe to practice.”
Traditional assessment methods focus on what a student knows and can recall about a concept or topic, but do not always address more complex professional constructs. Over the last 20 years there has been growing recognition that formats such as long- and short-answer questions and multiple-choice tests suffer problems of reliability and validity when used to assess the more complex constructs of competence and performance in a professional framework, and thus are not fit for this purpose. This recognition has generated new assessment methods aimed at addressing these issues, which in turn has increased the range of assessment formats used in veterinary courses.
Miller (1990) developed a pyramidal framework for describing levels of competence within a hierarchy of knowledge, application, and performance, with associated methods of assessment (Holsgrove, 2003). The bottom level of the pyramid, entitled “Knows,” refers to the recall of factual knowledge. The second level, “Knows how,” refers to the ability to apply the factual knowledge in a particular context. It implies that there is more to competence than knowledge alone. The third level, “Shows,” relates to the ability to demonstrate competence in a practical, “simulated” environment; the fourth and top level, “Does,” relates to what a professional actually does in the workplace. This model sets a framework for a succession of levels of competence that can guide the implementation of assessment on veterinary courses. However, there are some important additional considerations that the model does not fully describe. The workplace is rightly at the peak of the pyramid, representing the ultimate goal of a course or perhaps a capstone assessment. However, there is an implication that as assessment comes closer to the workplace, its volume may reduce. In reality, workplace-based assessment represents the skills most relevant to professional life, and the point at which numerous component skills learned on a course are combined into clinical reasoning and professional competence. As these are the ultimate goals of most clinical courses, course design should take into account that workplace-based assessment may need to be a larger or even dominant volume of the assessment that students experience. Perhaps Miller’s pyramid needs to be drawn like a spinning top (Figure 16.1) – an appropriate metaphor for the careful balance of skills that the modern caring professional requires. In addition, as it has been demonstrated that students’ abilities assessed in a simulated environment do not always predict their performance in the workplace (Rethans et al., 1991), assessment of performance at one level of the pyramid needs to be carefully linked to the next.
The workplace has benefits in terms of learning and feedback over a more conventional didactic environment. Assessment in an authentic clinical setting is a strong motivator for learning (Spencer, 2003). Learners are better at reproducing and applying knowledge and skills if the context in which they have to perform resembles the context in which the knowledge and skills were first learned (Regehr and Norman, 1996). This relates to the “encoding and context specificity principle,” which states that information is learned within a context and that the context within which the memory has been developed is stored as well as the memory (Schuwirth and van der Vleuten, 2004). It is also relevant to the concept of “situated learning,” which plays a critical role in the development of health professionals (Lave and Wenger, 1991). The workplace also provides an excellent opportunity for an almost continuous stream of feedback, since junior professionals are often teamed with a range of experienced practitioners whom they can use as role models and sources of information. However, the most problematic facet of the workplace in assessment is the variable experience of each individual. This means that it is often difficult to ensure that each student has the chance to learn and demonstrate a very specific skill; however, there are generic skills, particularly those in the affective domain, that everyone in the workplace experiences, and it is these that can be assessed most reliably.
In the purest sense, the most valid place to assess how a student will perform in practice is in the workplace, and the logical extension of this would be to assess all the competency outcomes required at the end of a professional course in this fashion. However, there are issues of practicality and reliability that make this difficult to achieve, and in many cases such an assessment would not completely satisfy the regulatory burden with which many courses work. Most veterinary courses that have performance-based outcomes need to pick assessments appropriate both to those outcomes and to the resources of the institution from a palette of peer-reviewed assessment vehicles. This chapter provides an overview of performance and workplace-based assessment methods as currently used in veterinary and medical education, and establishes some of the challenges that are associated with their use, particularly for undergraduate students.
Performance-Based and Workplace-Based Assessment Methods
This section presents a chronological overview of the progress of assessment from traditional classroom environments (a “contextual vacuum”) to authentic settings such as the workplace, and explains why this progression was needed. The focus in this part of the chapter will be on the methods and their impact, including the long case and clinical evaluation exercise (CEX), objective structured clinical examinations (OSCEs), the mini clinical evaluation exercise (mini-CEX), direct observation of procedural skills (DOPS), multisource feedback (MSF), and portfolios.
Earlier written assessment methods such as essays, multiple-choice questions, and short-answer questions were designed to assess recall of knowledge, but are not suitable for assessing the application of knowledge in a practical/clinical environment. Written or single-answer questions have been extended by the use of vehicles such as open-book exams, extended-matching questions, and script concordance tests to move the outcome measured by an assessment toward clinical reasoning. Clinical reasoning is a higher cognitive skill requiring the collation of objective and subjective information relating to a patient, and the subsequent synthesis of an appropriate plan for further diagnosis or treatment. It is an essential skill for clinical professionals, especially those working at a junior level, where their lack of experience means that they have little recourse to pattern recognition to solve clinical problems. However, these formats are still devoid of all the practical considerations of the workplace, leaving a gap for assessment methods to address. The long case and the CEX were among the first vehicles developed to evaluate undergraduate and graduate students’ clinical skills in a workplace setting.
Long Case and Clinical Evaluation Exercise
The long case has been used to evaluate undergraduate students’ clinical skills for the last 30 years. The student is given “observed” time with a real patient in a clinical setting, during which they gather information about the patient’s problem and perform a physical examination. The student then presents this information to the examiners, who ask questions about the patient and related topics, enabling them to judge the quality of the student’s performance (Norcini, 2001).
Strengths of the long case include its authenticity: the student performs an examination on a real patient in the workplace. However, the assessment is fraught with concerns over reliability. Wilson et al. (1969) discussed how a candidate being assessed via the long case was scored differently by different examiners, highlighting the problem of interrater reliability (Davis et al., 2001). As well as examiner effects, case specificity (intercase reliability) can be a problem, and therefore this method should not be used on its own to assess a student’s clinical skills summatively (Norcini, 2001). Modifications may increase the reliability of the assessment, such as increasing the number of patient encounters, training the examiners, and increasing the number of examiners present (Ponnamperuma et al., 2009). Although such measures may improve the reliability of the method, they will not raise it to a level that supports its use in a high-stakes examination (Wass, Jones, and van der Vleuten, 2001).
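The effect of adding encounters or examiners can be illustrated with the Spearman–Brown prophecy formula, a standard psychometric result that predicts the reliability of a score averaged over several parallel measurements. The sketch below is purely illustrative: the single-encounter reliability of 0.30 is an assumed value, not a figure drawn from the studies cited.

```python
# Illustrative sketch: the Spearman-Brown prophecy formula predicts the
# reliability of a mean score over n parallel encounters or examiners.
# The single-encounter reliability (0.30) is an assumption for illustration.

def spearman_brown(single_reliability: float, n: int) -> float:
    """Projected reliability when averaging over n parallel measurements."""
    r = single_reliability
    return n * r / (1 + (n - 1) * r)

r_single = 0.30  # hypothetical reliability of one long-case encounter
for n in (1, 2, 4, 8, 12):
    print(f"{n:2d} encounters -> projected reliability {spearman_brown(r_single, n):.2f}")
```

Under this assumption, even a dozen encounters only project to a reliability of about 0.84, which illustrates why such modifications improve the long case without making it dependable enough for high-stakes use.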
In 1962, the traditional clinical evaluation exercise was developed to assess the clinical skills of doctors in postgraduate training, and it eventually replaced the oral examination for certification in the United States (Searle, 2008). The assessment is of two hours’ duration, and involves a senior staff member observing a medical graduate (the candidate) taking a history and performing a physical examination on a patient, then evaluating their performance immediately afterward (Durning et al., 2002). The CEX uses direct observation of candidates, thus giving them the opportunity for immediate feedback. The use of a real patient instead of a standardized patient means that the exercise is feasible and not costly (Norcini et al., 1995).
However, the CEX was criticized for the length of time each assessment took, and for limited reliability resulting from the small number of assessments per candidate (one CEX per year) and from observation by only one examiner. These concerns eventually led to its replacement in the curriculum by the mini-CEX (Durning et al., 2002), which is reviewed later in the chapter.
Recognition that the long case and the CEX had reliability issues brought about a new and innovative phase in the history of assessment. This saw a progression from the subjective assessments of the 1960s (the long case and CEX) toward a more objective, structured format (the OSCE) in the 1970s, with a greater emphasis on achieving a minimum acceptable level of reliability and validity (Rhind, 2006).
Objective Structured Clinical Examination
Over 30 years ago the OSCE was introduced into medical education by Harden et al. (1975) as a format that evaluates the performance of undergraduate and graduate clinical skills in an objective, structured, and simulated learning environment. The method examines a number of clinical competencies across a range of problems, and comprises a circuit of individual stations, each of 5–10 minutes’ duration, around which the candidates rotate. In each station a student is examined on a single competency on a one-to-one basis by an examiner. Students are assessed on their clinical, history-taking, problem-solving, and communication skills in an objective and structured format, and may be required to interact with a patient (simulated or real). The method assesses at the third level, “Shows,” of Miller’s pyramid (Miller, 1990).
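For readers unfamiliar with the circuit format, the logistics can be sketched in a few lines: with as many candidates as stations, shifting every candidate one station per round lets each candidate visit every station exactly once. The station names and timing in this sketch are invented for illustration, not a prescribed format.

```python
# Minimal sketch of OSCE circuit logistics: N candidates rotate around
# N stations, one station per round, so everyone visits every station once.
# Station names and the 10-minute slot are illustrative assumptions.

stations = ["History taking", "Suturing", "Radiograph interpretation",
            "Client communication", "Drug dose calculation"]
candidates = ["A", "B", "C", "D", "E"]

for rnd in range(len(stations)):
    print(f"Round {rnd + 1} (10 minutes):")
    for i, cand in enumerate(candidates):
        station = stations[(i + rnd) % len(stations)]
        print(f"  Candidate {cand} -> {station}")
```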
The strengths of this format lie in its standardization, which allows a fair comparison between candidates. Each station has a different examiner recording the candidate’s progression through a checklist of steps, ensuring the objectivity of the examination. On entering a station, each candidate is given the same amount of information about the task, keeping to a structured format. A weakness of the OSCE is that it is set in an artificial, “simulated” environment, which may not mirror the constraints or freedoms of a real workplace, and which limits the type of scenario that the student can encounter (Smee, 2003). The feasibility of the method also needs to be considered: OSCEs are resource intensive in terms of preparation and cost, principally in training staff to be observers and releasing them from their other duties.
Scoring of an OSCE is either “analytical,” through a tick-box, checklist-rating format (yes/no), or “holistic,” with a global rating scale (GRS) that reviews performance using a Likert-scale approach (Read et al., 2015). The checklist approach is more commonplace within veterinary education, but a number of systems also incorporate a global assessment/judgment at the end, which is used for the purposes of standard setting and not to influence the mark of an individual candidate. The checklist approach is considered reductionist and may not capture the sum of all the different parts of the performance, whereas global rating scales appear to allow the assessor to accommodate the more qualitative elements of the performance, since the examiner is required to make a judgment along a continuous scale (Regehr and Norman, 1996). Regehr and Norman’s (1996) study was one of the first to compare the different approaches to scoring a candidate’s performance; their results revealed that global rating scales scored by experts showed higher interstation reliability, better construct validity, and better concurrent validity than checklists. They used experts as their examiners, which raises the issue of training examiners for this approach in order to avoid subjectivity and low reliability in the judgments. More recent research by Read et al. (2015) found no significant difference in observer scores of student performance between institutions using checklists and those using global rating scales.
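The practical difference between the two scoring approaches can be made concrete with a small sketch. The item names, scale anchors, and rescaling below are hypothetical, chosen only to show how an analytical checklist score and a holistic global rating are computed and compared.

```python
# Hypothetical OSCE station scoring: an analytical checklist score versus
# a holistic global rating. Items, scale, and rescaling are invented for
# illustration; real stations define their own criteria and standards.

from dataclasses import dataclass

@dataclass
class StationResult:
    checklist: list[bool]  # analytical: one yes/no per observed step
    global_rating: int     # holistic: examiner judgment on a 1-5 Likert scale

def checklist_score(result: StationResult) -> float:
    """Analytical score: proportion of checklist steps completed."""
    return sum(result.checklist) / len(result.checklist)

def holistic_score(result: StationResult) -> float:
    """Holistic score: global rating rescaled to 0-1 for comparison."""
    return (result.global_rating - 1) / 4

candidate = StationResult(
    checklist=[True, True, False, True, True],  # 4 of 5 steps observed
    global_rating=4,                            # "good" overall judgment
)
print(f"Checklist: {checklist_score(candidate):.2f}, "
      f"global rating: {holistic_score(candidate):.2f}")
```

Note how the checklist rewards each discrete step equally, whereas the global rating compresses the examiner’s overall judgment into a single value; this is the reductionist-versus-holistic tension described above.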
The OSCE was perceived to be ground-breaking and innovative in the 1970s and 1980s, and remains a respected and commonly used method in medicine, as it has enabled the assessment of individual competencies in a controlled, simulated environment (Hodges, 2006). Its development and acceptance in veterinary education date from around 2003, and a new term, the objective structured practical veterinary examination (OSPVE), has been used to describe a format that is more specific to certain aspects and areas of the veterinary profession. The term was coined by a project at the Royal Veterinary College in 2002 that was exploring new methods of assessment, a major part of which was the development of the OSPVE. The two terms are used interchangeably.
Figure 16.2 shows a typical OSCE assessment form.
Mini Clinical Evaluation Exercise
In the 1990s the authenticity of the learning environment began to be recognized as important, with assessment methods shifting from educational classroom environments to real-life settings in the workplace, ensuring a balance between reliability and validity. Authentic assessment methods were designed on the principle that people are better at reproducing and applying knowledge and skills if the context in which they have to do so resembles the context in which the knowledge and skills were first learned (Regehr and Norman, 1996). Interest in assessment of graduates in the workplace was renewed with the evolution of the mini-CEX and DOPS. The majority of these workplace-based assessment methods originated and are primarily used within the graduate curriculum in medicine, but are being adapted by some medical and veterinary medical schools in Europe and the United States for their undergraduate programs.
The CEX was mentioned earlier as a method of assessing clinical skills, but it has been replaced by the mini-CEX because of its questionable reliability. The newer method was developed to capture performance in the workplace at the “Does” (highest) level of Miller’s pyramid (Miller, 1990), although evidence for its utility is still in its infancy.
The purpose of the mini-CEX is to assess the clinical skills of graduates in the workplace formatively. The examiner observes an encounter of 15–20 minutes with a patient, which may be conducted in a variety of settings (inpatient, emergency room, and outpatient), then gives immediate feedback to the candidate. The format entails an assessment sheet that the examiner uses to rate the candidate’s competency in history taking, physical examination, clinical decision-making, professionalism, counseling, organization, and overall clinical competence (Malhotra, Hatala, and Courneya, 2008). Traditionally, examiners in this method included faculty staff and clinical specialists, but nurses and those in appropriate allied health professions can also act as examiners in certain situations. Teachers have perceived it to be a useful and feasible method for promoting “direct observation” and “constructive feedback,” and residents in medical education in the same study perceived it as a useful method that promotes reflection and a constructive approach to learning (Alves de Lima et al., 2010). The perceived acceptability and validity of the method were also reported by Hill et al. (2009) from the experience of staff and students using it.
In order to achieve adequate interrater reliability, the assessment is undertaken over a period of time, several times a year, and with a number of different examiners, resulting in a reliable measure of an individual’s performance. Recommendations vary as to how many times a student needs to be observed, ranging from 7 to 14 encounters per year with multiple raters, which can create organizational and feasibility issues (Norcini, 2005; Sidhu, McIlroy, and Regehr, 2005; Alves de Lima et al., 2010; Williams, 2003). These weaknesses raise questions about the method’s use as a formative or summative piece of assessment, and whether it can be employed in isolation. It is not recommended for summative high-stakes assessment.
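The 7–14 range quoted above is consistent with the Spearman–Brown reasoning sketched earlier: inverting the formula gives the number of encounters needed to reach a target reliability. The single-encounter reliabilities below are assumed values for illustration only, not published figures.

```python
# Illustrative sketch: inverting the Spearman-Brown formula to estimate how
# many mini-CEX encounters are needed for a target reliability of 0.80.
# The single-encounter reliabilities are assumptions, not published values.

import math

def encounters_needed(single_reliability: float, target: float = 0.80) -> int:
    """Smallest n whose projected reliability reaches the target."""
    r = single_reliability
    return math.ceil(target * (1 - r) / (r * (1 - target)))

for r in (0.25, 0.30, 0.40):
    print(f"Single-encounter r = {r:.2f}: "
          f"{encounters_needed(r)} encounters for 0.80 reliability")
```

Under these assumptions, between 6 and 12 encounters are required, of the same order as the 7–14 range reported in the literature.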
The mini-CEX is primarily used in medicine, but variations of the method are being piloted and used formatively in veterinary education. The Faculty of Veterinary Medicine at Utrecht University (Netherlands) instigated a new assessment program in 2010 that includes a modified mini-CEX with its undergraduate veterinary students (Jaarsma et al., 2010; Bok et al