Vol. 40, No. 3 (Fall 2004)


Toward New Approaches for Evaluating Student Field Performance: Tapping the Implicit Criteria Used by Experienced Field Instructors

Marion Bogo
University of Toronto

Cheryl Regehr
University of Toronto

Roxanne Power
University of Toronto

Judy Hughes
University of Toronto

Michael Woodford
University of Toronto

Glenn Regehr
University of Toronto

This study determined the reliability of ratings and consistency of descriptions generated by experienced field instructors using only their acquired practice wisdom as a framework to evaluate students. Ten field instructors independently divided 20 student vignettes into as many categories as necessary to reflect various levels of student performance, described their categories, and ranked the individual vignettes within each category. The independently generated categories and their descriptions were very similar across instructors, and the inter-rater reliability was very high both for the placement of vignettes into categories (0.77) and the rankings (0.83).

The field practicum is credited by alumnae and employers as the component of university-based social work education that is most crucial for the development of practice-based skills and for socializing students into the professional role (Fortune, 1994; Goldstein, 2000; Kadushin, 1991; Schneck, 1991; Tolson & Kopp, 1988). However, due to the breadth of learning that occurs in the practicum and the diversity of practice challenges and learning opportunities that students encounter, this has simultaneously been the most difficult aspect of social work education to evaluate. Over the past 2 decades social work educators have sought to articulate outcome objectives and related criteria for assessing student field learning and practice competence and to establish the reliability and validity of evaluation measures (Bogo, Regehr, Hughes, Power, & Globerman, 2002; Dore, Epstein, & Herrerias, 1992; Koroloff & Rhyne, 1989; O’Hare & Collins, 1997; O’Hare, Collins, & Walsh, 1998; Reid, Bailey-Dempsey, & Viggiana, 1996; Vourlekis, Bembry, Hall, & Rosenblum, 1996). Despite these efforts, field educators question whether current methods and measures of evaluating student competence fulfill the gatekeeping responsibilities of social work education programs (Alperin, 1996; Kilpatrick, Turner, & Holland, 1994; Raskin, 1994).

Of various models used to evaluate the practice of social work students, the competency-based education model (CBE) is the one that has received the most attention in the past 2 decades. While definitions vary, competency generally refers to a complex set of behaviors that demonstrate possession of knowledge, skills, and attitudes to a standard associated with effective practice in a particular profession. Competency-based education was introduced to North American social workers in the mid-1970s and includes three components (Arkava & Brennan, 1976; Clark & Arkava, 1979; Gross, 1981). First, educational goals are identified in clear, measurable behaviors or skills with indicators or standards established to judge varying levels of performance. Second, evidence or data are identified that will be used to assess the degree to which students have mastered those skills. Finally, sound evaluation methods are developed to assess the student based on this evidence. Unlike earlier evaluation methods, which rested largely on written or verbal reports of students’ practice, competency evaluation includes observation of actual performance. From a primary focus on curriculum structure, content, and process, CBE requires educators to emphasize measurement and evaluation of observable performance outcomes. In this regard, CBE has been described as less of a theory and more of an approach to producing a skilled workforce through procedures to ensure that graduates can perform expected behaviors to an acceptable standard (Hackett, 2001).

The CBE approach is widely used in professional and vocational education, especially in Great Britain, Australia, and New Zealand, where legislation regulates its application in specific fields and occupations. It is seen as a way to enhance the accountability of workers (Csiernik, Vitali, & Gordon, 2000). Nevertheless, critics have been vocal about the reductionism in the model: it condenses professional practice into numerous discrete skills, and this fragmentation loses the very essence of professional thinking and action (Field, 2000; Hyland, 1995). Furthermore, it has been suggested that the emphasis on concrete, observable behaviors does not access the less tangible aspects of practice, such as the integration of values and ethical principles and the way in which a practitioner exercises judgment in complex professional situations (Kelly & Horder, 2001) and creatively responds to the uniqueness of individual situations (Schon, 1987; 1995).

Competency-based education is founded on the belief that professional practice can be defined in terms of techniques that, when applied to actual problems, will yield positive outcomes. In contrast, Schon (1995) argued that professional practice involves confronting uncertain and complex situations where the straightforward application of knowledge is insufficient. Rather, expert practitioners are able to use practice wisdom and highly developed intuition to generate alternative solutions in the moment. They draw on implicit knowledge gleaned from highly complex individual cases to respond to practice challenges. This process of using tacit knowledge, referred to in social work as practice wisdom, is a hallmark of what can be referred to as “the art” of professionals. This understanding of professional practice requires another, more fluid approach to defining and evaluating practice competence.

The concept of practice wisdom is appealing to experienced social workers as it captures a view of practice that is greater than the discrete behaviors that represent competence. However, practice wisdom traditionally has eluded concrete definition and explication, and hence this characteristic has been difficult to investigate in research on the evaluation of competence. Eraut (2002) has observed that research about what professionals know and how they come to know it is difficult to conduct, as many professional workers “provide very limited accounts of their knowledge, practice and learning” (p. 2). He urges researchers to undertake “even modest incursions into this uncharted territory” (p. 2).

The project reported in this paper is part of a larger study aimed at developing new approaches to evaluating student field learning that reflect the various components of professional practice discussed above. The present research project seeks to tap into the concept of practice wisdom and to ascertain how it is demonstrated in global judgments of student competence. It also aims to establish whether there is agreement between experienced field instructors about what constitutes global assessments of student performance. It seeks to determine the degree to which experienced field instructors evaluate students in a consistent manner when the standardized tools of a CBE model are not available. More specifically, our study focused on three questions. When experienced field instructors are presented with condensed descriptions of 20 students,

• Is there agreement among field instructors as to how students rank in terms of relative competency?

• Do field instructors agree about which students are not suitable for social work, which require further training, and which are ready for practice?

• What student characteristics do field instructors view as most critical to their decisions about ranking and readiness for practice?


In this study, 10 experienced field instructors were brought together for the purpose of evaluating and ranking a series of students described in vignettes. Throughout an intensive 1-day process, instructors were asked to follow a series of steps in evaluating students and to record the basis for their decisions. In two focus groups they then reviewed the relative rankings and sought to determine factors that led to agreement or disagreement among the raters.

The purposive sample of 10 highly experienced field instructors was selected from a list of all 240 field instructors for the University of Toronto in the academic year 2002–2003. Individuals were included based on their ability to contribute to the above research objectives, their expertise and experience as field instructors, and their clinical expertise as determined by the program’s field practicum director and associates. Efforts were made to ensure that a representative number of men and women and a range of ethnic and age groups were included in the sample and that the participants represented a variety of clinical settings. Potential participants were contacted by telephone, and all agreed to participate following a description of the study. The sample consisted of six women and four men, three of whom were from visible minority groups. There were four instructors in their 30s, four in their 40s, and two in their 50s. A range of agencies were represented, as four instructors were in mental health settings, two in child protection, and four in health care. All held master of social work degrees, with a mean of 14 years (range 4–24) practice experience with their current employer. In total, participants had a mean of 19 years (range 12–30) of social work or related practice experience. On average, field instructors had 8 years (range 2–15) of experience with the school’s practicum program. Eight were also field instructors for other social work education programs.

To develop representative student vignettes, the authors held a series of long interviews (McCracken, 1988) with 19 experienced field instructors from mental health, health, and child protection settings. In the interviews, field instructors were asked to describe examples of students whom they considered to be exemplary, problematic, and average. Twenty representative student vignettes were developed through data abstraction and aggregation. Each vignette followed a similar format in describing student performance and competencies, and where possible, examples of students’ interactions with clients and colleagues were included. Although the practice setting was identified for each vignette, all identifying information was removed and gender-neutral language was used to describe each student (e.g., he/she). There was no overlap between the instructors who provided student descriptions in the interview process and those who participated in the ranking process. (See Appendix for an example of a vignette.)

Data collection with the 10 instructors participating in the ranking process involved several phases. In phase one, the participants independently reviewed each vignette and made notes or highlighted aspects of each vignette they viewed as relevant. Next, the participants placed each vignette in categories based on their perception of shared competencies and characteristics, making as many categories as necessary. These categories were then placed on a continuum relative to one another, from high to low performance, and subsequently assigned a number, with 1 being the highest and so on to the end of the groups. Field instructors were then asked to assign a ranking to each vignette within each category, with 1 again being the highest and so on. This resulted in a ranking of 1 to 20 for all the vignettes. Last, participants described in written form the competencies and characteristics shared among the vignettes in each category and those competencies and characteristics that were different between categories.
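The way ordered categories and within-category rankings combine into a single 1-to-20 ranking can be sketched in a few lines of Python. This is only an illustration of the procedure described above, not the instrument used in the study; the function name `overall_ranking` and the vignette labels are hypothetical.

```python
def overall_ranking(categories):
    """Combine ordered categories into one overall ranking.

    `categories` is a list of categories ordered from highest to lowest
    performance. Each category is a list of vignette labels ordered from
    best to weakest within that category. Returns a dict mapping each
    vignette label to its overall rank (1 = highest).
    """
    ranking = {}
    next_rank = 1
    for category in categories:
        for vignette in category:
            if vignette in ranking:
                raise ValueError(f"vignette {vignette!r} appears in more than one category")
            ranking[vignette] = next_rank
            next_rank += 1
    return ranking


# Hypothetical example: one instructor sorts five vignettes into three categories.
example = overall_ranking([["V07", "V12"], ["V03"], ["V15", "V01"]])
print(example)  # {'V07': 1, 'V12': 2, 'V03': 3, 'V15': 4, 'V01': 5}
```

Because the categories are themselves ordered, every vignette in a higher category outranks every vignette in a lower one, and the within-category rank only breaks ties inside a category.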

In phase two, participants were divided into two focus groups. A member of the research team facilitated each group process and two recorders captured the content and process of each. Another researcher monitored each group in terms of consistency of the research process. Groups were evenly matched in terms of gender and practice fields. In the focus group, participants were given their individual ranking for each vignette, the ranking of other field instructors, and the aggregate rankings of the vignettes by all 10 participants. After reviewing their individual rankings and those of others, field instructors were asked to discuss the vignettes wherein their individual ranking differed from others. Participants were asked to discuss their rationale for the ranking in terms of competencies and characteristics described in the vignettes.

During phase three, each focus group was provided with all the vignettes on a continuum that reflected the overall sample’s mean rankings. Participants were asked as a group to make distinctions among the vignettes in this continuum. As a starting point, participants were instructed to identify the vignettes that described students who would never be appropriate to practice and those that described students who were ready to practice independently. Participants then identified other points of distinction among the vignettes.
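A continuum based on mean rankings could be assembled roughly as follows. The sketch assumes each participant's ranking is stored as a mapping from vignette label to rank; the function name `mean_rank_continuum` and the sample data are hypothetical and are not taken from the study.

```python
from statistics import mean


def mean_rank_continuum(rankings):
    """Order vignettes by their mean rank across raters.

    `rankings` is a list with one dict per rater, each mapping a vignette
    label to that rater's rank (1 = highest). Returns a list of
    (vignette, mean_rank) pairs sorted from best to weakest mean rank.
    """
    vignettes = rankings[0].keys()
    means = {v: mean(r[v] for r in rankings) for v in vignettes}
    return sorted(means.items(), key=lambda item: item[1])


# Hypothetical rankings from three raters for three vignettes.
raters = [
    {"V01": 1, "V02": 2, "V03": 3},
    {"V01": 2, "V02": 1, "V03": 3},
    {"V01": 1, "V02": 2, "V03": 3},
]
for vignette, avg in mean_rank_continuum(raters):
    print(vignette, round(avg, 2))
```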

In accordance with guidelines from the Research Ethics Board at the University of Toronto, written consent was obtained from each of the participants. While the student vignettes represented composites of students described by the 19 field instructors interviewed and had identifying data removed, it was within the realm of possibility that some aspects of the vignettes might be recognizable to participants. Consequently, all participants were asked and agreed to keep confidential all information about the student profiles and all decisions and discussions generated during the focus groups.


Following a reading of the 20 vignettes, each participant assigned each vignette to a category based on the participant’s perception of shared competencies and characteristics. Although there were no limitations placed on the number of categories, field instructors generated from four to seven categories, which is a relatively narrow range. Treating the category membership of each vignette as a “score” for the individual student, we then assessed the consistency of assignments to categories made by the instructors using a single rater intraclass correlation coefficient (ICC). This is consistent with the recommendations of Auerbach and Caputo (in press) for determining inter-rater reliability. The ICC was 0.77, suggesting that the consistency of categorizations between any two randomly selected raters was quite high.
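For readers unfamiliar with the statistic, a single-rater ICC can be computed from a vignette-by-rater score matrix as sketched below. The article does not state which form of the ICC was used; this sketch assumes the two-way random-effects, absolute-agreement form commonly written ICC(2,1). The function name `icc_single_rater` and the small data matrix are illustrative only and are not the study's data.

```python
def icc_single_rater(scores):
    """Single-rater ICC, two-way random effects, absolute agreement (ICC(2,1)).

    `scores` is a list of rows, one per vignette; each row holds the score
    given to that vignette by each rater (same rater order in every row).
    """
    n = len(scores)       # number of vignettes (rows)
    k = len(scores[0])    # number of raters (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: vignettes, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative data only: 6 vignettes scored by 4 raters.
demo = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_single_rater(demo), 2))  # prints 0.29
```

In the study's design the matrix would have 20 rows (vignettes) and 10 columns (instructors), with either category numbers or overall ranks as the scores. A value near 1 means that any single rater's scores closely track those of any other rater.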

Field instructors were then asked to assign a ranking to each vignette within each category. As the categories generated by instructors were ordinal in structure, this resulted in a relative ranking of each vignette from 1 to 20. A ranking of 1 represented the highest score or the best student; a ranking of 20 represented the lowest score or the weakest student. The inter-rater reliability of these relative rankings generated by individual instructors was higher still, with an ICC of 0.83, again suggesting a very high consistency of rankings between any two randomly selected raters.

Finally, the instructors were divided into two focus groups to discuss the overall rankings. There was very little discrepancy among the field instructors’ rankings of individual student vignettes. During discussions about vignettes where there were discrepancies, it became clear that there was almost no dissension about the skills that students possessed; differential rankings tended to be based on whether the field instructor doing the ranking felt that the student had the capacity to learn. Comments included, “Has some good skills, but the motivation was not there, so I dropped the ranking down,” “I had concerns that this student over-identified and struggled, but growth is possible,” and “Struggles, but changing, growing, there is hope, this person could be worked on to improve.” These discrepancies occurred in the middle categories of students, but did not occur in the top or bottom categories.

The two field instructor groups also constructed a set of new categories through consensus. Five categories were generated by each of the focus groups: (1) exceptional students or “stars”; (2) ready for practice; (3) on the cusp; (4) need more training, for example another practicum; and (5) unsuitable for the profession. The categories were highly consistent between the two focus groups, with only one vignette being sorted differently. Agreement between the groups about assignments to categories was calculated using an ICC, which resulted in a coefficient of 0.99.

Comments that described the categories helped elucidate the decision-making process: “It was fun to separate the categories. In doing the task, I tried to decide if I would pass the person or not. I used a 5-point scale that we have at work to rate our own performance and so this made the task extremely easy.” One participant noted, “The top and bottom were easy to pull out.”

Comments made by instructors helped elucidate the decision-making criteria. The outstanding students or stars were described as follows: “The top category were balanced, good social workers/employees/team workers and had the skills, motivation, desire, growth, ethical, professional and relationship building skills.” “The top had skills, knowledge, commitment to social work. I would highly recommend them for a job.” “This category is solid and the difference is in the initiative and enthusiasm as well as their ability to transfer their learning to practice.” “The top two categories had energy and eagerness and would be fun to work with.”

Students in the middle categories were described as follows: “They had some bad habits, but had energy.” “With the middle category there would be lots of rewards [in teaching] as they would improve the most.” “The middle category was more of a challenge [to rank] because there is more work [in] developing the student. Maybe some of these would do better in a different agency.” “Had good skills but lacked security and self-confidence.”

Students in the bottom categories were deemed unsuitable for social work for the following reasons: “Lack of progress and inability to have relationships with clients,” “No values/ethics and not fit for social work,” and “Disrespectful.”


As part of the field education experience, social work education programs expect field instructors to evaluate students based on some set of criteria. The competencies that programs develop should provide an important generic foundation and define a group of valid skills that the profession states practitioners should have. To the extent that these competencies are empirically tested and it has been demonstrated that they lead to positive outcomes for clients, then social work educators can truly claim that they are being accountable to the public good. However, what is not known is how field instructors approach evaluation and what implicit or tacit models and beliefs they use in student evaluation. To what extent are they governed solely by the requirements of competency-based education, evaluating evidence of student practice episodes in relation to standards for numerous discrete concrete behaviors and skills? To what extent are they evaluating students based on a set of qualities or characteristics that may not be captured in the competency inventory—those that remain unarticulated, but constitute a major influence on their perception of whether or not a student should pass into the ranks of the profession?

This study attempted to determine the consistency of evaluations among experienced field instructors when they were not provided with explicit competency-based criteria. Rather, we sought to rely upon the judgment of field instructors, which stems from their professional expertise and practice wisdom. When experienced field instructors were asked to read a series of 20 student vignettes, categorize them according to types of students, and rank them relative to one another, high levels of inter-rater reliability between the participating field instructors were found. The intraclass correlation coefficient for category assignments by individual instructors was 0.77, for overall rankings of students by instructors it was 0.83, and for agreement on categorizations between two different groups of instructors it was 0.99. Thus, there was considerable agreement between the participating instructors on what constituted outstanding, acceptable, and unacceptable student performance. Qualitative comments during the focus group process demonstrated that decision-making criteria were frequently based on personality characteristics and not on explicit skills. Instructors noted that students in the top categories were motivated, committed, enthusiastic, energetic, and eager; that students in the middle categories often lacked security and self-confidence; and that students in the bottom categories lacked relational capacity and were disrespectful.


The results of this study of inter-rater reliability regarding student performance must be viewed with caution since the field instructors selected for this study were highly experienced and do not represent the range of experience in the standard pool of instructors in any one social work education program. Further, the vignettes were selected in a manner that may have artificially inflated the variance between students. While all vignettes represented students who had in fact been in our program, students at the high and low end were overrepresented. It would no doubt be more difficult to reliably differentiate the majority of students who fall into the middle categories. In addition, this exercise did not replicate the actual process of evaluation. Field instructors do not evaluate students objectively solely based on characteristics described by others; rather, they evaluate students in the context of a relationship that develops between the student and the field instructor. Thus, whether the level of inter-rater reliability found in this study can be maintained when instructors have relationships with the students is as yet undetermined.


This study demonstrated that experienced field instructors were remarkably consistent in their ability to differentiate between students and to identify a range of student competence when they were presented with descriptions of student performance. Descriptors of students at various levels of competence were markedly similar and were primarily based on the students’ motivation, self-confidence, relational capacity, and integrity and secondarily on concrete skills. This suggests that even in the absence of explicit competency-based criteria for student evaluation, experienced social work field instructors are able to agree on what constitutes exemplary performance, which students are likely to develop into good social work professionals with additional training and supervision, and which students are clearly unsuitable for practice. The consistency of these global judgments seems to support the notion that there is a common practice wisdom that is shared by experts in the field. It may therefore be timely for schools of social work to develop and test new methods for student field practice evaluation that tap into this concept of practice wisdom as a means of determining competence for the profession.


Alperin, D. E. (1996). Empirical research on student assessment in field education: What have we learned? The Clinical Supervisor, 14(1), 149–161.

Arkava, M., & Brennan, C. (Eds.). (1976). Competency-based education for social work: Evaluation and curriculum issues. New York: Council on Social Work Education.

Auerbach, C., & Caputo, R. (in press). Statistical methods for estimates of inter-rater reliability. In A. Roberts & K. Yeager (Eds.), Handbook of Practice-Based Research and Evaluation. New York: Oxford University Press.

Bogo, M., Regehr, C., Hughes, J., Power, R., & Globerman, J. (2002). Evaluating a measure of student field performance in direct service: Testing reliability and validity of explicit criteria. Journal of Social Work Education, 38, 385–401.

Clark, F., & Arkava, M. (1979). The pursuit of competence in social work. San Francisco: Jossey-Bass.

Csiernik, R., Vitali, S., & Gordon, K. (2000). Student and field instructor perceptions of a child welfare competency-based education and training project. Canadian Social Work, 2(2), 53–64.

Dore, M., Epstein, B., & Herrerias, C. (1992). Evaluating student’s micro practice field performance: Do universal learning objectives exist? Journal of Social Work Education, 28, 353–362.

Eraut, M. (2002). Editorial. Learning in Health and Social Care, 1(1), 1–5.

Field, L. (2000). Organisational learning: Basic concepts. In G. Foley (Ed.), Understanding adult education and training (2nd ed., pp. 159–173). Sydney, Australia: Allen & Unwin.

Fortune, A. E. (1994). Field education. In F. J. Reamer (Ed.), The foundations of social work knowledge (pp. 151–194). New York: Columbia University Press.

Goldstein, H. (2000). Social work at the millennium. Families in Society: The Journal of Contemporary Human Services, 81(1), 3–10.

Gross, G. M. (1981). Instructional design: Bridge to competence. Journal of Education for Social Work, 17(3), 66–74.

Hackett, S. (2001). Educating for competency and reflective practice: Fostering a conjoint approach in education and training. Journal of Workplace Learning, 13(3), 103–112.

Hyland, T. (1995). Morality, work and employment: Towards a values dimension in vocational education and training. Journal of Moral Education, 24(4), 393–406.

Kadushin, A. E. (1991). Introduction. In D. Schneck, B. Grossman, & U. Glassman (Eds.), Field education in social work: Contemporary issues and trends (pp. 11–12). Dubuque, IA: Kendall/Hunt.

Kelly, J., & Horder, W. (2001). The how and why: Competences and holistic practice. Social Work Education, 20(6), 689–699.

Kilpatrick, A. C., Turner, J., & Holland, T. P. (1994). Quality control in field education: Monitoring students’ performance. Journal of Teaching in Social Work, 9(1/2), 107–120.

Koroloff, N. M., & Rhyne, C. (1989). Assessing student performance in field instruction. Journal of Teaching in Social Work, 3(2), 3–16.

McCracken, G. (1988). The long interview. Newbury Park, CA: Sage.

O’Hare, T., & Collins, P. (1997). Development and validation of a scale for measuring social work practice skills. Research on Social Work Practice, 7(2), 228–238.

O’Hare, T., Collins, P., & Walsh, T. (1998). Validation of the practice skills inventory with experienced clinical social workers. Research on Social Work Practice, 8, 552–563.

Raskin, M. (1994). The Delphi study in field instruction revisited: Expert consensus on issues and research priorities. Journal of Social Work Education, 30, 75–88.

Reid, W. J., Bailey-Dempsey, C., & Viggiana, P. (1996). Evaluating student field education: An empirical study. Journal of Social Work Education, 32, 45–52.

Schon, D. (1987). Educating the reflective practitioner. San Francisco, CA: Jossey-Bass.

Schon, D. A. (1995). Reflective inquiry in social work practice. In P. M. Hess & E. J. Mullen (Eds.), Practitioner-researcher partnerships: Building knowledge from, in, and for practice (pp. 31–55). Washington, DC: NASW Press.

Schneck, D. (1991). Integration of learning in field education: Elusive goal and educational imperative. In D. Schneck, B. Grossman, & U. Glassman (Eds.), Field education in social work: Contemporary issues and trends. Dubuque, IA: Kendall/Hunt.

Tolson, E. R., & Kopp, J. (1988). The practicum: Clients, problems, interventions and influences on student practice. Journal of Social Work Education, 24, 123–134.

Vourlekis, B., Bembry, J., Hall, G., & Rosenblum, P. (1996). Testing the reliability and validity of an interviewing skills evaluation tool for use in practicum. Research on Social Work Practice, 6, 492–503.

Appendix: Student Vignette.

R is a mature learner (mid-30s) and has the ability to integrate professional knowledge with life experience. This ability is particularly useful with clients as R uses his/her life experiences to relate to clients as part of the engagement-therapy process. This combination of maturity/life experience and professional/academic knowledge also helps colleagues view R as a co-worker from whom they can learn, rather than an inexperienced student. R arrives at your agency with clearly defined learning goals that are specific to the practice area. However, R has a number of other responsibilities and time commitments outside of the placement that create struggles for him/her in devoting time and energy to the placement and really feeling as though he/she is part of the team. Nevertheless, other team members and clients see R as an experienced worker. Team members look to R to learn new information based on R’s non-academic knowledge and life experiences. R always behaves ethically and professionally.

Clinically, R enters the placement well versed in family of origin and psychodynamic perspectives, and thus works from these theories. Although R has an appreciation for other therapeutic frameworks, R prefers to work within family of origin and psychodynamic perspectives. Assessment skills related to these theories are strong; however, R needs additional practice with intervention skills. Despite this focus on one particular theoretical framework, R effectively uses his/her own life experiences to inform his/her practice, which reflects a positive use of self.

Report writing skills overall are fine with some room for improvement in the finer points of grammar. R speaks English as a second language; thus, he/she is not secure when making formal presentations, but he/she is confident in other interactions with clients and with team members and in other settings such as case conferences and team meetings.


Accepted: 04/04.

Marion Bogo is professor, Faculty of Social Work, University of Toronto. Cheryl Regehr is associate professor and Sandra Rotman Chair in Social Work, Faculty of Social Work, University of Toronto. Roxanne Power is a senior lecturer, Faculty of Social Work, University of Toronto. Judy Hughes and Michael Woodford are doctoral candidates, Faculty of Social Work, University of Toronto. Glenn Regehr is associate professor and Richard and Elizabeth Curie Chair in Health Professions Education Research, Faculty of Medicine, University of Toronto.

This study was funded by a grant from the Social Sciences and Humanities Research Council of Canada.

Address correspondence to Marion Bogo, Faculty of Social Work, University of Toronto, 246 Bloor Street West, Toronto, Ontario, Canada, M5S 1A1; email: marion.bogo@utoronto.ca.

© by the Council on Social Work Education, Inc. All rights reserved.

