Journal of Social Work Education

Vol. 38, No. 3 (Fall 2002)

Evaluating a Measure of Student Field Performance in Direct Service: Testing Reliability and Validity of Explicit Criteria

Marion Bogo
University of Toronto
Roxanne Power
University of Toronto
Cheryl Regehr
University of Toronto
Judith Globerman
Vancouver Coastal Health Authority
Judy Hughes
University of Toronto

This study examines the reliability and validity of a measure to evaluate student field performance. Results demonstrated a consistent factor structure with excellent internal consistency; however, there was inadequate consistency between ratings of individual students in their first and second field education experiences. The measure had some predictive validity in that it could differentiate between students identified as having difficulty in Year 1 of the program, but not in Year 2. Scores were significantly associated with academic grades. Implications for future instrument development and the process of evaluation are considered.

The profession of social work relies on social work educational programs to produce competent professional practitioners. Consequently, university programs function as the gatekeepers for the profession and are seen by the public as accountable for the quality of social work services available in the community. Social work educators recognize their responsibility to ensure that graduating students have achieved a performance level necessary for beginning practice (Kilpatrick, Turner, & Holland, 1994). Of all the aspects of social work education, field education is credited by alumnae and employers as the most significant component in the preparation of social work practitioners (Canadian Association of Schools of Social Work, 2001; Fortune, 1994; Goldstein, 2000; Kadushin, 1991; Schneck, 1995; Tolson & Kopp, 1988). However, a review of the literature and anecdotal comments of field coordinators reveal a consistent theme over the past two decades: the lack of objective, standardized outcome measures for assessing social work students’ learning and performance in field education (Alperin, 1996; Kilpatrick et al., 1994; Pease, 1988; Wodarski, Feit, & Green, 1995). Raskin (1994), for instance, surveyed field education experts in 1980 and again in 1991 to identify field instruction issues and research priorities. In 1980, the consensus was that the primary research concern was the crucial need for methods of testing the attainment of specific skills in field instruction. More than 10 years later the same concern remained: “How does one test for the attainment of specific competencies? How do we better measure student field performance?” (Raskin, 1994, p. 79). Social work educators do not appear confident that graduates are competent to practice.
In a study that included 81 schools of social work, Koerin and Miller (1995) determined that 27% (18) of the schools reported that students terminated by their schools for non-classroom behaviors had field performance problems as the primary cause of their dismissal. This reason for termination came second only to ethical breaches, which also may frequently be identified in field education. What is not clear is how the determination that a student is unfit to practice is made. Although blatant behavioral issues may be obvious, it is less clear how students who do not quite reach the necessary level of competence are identified. If social work educators are unable to differentiate reliably between those students who possess the skills to practice and those who do not, we are failing in our critical role as gatekeepers for the profession.

Outcomes in the Field Education Literature

Over the past two decades the knowledge base for field education has grown substantially as researchers have studied aspects of teaching and learning that contribute to positive outcomes in field education. Studies have focused on field instructor behaviors and the student–field instructor relationship (Alperin, 1998; Fortune & Abramson, 1993; Knight, 1996, 2001); the range and nature of educational activities (Baker & Smith, 1987; Fortune, McCarthy, & Abramson, 2001; Gitterman, 1989); structures and models for field education (Cuzzi, Holden, Rutter, Rosenberg, & Chernack, 1996; Spitzer et al., 2001); and inter-organizational relationships between universities and their field settings (Bogo & Globerman, 1999; Globerman & Bogo, 2002).

Although a few studies evaluating aspects of field education use student performance as the outcome measure (Deal, 2000; Fortune et al., 2001; Reid, Bailey-Dempsey, & Viggiana, 1996), the great majority use student satisfaction or student perception of helpfulness. Although Fortune and her colleagues (2001) note that satisfaction and subjective ratings of helpfulness are important intermediate outcomes, Gambrill (2000) suggests that there may even be a negative correlation between student satisfaction and learning. Those researchers who use performance or competence as an outcome employ a range of measurement approaches including student self-assessment (O’Hare & Collins, 1997; O’Hare, Collins, & Walsh, 1998); field instructor ratings of student performance, using the program’s regular evaluation form (Fortune et al., 2001; Koroloff & Rhyne, 1989; Lazar & Mosek, 1993); researchers’ ratings of student performance from process recordings (Deal, 2000; Vourlekis, Bembry, Hall, & Rosenblum, 1996); and researchers’ ratings of student performance on audiotapes (Reid et al., 1996).

To date, however, few attempts have been made to directly evaluate the reliability and validity of a measurement tool that assesses the competence or performance of social work students in field education. Through Title IV-E training grants in child welfare, schools of social work have designed competency-based curriculum and assessment tools to measure individual students’ and practitioners’ competency. A review of the literature, however, found neither descriptions of the tools nor reports of reliability and validity testing for the tools (Birmingham, Berry, & Bussey, 1996; Breitenstein, Rycus, Sites, & Jones Kelley, 1997; Fox, Burnham, & Miller, 1997; Reilly & Petersen, 1997). The absence of reliable and valid outcome measures undermines confidence in the findings from research on social work field education.

Social work educators have not yet included measures of client outcomes in their evaluation of student learning in field education. Ultimately, the most telling indicator of students’ and practitioners’ competence is the results they achieve with their clients. Although pragmatic issues, such as obtaining ethics approval from numerous field settings and gaining informed consent from clients and students, are a challenge, it is crucial to include service quality and outcomes as measures of the effectiveness of social work education (Gambrill, 2001).
In summary, despite agreement on the importance of empirically-valid methods for evaluating the competence of social work graduates, there is still little evidence of progress in the development and use of such tools. This paper critically reviews tools for measuring student competence that are reported in the literature and describes one school’s exploration of the psychometric properties of the scale that is currently used to evaluate the competency of its students.

Tools for Evaluating Competence

For many decades professions have been concerned with ensuring their practitioners’ competency. Researchers have observed that notions of competence and competency-based training do not emanate from a theoretical framework, but rather they reflect an approach to accountability to the public for the practice of professionals (Goldhammer & Weitzel, 1981; Hackett, 2001). Competency has been defined as a description of the essential skills, knowledge, and attitudes required for effective performance in a work situation (Rylatt & Lohan, 1995). The competence of a professional rests in the adequate possession of these attributes to meet a standard that enables effective or successful performance (Hackett, 2001). Competency-based education (CBE) was introduced into social work education in an attempt to demonstrate that educational programs were preparing practitioners who were not only able to master classroom material but were also able to practice (Arkava & Brennan, 1976). In CBE, educational outcomes for beginning social work practitioners are expressed in behavioral terms and measured by using indicators to identify varying levels of performance. When field programs adopt this approach, they define competencies as learning objectives in field education, determine what type of evidence will be used to assess the degree to which students have mastered those skills, and identify methods to assess that evidence (Brennan, 1982). Initially the approach appeared to have great promise for field education, if clearly-defined educational outcomes and performance indicators could replace the vague and nebulous field objectives used at that time (Clark & Arkava, 1979; Gross, 1981; Larsen & Hepworth, 1980). However, the failure to assess the reliability and validity of these tools and methods used to evaluate student competence has taken away from the educational value of this approach to social work field education.

A review of the social work research literature to 2001 revealed three scales that have been developed and evaluated for measuring practice competence. Each of these scales relies on the concept of competency-based education, in which educational goals and skills to be mastered are identified in clear, measurable terms (Arkava & Brennan, 1976; Larsen, 1978). Programs specify the practice data that students are expected to produce and specify the methods for evaluation. Successful outcomes are determined by the degree to which students have achieved those goals or are applying the expected skills.

O’Hare et al. (1998) developed the Practice Skills Inventory (PSI) as a self-report or clinician-report measure that incorporates multiple indicators of social work practice ability. The PSI assesses how frequently practitioners use these identified skills. It appears that the researchers equate frequency of use with quality; that is, the more often the practitioner uses a skill, the better the quality of service offered to the client. The researchers reviewed outcome research from psychotherapy and social work practice that identified common therapeutic factors across practice models that produce optimal client outcomes. These factors were used to generate 75 initial skill items for the PSI. Using factor analysis, the initial pool was reduced to 23 items loading on four factors that together accounted for 60.6% of the variance in practitioners’ scores. These four factors were identified as supportive skills, therapeutic skills, case-management skills, and insight-facilitation skills. Reported reliability coefficients (Cronbach’s alpha) range from .80 to .86. Construct validity for the PSI was demonstrated through comparison of the factor structure obtained from a sample of 285 social work students and from another sample of 281 experienced social work practitioners. The initial factor analysis with the student sample yielded four factors, three of which were replicated in the subsequent factor analysis with the MSW practitioners.

A considerable strength of the PSI is the inclusion of indicators and sub-scales that measure multiple dimensions of social work practice that are not limited to one domain of practice ability, such as interviewing skills. Another strength of the PSI is that it includes skills drawn from the research literature on process as well as outcomes. A limitation of the scale is the focus on the frequency with which students reported that they attempt to use these skills, rather than on students’ competent use of the skills. Further, the PSI measure is a self-report method for assessing personal performance and has not been tested as a measure for instructors to use in evaluating students.

Wilson (1981), in collaboration with field instructors, designed a checklist of specific characteristics to distinguish between beginning and advanced students. Vourlekis et al. (1996) tested the reliability and the content, concurrent, and predictive validity of a sub-set of the checklist. The aim of the Wilson checklist is to assess social work students’ performance through the evaluation of their process recordings. Items on the checklist address 26 specific interviewing skills, which are rated on a 5-point Likert scale that ranges from beginning to advanced skill levels. The sample consisted of 57 pairs of BSW, MSW Year 1, MSW Year 2 students, and their field instructors from three schools of social work. In this study, the overall reliability coefficient (Cronbach’s alpha) for the checklist was high (α=.96). The majority of field instructors agreed or strongly agreed that the checklist reflects meaningful dimensions of interviewing skills, thus demonstrating content validity. The researchers also showed that the checklist had concurrent validity in that mean scores for the more advanced group of MSW students were higher than the mean scores for the beginning group of BSW students; predictive validity was shown in that students who scored high on the checklist also scored high on the final field evaluation. ANOVA and correlational analysis determined that student scores on the checklist were unrelated to factors such as students’ race, gender, practice setting, age, and previous work or volunteer experience. Although the authors report this to be a strength of the measure, indicating that the measure was bias free, it is also a potential limitation.
Other research has suggested that factors such as age (Cunningham, 1982; Pelech, Stalker, Regehr, & Jacobs, 1999), gender (Pfouts & Henley, 1977; Pelech et al., 1999), previous experience (Pfouts & Henley, 1977; Pelech et al., 1999), and entering grade point average (GPA) (Bogo & Davin, 1989; Cunningham, 1982; Pfouts & Henley, 1977; Pelech et al., 1999) are associated with performance in the field. The Wilson checklist provides an improvement over the PSI for measuring student competence because it measures the quality of social work students’ acquisition of core interviewing skills, as reflected in their process notes, based on a scaling technique that distinguishes between beginning and advanced interviewing skill levels. The checklist also provides some opportunity for raters to measure students’ critical thinking and ability to analyze their interactions with clients, because students were asked to critically reflect on and analyze their own skills and behaviors in their process recordings. However, the focus on interviewing skills ignores other dimensions of social work practice, such as relationship or alliance skills, which accounted for over 30% of the variance in the O’Hare et al. (1998) study. The use of self-reported process recordings to provide evidence of students’ practice is another difficulty with the testing of the checklist. Process recordings provide an indirect measure compared to directly observing the students’ practice through one-way mirrors or videotapes of interviews. Raters, who use the checklist to evaluate students, may actually be assessing students’ abilities to write process recordings rather than their interviewing skills.

Koroloff and Rhyne (1989) developed a 25-item rating scale to assess social work students’ acquisition of interviewing skills. These skills are organized into four sub-scales: interpersonal communication (10 items); assessment skills (6 items); intervention skills (6 items); and termination skills (3 items). The sub-scales are rated on a 5-point Likert scale that evaluates how well students have integrated these skills into their practice with clients. Individual indicators and sub-scales were generated by faculty at the researchers’ university. Both students and field instructors independently rated the students’ performance. Data collected for the study revealed that the rating scale was sensitive enough to detect differences (a) between students’ skill levels prior to and at the end of their field education and (b) between the skill levels of experienced and non-experienced students. However, the authors did not attempt to determine if this list of interviewing skills can consistently and validly represent student competence in field practice. Again, the exclusive focus on interviewing skills does not reflect the multidimensional nature of social work practice.

Attempts to develop measures of student competence in social work field education are limited by a number of factors. First, it is difficult to identify core social work practice skills and learning objectives that go beyond basic micro-level interviewing skills and that incorporate multiple dimensions to reflect the critical thinking and assessment aspects of social work practice (Csiernik, Vitali, & Gordon, 2000; Dore, Morrison, Epstein, & Herrerias, 1992). A second difficulty in developing evaluation tools is the need for scaling techniques that measure the quality of students’ skill performance. Evaluation tools need to provide adequate criteria against which raters can measure student mastery. Finally, further work is needed to determine what data are necessary to provide evidence of the students’ performance, whether through process recordings, direct observation, or subjective measures.


The Faculty of Social Work at the University of Toronto requires students in the two-year Master of Social Work program to attend two field settings, one in the first year of study and another in the final year. In each setting students are evaluated on a competency-based form that is designed to rate social work students’ performance or competence in multiple dimensions of social work practice, including professional skills (e.g., demonstrates congruence between one’s activities and professional values and ethics); assessment skills (e.g., organizes and presents data and well-written assessments); intervention skills (e.g., uses a range of techniques and roles to achieve planned outcomes); evaluation skills (e.g., assesses one’s own level of competence and effectiveness in practice); and communication skills (e.g., clarifies initial information about the client’s concerns and problems). Knowledge of the organization and knowledge of community resources are also part of the evaluation form but were not used in this study because they rate student comprehension rather than behavior. The dimensions of social work practice and the specific indicators were generated by faculty members and experienced social work practitioners who act as field instructors for the school (a copy of this evaluation form can be obtained from the first author).

On this evaluation form, field instructors rate student performance for each competency indicator on a 5-point Likert scale that identifies stages in skill acquisition (from understanding to behavioral integration). This scale has been used at the University of Toronto since 1979 for the purpose of evaluating the competency of social work students in the Master of Social Work program. Revisions to the form have been undertaken to reflect changes in practice. Evaluations of the students are based on practice over a period of approximately 20 weeks and include weekly audio or videotapes of practice with one client, weekly process and summary records with one to two clients, and observation of students in weekly or monthly team meetings. New field instructors, as part of their training, are oriented to the evaluation form and methods of assessing student performance. Nevertheless, this method of evaluation has raised concerns about the reliability and validity of its measurements and has led to the present inquiry.

Data for this study consisted of ratings on Year 1 and Year 2 final field evaluation forms, students’ age, gender, GPA entering and exiting the program, and year entering the program. These data were obtained from the files of students in the two-year Master of Social Work program, entered into a computer database using a standard data-entry program, and analyzed by the research assistant, a doctoral student, using the Statistical Package for the Social Sciences (Norusis, 1990). An additional variable was a yes/no rating of whether the student was identified as having difficulty in the field setting. This variable was established by presenting the list of students included in the study and asking the Director of Field Education and her associates to identify any students who were at risk of failing. Using this method, 18 students were identified as having difficulty.

Files from a population of 480 students who entered the two-year program from 1992–1998 were included for analysis. Student files were excluded if one of their two field settings was not at the micro level or if the files had missing information. The sample included 300 students, 83.00% (n=249) of whom were women, with a mean age of 34 years and a range of 26 to 62 years. Exiting GPAs ranged between 3.12 and 4.13 with a mean of 3.73; entering GPAs had a mean of 3.67 and ranged from 2.47 to 4.30. Due to missing data and list-wise deletion of cases during factor analysis, the final sample size was reduced to 227 for the Year 1 evaluation forms and 253 for the Year 2 evaluation forms.

Factor Analysis

The evaluation form contains multiple skill indicators (referred to in this analysis as items) for each competency dimension, totaling 80 skill indicators or items. Frequencies for the items indicate that there is little variability across items, which could prove problematic for the factor analysis because it can result in items that do not group together. However, inspection of the correlation matrices for both first- and second-year scores indicates that most items are correlated with one another at r>0.30, suggesting that a latent variable or variables (represented in this analysis by factors) underlie these data (Tabachnick & Fidell, 2001). Bartlett’s test of sphericity suggests that the proportion of variance shared by the items is appropriate for factor analysis. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy, based on partial correlations among the items, is also high (KMO=0.96 for the first year, KMO=0.94 for the second year), further indicating that communality among the items is high and appropriate for factor analysis (Norman & Streiner, 1998).
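The two factorability checks described above can be illustrated with a minimal sketch. It assumes the standard textbook definitions (KMO as the ratio of squared correlations to squared correlations plus squared partial correlations, and Bartlett's chi-square based on the determinant of the correlation matrix). The 3-item correlation matrix and the sample size of 100 are hypothetical values chosen only for illustration; they are not data from this study, and the study itself used SPSS rather than this code.

```python
import math

def invert_and_det(matrix):
    """Gauss-Jordan elimination with partial pivoting.

    Returns (inverse, determinant) of a square matrix given as nested lists.
    """
    n = len(matrix)
    # Augment a float copy of the matrix with the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        scale = aug[col][col]
        det *= scale
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv, _ = invert_and_det(corr)
    n = len(corr)
    sum_r2 = sum_p2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                # Partial correlation of items i and j, controlling for the rest.
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_r2 += corr[i][j] ** 2
                sum_p2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_p2)

def bartlett_chi_square(corr, n_cases):
    """Bartlett's test of sphericity: returns (chi-square, degrees of freedom)."""
    p = len(corr)
    _, det = invert_and_det(corr)
    chi2 = -(n_cases - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi2, p * (p - 1) // 2

# Hypothetical correlation matrix: three items that all correlate at 0.5.
toy_corr = [[1.0, 0.5, 0.5],
            [0.5, 1.0, 0.5],
            [0.5, 0.5, 1.0]]
toy_kmo = kmo(toy_corr)
toy_chi2, toy_df = bartlett_chi_square(toy_corr, n_cases=100)
```

A KMO value near 1 means that partial correlations are small relative to the zero-order correlations, which is the pattern expected when items share common factors.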

Principal components analysis was used to extract the factors, and Cattell’s scree test was used to determine how many factors to retain. Because most of the items loaded on the first factor, varimax rotation was conducted to maximize the variance explained by each factor (Norman & Streiner, 1998); this orthogonal rotation yielded the best results. Items were considered to load on a factor if the loading exceeded 0.50. A final check of the factor loadings consisted of verifying that items loading onto the same factors were correlated with one another by checking the initial correlation matrix obtained during factor extraction (Norman & Streiner, 1998). The factors were interpreted based on the theoretical commonality between the items that loaded onto each factor.
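The item-retention rule described above can be sketched as follows. This is an illustration only: the loading values and item names are hypothetical, and the percent-of-variance calculation assumes standardized items (that is, a components analysis of the correlation matrix), in which case a factor's share of total variance is its sum of squared loadings divided by the number of items.

```python
def assign_items(loadings, threshold=0.50):
    """Map each factor index to the items whose absolute loading exceeds the threshold."""
    factors = {}
    for item, row in loadings.items():
        for k, value in enumerate(row):
            if abs(value) > threshold:
                factors.setdefault(k, []).append(item)
    return factors

def percent_variance(loadings):
    """Percent of total variance per factor, assuming standardized items."""
    n_items = len(loadings)
    n_factors = len(next(iter(loadings.values())))
    return [100 * sum(row[k] ** 2 for row in loadings.values()) / n_items
            for k in range(n_factors)]

# Hypothetical rotated loadings for four items on two factors.
toy_loadings = {
    "item_a": [0.80, 0.10],
    "item_b": [0.70, 0.20],
    "item_c": [0.15, 0.75],
    "item_d": [0.40, 0.45],  # below 0.50 on both factors, so it is not retained
}
retained = assign_items(toy_loadings)
explained = percent_variance(toy_loadings)
```

In this toy case one of the four items fails the threshold on every factor, which is the same mechanism by which an 80-item pool can shrink to a smaller retained set.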


Eighty skill indicators or items were analyzed using principal components analysis and the varimax rotation method for both the first- and second-year (Year 1 and Year 2, respectively) field education evaluation forms. For the Year 1 evaluation form, this initial pool of items was reduced to 52 items, representing eight factors that together accounted for 67.69% of the variance, as shown in Table 1. Internal reliability coefficients for these factors ranged from 0.84 to 0.97.

TABLE 1. Accounting of Total Variance by Factors for the Year 1 Evaluation Form

Factor    % of variance    Cumulative % of variance
Intervention Planning and Implementation
Differential Use of Self
Empathy and Alliance
Values and Ethics
Presentation Skills
Report Writing: Adhering to Guidelines
Report Writing: Quality

For the Year 2 evaluation forms, the 80 items were reduced to 50, representing seven factors that in total account for 52.48% of the variance, as shown in Table 2. Internal reliability coefficients for these factors ranged from 0.80 to 0.94.

Building on previous studies that connect performance in field education with a variety of student factors, associations between total scores on the field evaluation form and selected student factors were assessed. Correlation coefficients revealed that age was not significantly associated with Year 1 or Year 2 field education evaluation scores. However, entering GPA was weakly but significantly associated with Year 2 total scores (r=0.11, p<0.05), and exiting GPA was also weakly but significantly associated with both Year 1 total scores (r=0.14, p<0.05) and Year 2 total scores (r=0.13, p<0.05). These results suggest that students with higher GPAs also tended to have higher total scores on the field education evaluation forms. Although the correlations are weak in each case, the consistent pattern does suggest that they may have some practical significance.
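To make the size of these associations concrete, the sketch below computes a Pearson correlation from raw scores and converts the three reported coefficients into percent of shared variance (100 × r²). The coefficients are the ones reported above; the four-point toy data set is hypothetical and serves only to show the calculation.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def shared_variance_percent(r):
    """Percent of variance two variables share, given their correlation."""
    return 100 * r ** 2

# Hypothetical toy data, only to exercise the function.
toy_r = pearson_r([1, 2, 3, 4], [2, 1, 4, 3])

# Coefficients as reported in the text.
reported = {
    "entering GPA, Year 2": 0.11,
    "exiting GPA, Year 1": 0.14,
    "exiting GPA, Year 2": 0.13,
}
shared = {label: round(shared_variance_percent(r), 2) for label, r in reported.items()}
```

Each reported coefficient corresponds to less than 2% of shared variance, which is why the associations are best read as weak even though they are statistically significant.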

TABLE 2. Accounting of Total Variance by Factors for the Year 2 Evaluation Form

Factor    % of variance    Cumulative % of variance
Differential Use of Self
Intervention Planning and Implementation
Presentation Skills
Empathy and Alliance
Values and Ethics
Report Writing

An independent-samples t test was conducted to evaluate the difference between female and male students’ total scores on the evaluation forms. The t test results show that there were no significant differences between female and male students’ scores on the Year 1 or Year 2 evaluations. There was, however, a significant difference in Year 1 total scores between those students identified as having difficulty in field education and those who were not (t=2.15, p<0.05). This same difference was not significant for Year 2 students.
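The group comparison reported above can be sketched as a pooled-variance t statistic. The text does not say whether equal variances were assumed, so the pooled form is an assumption here, and the two small groups of scores are hypothetical values rather than study data.

```python
import math
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Independent-samples t statistic assuming equal variances.

    Returns (t, degrees of freedom).
    """
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error, na + nb - 2

# Hypothetical mean item ratings for students not flagged and flagged as having difficulty.
not_flagged = [4.0, 3.5, 4.5, 4.0, 3.0]
flagged = [3.0, 2.5, 3.5, 3.0]
t_value, df = pooled_t(not_flagged, flagged)
```

With groups as unequal in size as those in the study (18 flagged students against the rest of the sample), a version that does not assume equal variances would be a reasonable alternative.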


Reliability of the measure was assessed in this study both in terms of internal consistency of items within each sub-scale and in terms of comparing the evaluations of the same students in Year 1 and Year 2 of the program. The factors in the analysis of the first-year students had Cronbach’s alphas ranging from 0.84 to 0.97, while the factors in the analysis of the second-year students had Cronbach’s alphas ranging from 0.80 to 0.94. This suggests that there is very high internal consistency of the items within each sub-scale (Janda, 1998).
Students are rated by different field instructors in Year 1 and in Year 2, which provides two scores for each student. These two scores provided data to measure the test–retest and inter-rater factors. The correlation between Year 1 and Year 2 scores, although significant, is only r=0.12 (p=0.05). This is unacceptably low (Janda, 1998). It would be expected that this low level of correlation is in part due to the fact that some students would have improved their skills to a greater extent than other students. In addition, different field instructors may have used the scale in a different manner, thereby obscuring true differences between students. However, the majority of the variance is undoubtedly due to the fact that scores are consistently higher for Year 2 students, because a score of 4 or 5 on the majority of the Likert items is expected for graduation. It may be that field instructors are reluctant to give lower scores that would put a student’s graduation at risk. At this point, therefore, it is not possible to assess whether there is a greater problem with the inter-rater reliability or the test–retest reliability. However, we suggest there may be problems with both.
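The internal-consistency coefficient reported for each sub-scale can be illustrated with a minimal sketch of Cronbach's alpha, using the standard formula based on item variances and the variance of the total score. The three items and five students below are hypothetical ratings on a 5-point scale, not data from the evaluation form.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a sub-scale.

    item_scores is a list of items; each item is a list of ratings
    given to the same students in the same order.
    """
    k = len(item_scores)
    totals = [sum(ratings) for ratings in zip(*item_scores)]  # one total per student
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical ratings: three items, five students.
toy_items = [
    [3, 4, 5, 2, 4],
    [3, 5, 5, 2, 4],
    [2, 4, 5, 3, 4],
]
alpha = cronbach_alpha(toy_items)
```

Alpha rises when items rank students similarly, so a high value shows that items within a sub-scale move together. It does not show that two different raters would agree, which is the separate problem raised by the low Year 1 to Year 2 correlation.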

Validity Criteria

Content Validity. The Faculty of Social Work at the University of Toronto developed this instrument based on the competency model of education (Arkava & Brennan, 1976; Clark & Arkava, 1979; Larsen, 1978). Items on this scale were originally selected through a consultative process between faculty members in social work and experienced field practice educators. The items were based in part on the practice wisdom of these two groups of individuals and in part on a review of factors identified in the practice literature. The instrument taps several dimensions of clinical social work practice, including assessment and intervention skills, relationship-building skills, ethics and values, report writing, and presentation skills. The instrument has now been used for 22 years, with minor revisions for clarity. In 1990 a major revision was undertaken to incorporate cultural competence and practice with diverse client groups. Wherever possible, existing competencies were expanded and new competencies added. These modifications, which incorporated existing practice outcome research, were undertaken with a committee of field instructors. A further indicator of content validity is the comparison of the factor structure of items in this instrument with previously tested instruments. Each of the factors identified by Koroloff and Rhyne (1989), O’Hare and Collins (1997), O’Hare et al. (1998), and Vourlekis and colleagues (1996) is included in the present measure with the exception of termination skills (Koroloff & Rhyne, 1989). However, this current measure also expands the range of competencies found in the above literature to include presentation skills, ethics and values, and report writing.

Construct Validity. Construct validity was assessed by comparing the factor analysis of this scale on the same group of students in both their first- and second-year field settings. Ratings on each student have therefore been assessed independently by different field instructors. This resulted in some support for the construct validity of the measure. Although there was a similar factor breakdown for both years (with the exception that report writing was two separate factors in Year 1 and only one factor in Year 2), the factors accounted for different proportions of variance in each year (52.48% in Year 2 vs. 67.69% in Year 1). This difference is in part due to the fact that there is much less variation in the scores in Year 2 than in Year 1.

Criterion Validity. A variable was included in the analysis that indicated whether the student had been identified as having difficulty in the field. As mentioned above, this variable was established by asking the school’s Director of Field Education to identify students who were at risk of failing in the field, and 18 students in each year were identified. There was a significant difference in total scores on the field evaluation forms between Year 1 students who were identified as having difficulty and those who were not (t=2.15, p<0.05). As such, in the first year of the program the tool appears to correctly differentiate between students who possess an acceptable level of skills and those who do not. This difference was not significant in Year 2, again undoubtedly due to the lack of variability in Year 2 scores. Nevertheless, total scores in Year 2 were significantly associated with entering GPA scores, and total scores in both years were associated with exiting GPA scores. Field instructors completing the field practice evaluations have no access to grades in the academic component of the program. The association between GPA and field performance is consistent with other findings in the literature (Bogo & Davin, 1989; Cunningham, 1982; Pfouts & Henley, 1977; Pelech et al., 1999). Gender and age were not significantly associated with scores, which is contrary to some of the literature (Cunningham, 1982; Pfouts & Henley, 1977; Pelech et al., 1999). Although this may indicate validity problems, it may also indicate that scores are not biased by these variables, suggesting that the measure does have some concurrent validity.


The limitations of this study have been largely determined by the limitations of the field practice evaluation tool used at the University of Toronto. One important limitation of the tool is that levels of expected performance are provided for each year of the program in the field manual. In Year 1, students are expected to score in the range of 3 on the 5-point scale. Second-year students are expected to attain scores at the level of 4 or 5 on the scale. These expectations are likely the reason that the scale was unable to differentiate between students identified as having problems in Year 2 of the program and are likely the reason for the low test–retest reliability. As previously noted, field instructors may be reluctant to give lower scores that would put a student’s graduation at risk. Of interest was the finding that there was more variation in the scores of Year 1 students than in the scores of Year 2 students. Field instructors appeared to be less constrained by the school’s expectations when they knew that the student had one more year and another field education experience to continue developing practice competence. Nevertheless, an alternative method of providing comparisons must be established for this tool.
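The effect of a restricted score range can be illustrated with a small sketch. It assumes, purely for illustration, that ratings below the expected Year 2 level are pulled up to the expected range of 4 to 5; the helper names `compress_to_expected_range` and `pearson_r` and all values are hypothetical and are not drawn from the study.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def compress_to_expected_range(scores, floor=4, ceiling=5):
    """Assumed model: every rating is forced into the expected range."""
    return [min(max(s, floor), ceiling) for s in scores]


# Hypothetical underlying performance levels (NOT the study data).
underlying = [2, 3, 3, 4, 4, 5]
year2_ratings = compress_to_expected_range(underlying)

print("variance before:", pvariance(underlying))
print("variance after: ", pvariance(year2_ratings))
print("correlation with underlying level:",
      round(pearson_r(underlying, year2_ratings), 2))
```

Under this assumed model, students who differ in underlying performance receive nearly identical ratings, so the scale has little variance left with which to separate groups or to correlate with ratings from another year.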

A second limitation of the tool is the fact that items are not randomly distributed throughout the form but are rather grouped by type of skill, for ease of evaluation. The loading of items onto factors, such as presentation skills, is highly consistent with the manner in which the items appear on the scale, suggesting the possibility of a “halo effect.” However, other items, such as assessment and relationship skills, do not load onto the factors in the same manner in which they are grouped on the tool, demonstrating that the structure of the tool does not bias all the results.

An additional limitation when assessing a tool of this type is the process of evaluation itself. Using an adult-education model for field education, students are expected to be active, self-directed learners and to take responsibility for identifying their learning needs and progress. Students are encouraged to critically reflect on their practice, to use knowledge to guide their practice, and to participate actively in evaluation (Bogo & Vayda, 1998). Hence, the final ratings of the student on the evaluation form represent an assessment that the field instructor arrives at in consultation with the student. Several authors have suggested that field instructors are generally uncomfortable with the power and authority of their role and report that they do not like conducting evaluations (Gitterman & Gitterman, 1979; Kadushin, 1985; Pease, 1988). As a result, rather than directly confronting the student about an area of concern and assigning a low rating, the instructors may engage in a negotiation process with the student, thereby inflating the ratings and producing a “halo effect” (Pease, 1988). Despite these concerns, however, Reid and his colleagues (1996) were able to demonstrate the validity of field instructor evaluations as measured by an external criterion, the ratings of student audiotapes by independent judges.

Inaccurate and inflated ratings may be exacerbated in this study by the fact that the University of Toronto specifies the expected level of performance for each year. If students are consistently rated below the level of expectation, the faculty field liaison must become involved more directly in evaluating the student’s work to determine whether the student is at risk of failing, needs the field experience extended, or needs more intensive supervision. Field instructors may avoid giving lower ratings that would set this process in motion. Given the current context of agency practice, in which social workers report that they have little time to provide field education (Globerman & Bogo, 2002), it is understandable if they wish to avoid lengthy negotiations with students about ratings and intensive consultations with the faculty representative.


The impetus for this study was our concern that the development of an empirical base for field education is compromised by the absence of valid and reliable measures to differentiate between levels of student performance and mastery of competencies. If social work educators are unable to accurately measure differences in performance, how can we establish which factors produce maximal learning in the field? Closely connected to this issue is the academy’s accountability to the profession and the public in its claim that social work graduates possess a beginning level of practice competence. Despite the persistent concern in the social work literature regarding the lack of objective standardized outcome measures for assessing competence (Alperin, 1996; Pease, 1988; Wodarski, Feit, & Green, 1995), field instructors and faculty have always failed some students in the field. This indicates at least some collective sense of inappropriate and destructive behaviors and perhaps also some agreement about the minimum characteristics or competencies for practice (Hartman & Wills, 1991). We concluded that it was necessary to unravel the issues that converge when evaluating student field performance, and a logical first step was to investigate the psychometric properties of a scale used by one school to assess specific competencies of students.

The Faculty of Social Work at the University of Toronto developed a competency-based instrument to evaluate student performance in field education. This tool has now been used for 22 years with modifications made to reflect changing practice. Over time questions have been raised about the reliability and validity of the measure, but, as in other social work programs, the properties of the scale have never been evaluated. The large database of students whose skills have been assessed with this evaluation form provided an excellent opportunity to determine if the measure is effective in identifying variations in student ability. The factor analysis of the measure revealed a consistent factor structure between Year 1 and Year 2 of the program, which accounted for 67.69% and 52.48% of the variance in scores, respectively. The instrument had excellent internal consistency, with Cronbach’s alphas ranging from .80 to .97. However, there were inadequate correlations between ratings of individual students in their first- and second-year field education experiences, suggesting that students are not consistently rated by different field instructors. This is a clear limitation of the scale. Scores on the field evaluation form were significantly associated with grades in the academic component of the program, which is consistent with other research that links GPA with field performance. In addition, the measure could differentiate between students identified independently as having difficulty in Year 1 of the program and those who were not. However, the measure could not differentiate students identified as having difficulty in Year 2.
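For readers less familiar with the internal consistency statistic cited above, the following minimal sketch computes Cronbach's alpha from its standard definition, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the ratings are hypothetical; this is not the analysis code used in the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    `items` is a list of k lists; each inner list holds one item's
    ratings for the same students, in the same order.
    """
    k = len(items)
    # Total score per student: sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings of five students on three items (NOT the study data).
ratings = [
    [3, 4, 3, 5, 2],
    [3, 4, 4, 5, 2],
    [2, 4, 3, 5, 3],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")
```

Alpha approaches 1 when items vary together across students. A high alpha does not by itself show that different raters would score the same student consistently, which is the separate problem noted above.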

Student evaluation in competency-based education requires that educational objectives are defined in specific, behavioral terms with indicators for judging varying levels of performance. Further, a determination must be made about the data needed to assess whether the student’s practice provides evidence of mastery of the objectives, and sound methods must be established to assess those data (Arkava & Brennan, 1976). Based on the current study, we conclude that there is some theoretical and practice coherence in the factors and the associated skills that comprise this evaluation tool. Indicators that field instructors use to rate the attainment of skills are clearly specified in the tool. Skills identified cover several domains including assessment and intervention skills, ability to establish and effectively use a therapeutic relationship, differential use of self, demonstration of social work values and ethics, and report writing and presentation skills. Data used to assess these skills come from a variety of sources including direct observation, tapes, and written work. In the end, these evaluations are consistent with academic ratings of students and are able to differentiate students experiencing difficulties in their first-year field experience, although the evaluations are unable to do so for the second year. Because this final-year evaluation is in fact the key to gatekeeping, more research needs to be undertaken focusing on final-year evaluation to strengthen evaluation in field education. As a first step, schools of social work may find it useful to assess the reliability and validity of the tools they use to evaluate the competency of their students. Continued development of reliable and valid evaluation tools that builds on the work already done and is tested on larger samples of students is recommended. 
In the end the acid test of student competency will be client outcome (Gambrill, 2001), and assessing the association between student evaluations and client outcomes is an important area for further research.

Also of importance is attention to the processes used to conduct the final evaluation. Even with the best evaluation measure, research is needed to better understand the aspects of the process that limit the effective use of the tool. Illuminating the issues that field instructors experience as challenges in their assessment role could lead to the development and testing of new methods of evaluation that might include specialized training and support for these instructors. It may also be timely to develop and test alternative evaluation processes that separate the teaching function and the evaluation function in the field instructor’s role. Social work educators are challenged to continue to address the complex problem of measuring outcomes, as it is fundamental to the further development of an empirical base for field education and to the strengthening of our performance as gatekeepers for the profession.


Alperin, D. E. (1996). Empirical research on student assessment in field education: What have we learned? The Clinical Supervisor, 14(1), 149–161.

Alperin, D. E. (1998). Factors related to student satisfaction with child welfare field placements. Journal of Social Work Education, 34, 43–54.

Arkava, M. L., & Brennan, E. C. (Eds.). (1976). Competency-based education for social work: Evaluation and curriculum issues. New York: Council on Social Work Education.

Baker, D. R., & Smith, S. L. (1987). A comparison of field faculty and field student perceptions of selected aspects of supervision. The Clinical Supervisor, 5(4), 31–42.

Birmingham, J., Berry, M., & Bussey, M. (1996). Certification for child protective services staff members: The Texas initiative. Child Welfare, 75, 727–740.

Bogo, M., & Davin, C. (1989). The use of admissions criteria and a practice skills measure in predicting academic and practicum performance of MSW students. Canadian Social Work Review, 6(1), 95–109.

Bogo, M., & Globerman, J. (1999). Inter-organizational relationships between schools of social work and field agencies: Testing a framework for analysis. Journal of Social Work Education, 35, 265–274.

Bogo, M., & Vayda, E. (1998). The practice of field instruction in social work: Theory and process (2nd ed.). New York: Columbia University Press.

Breitenstein, L., Rycus, J., Sites, E., & Jones Kelley, K. (1997). Pennsylvania’s comprehensive approach to training and education in public child welfare. Public Welfare, 55(2), 14–20.

Brennan, E. C. (1982). Evaluation of field teaching and learning. In B. W. Sheafor & L. E. Jenkins (Eds.), Quality field instruction in social work (pp. 76–97). New York: Longman.

Canadian Association of Schools of Social Work (CASSW). (2001). In critical demand: Social work in Canada. Ottawa, Canada: Author.

Clark, F., & Arkava, M. (1979). The pursuit of competence in social work. San Francisco: Jossey-Bass.

Csiernik, R., Vitali, S., & Gordon, K. (2000). Student and field instructor perceptions of a child welfare competency-based education and training project. Canadian Social Work, 2(2), 53–64.

Cunningham, M. (1982). Admissions variables and prediction of success in an undergraduate field work program. Journal of Education for Social Work, 18(2), 27–34.

Cuzzi, L., Holden, G., Rutter, S., Rosenberg, G., & Chernack, P. (1996). A pilot study of fieldwork rotations vs. year long placements for social work students in a public hospital. Social Work in Health Care, 24(1), 73–91.

Deal, K. H. (2000). The usefulness of developmental stage models for clinical social work students: An exploratory study. The Clinical Supervisor, 19(1), 1–19.

Dore, M. M., Morrison, M., Epstein, B. D., & Herrerias, C. (1992). Evaluating students’ micro practice field performance: Do universal learning objectives exist? Journal of Social Work Education, 28, 353–362.

Fortune, A. E. (1994). Field education. In F. J. Reamer (Ed.), The foundations of social work knowledge (pp. 151–194). New York: Columbia University Press.

Fortune, A. E., & Abramson, J. S. (1993). Predictors of satisfaction with field practicum among social work students. The Clinical Supervisor, 11(1), 95–110.

Fortune, A. E., McCarthy, M., & Abramson, J. S. (2001). Student learning processes in field education: Relationship of learning activities to quality of field instruction, satisfaction, and performance among MSW students. Journal of Social Work Education, 37, 111–124.

Fox, S. R., Burnham, D., & Miller, V. P. (1997). Reengineering the child welfare training and professional development system in Kentucky. Public Welfare, 55(2), 8–13.

Gambrill, E. (2000). Honest brokering of knowledge and ignorance. Journal of Social Work Education, 36, 387–397.

Gambrill, E. (2001). Evaluating the quality of social work education: Options galore [Editorial]. Journal of Social Work Education, 37, 418–429.

Gitterman, A. (1989). Field instruction in social work education: Issues, tasks and skills. The Clinical Supervisor, 7(4), 77–91.

Gitterman, A., & Gitterman, N. P. (1979). Social work student evaluation: Format and method. Journal of Education for Social Work, 15(3), 103–108.

Globerman, J., & Bogo, M. (2002). The impact of hospital restructuring on social work field education. Health and Social Work, 27(1), 7–16.

Goldhammer, K., & Weitzel, B. (1981). What is competency-based education? In R. Nickse (Ed.), Competency-based education: Beyond minimum competency testing (pp. 42–61). New York: Teachers College Press.

Goldstein, H. (2000). Social work at the millennium. Families in Society, 81(1), 3–10.

Gross, G. M. (1981). Instructional design: Bridge to competence. Journal of Education for Social Work, 17(3), 66–74.

Hackett, S. (2001). Educating for competency and reflective practice: Fostering a conjoint approach in education and training. Journal of Workplace Learning, 13(3), 103–112.

Hartman, C., & Wills, R. (1991). The gatekeeper role in social work: A survey. In D. Schneck, B. Grossman, & U. Glassman (Eds.), Field education in social work: Contemporary issues and trends (pp. 310–319). Dubuque, IA: Kendall/Hunt.

Janda, L. (1998). Psychological testing: Theory and practice. Toronto, Canada: Allyn & Bacon.

Kadushin, A. (1985). Supervision in social work (2nd ed.). New York: Columbia University Press.

Kadushin, A. E. (1991). Introduction. In D. Schneck, B. Grossman, & U. Glassman (Eds.), Field education in social work: Contemporary issues and trends (pp. 11–12). Dubuque, IA: Kendall/Hunt.

Kilpatrick, A. C., Turner, J., & Holland, T. P. (1994). Quality control in field education: Monitoring students’ performance. Journal of Teaching in Social Work, 9(1/2), 107–120.

Knight, C. (1996). A study of MSW and BSW students’ perceptions of their field instructors. Journal of Social Work Education, 32, 399–414.

Knight, C. (2001). The process of field instruction: BSW and MSW students’ views of effective field supervision. Journal of Social Work Education, 37, 357–379.

Koerin, B., & Miller, J. (1995). Gatekeeping policies: Terminating students for nonacademic reasons. Journal of Social Work Education, 31, 247–260.

Koroloff, N. M., & Rhyne, C. (1989). Assessing student performance in field instruction. Journal of Teaching in Social Work, 3(2), 3–16.

Larsen, J. (1978). Competency-based and task-centered practicum instruction. Journal of Social Work Education, 16(1), 87–95.

Larsen, J., & Hepworth, D. (1980). Enhancing the effectiveness of practicum instruction: An empirical study. Journal of Education for Social Work, 18(2), 50–58.

Lazar, A., & Mosek, A. (1993). The influence of the field instructor-student relationship on evaluation of students’ practice. The Clinical Supervisor, 11(1), 110–120.

Norman, G. R., & Streiner, D. L. (1998). Biostatistics: The bare essentials. Hamilton, Canada: Becker.

Norusis, M. J. (1990). SPSS statistics manual. Chicago: Statistical Package for the Social Sciences.

O’Hare, T., & Collins, P. (1997). Development and validation of a scale for measuring social work practice skills. Research on Social Work Practice, 7, 228–238.

O’Hare, T., Collins, P., & Walsh, T. (1998). Validation of the practice skills inventory with experienced clinical social workers. Research on Social Work Practice, 8, 552–563.

Pease, B. (1988). The ABCs of social work student evaluation. Journal of Teaching in Social Work, 2(2), 35–50.

Pelech, W., Stalker, C., Regehr, C., & Jacobs, M. (1999). Making the grade: The quest for validity in admissions decisions. Journal of Social Work Education, 35, 215–226.

Pfouts, J., & Henley, C. (1977). Admissions roulette: Predictive factors for success in practice. Journal of Education for Social Work, 13, 56–62.

Raskin, M. (1994). The delphi study in field instruction revisited: Expert consensus on issues and research priorities. Journal of Social Work Education, 30, 75–88.

Reid, W. J., Bailey-Dempsey, C., & Viggiana, P. (1996). Evaluating student field education: An empirical study. Journal of Social Work Education, 32, 45–52.

Reilly, T., & Petersen, N. (1997). Nevada’s university–state partnership: A comprehensive alliance for improved services to children and families. Public Welfare, 55(2), 21–28.

Rylatt, A., & Lohan, K. (1995). Creating training miracles. Sydney, Australia: Prentice-Hall.

Schneck, D. (1995). The promise of field education in social work. In G. Rogers (Ed.), Social work field education: View and visions (pp. 3–14). Dubuque, IA: Kendall/Hunt.

Spitzer, W., Holden, G., Cuzzi, L., Rutter, S., Chernack, P., & Rosenberg, G. (2001). Edith Abbott was right: Designing fieldwork experiences for contemporary health care practice. Journal of Social Work Education, 37, 79–90.

Tabachnick, B., & Fidell, L. (2001). Using multivariate statistics. Needham Heights, MA: Allyn & Bacon.

Tolson, E. R., & Kopp, J. (1988). The practicum: Clients, problems, interventions and influences on student practice. Journal of Social Work Education, 24, 123–134.

Vourlekis, B., Bembry, J., Hall, G., & Rosenblum, P. (1996). Testing the reliability and validity of an interviewing skills evaluation tool for use in practicum. Research on Social Work Practice, 6, 492–503.

Wilson, B. (1981). Field instruction. New York: Free Press.

Wodarski, J. S., Feit, M. D., & Green, R. K. (1995). Graduate social work education: A review of two decades of empirical research and considerations for the future. Social Service Review, 69(1), 108–130.

Accepted: 5/02.

Marion Bogo is professor and Sandra Rotman Chair in Social Work, Faculty of Social Work, University of Toronto. Cheryl Regehr is associate professor, Faculty of Social Work, University of Toronto. Judy Hughes is a doctoral candidate, Faculty of Social Work, University of Toronto. Roxanne Power is a senior lecturer, Faculty of Social Work, University of Toronto. Judith Globerman is professional practice leader, Allied Health, Vancouver Coastal Health Authority, Vancouver, British Columbia.

This study was funded by a grant from the Social Sciences and Humanities Research Council of Canada. The authors thank Joanne Daciuk, Research Coordinator, Centre for Applied Social Research, Faculty of Social Work, for assistance with data analysis.

Address correspondence to Marion Bogo, Faculty of Social Work, University of Toronto, 246 Bloor Street West, Toronto, Ontario, Canada, M5S 1A1; email:

© by the Council on Social Work Education, Inc. All rights reserved.

