Peer Review

Using VALUE Rubrics to Evaluate Collaborative Course Design

In recent years, the University of Kansas has made it a priority to improve the written communication and critical thinking skills of our undergraduate students. We especially have embraced the challenge of enhancing these skills in courses with larger enrollments—those with sixty to several hundred students. Although most faculty members presume that high-end student writing and critical analysis can only be properly taught or evaluated when class size is below thirty or forty students, a group of University of Kansas colleagues has redesigned large-enrollment courses around a cognitive apprenticeship model to specifically target these skills. Faculty members working with colleagues in the writing center and libraries developed and implemented staged assignments that allow students to acquire writing and critical thinking skills through multiple iterations with individual feedback.

In early versions of the project, five faculty members worked with a subject librarian and a writing consultant to design sequenced assignments that would develop students’ skills in written communication, critical reading, and synthesis of research literature. Later, through a project funded by the Spencer and Teagle Foundations, we expanded to ten additional courses and developed a graduate student fellowship program in which teaching assistants were prepared as consultants by specialists in the library and in the writing center. We provided supplemental financial support to graduate student fellows so they could give extra time and skill to the courses they were assisting.

We wanted to know whether collaborative course design was worth the investment of time and resources needed to scale up beyond a pilot program. To answer this question, we used rubrics developed by the Association of American Colleges and Universities (AAC&U) to evaluate the written communication and critical thinking skills of students in these team-designed courses. The VALUE rubrics supplemented existing instructor evaluations of successful learning (i.e., grades on course-specific rubrics), allowing us to measure skills that are neither course nor discipline specific. We also liked the well-articulated goals and benchmarks of performance of the VALUE rubrics because they connect our expectations to those of a community of faculty colleagues from around the country. As a third measure, we had samples of students take the Collegiate Learning Assessment (CLA) to evaluate our course design model and to triangulate among three estimates of student learning (grades, VALUE rubrics, and CLA).


To date, we have used the VALUE rubrics on written communication and critical thinking to score the written assignments of about one hundred students from the first year of our three-year team-design project. We began by gathering random samples of assignments from the two team-designed courses from the project’s first year (political science and psychology) and two traditional courses of similar size and curricular level in the same disciplines. We convened a three-hour session of four graduate students (the raters) and a few faculty members to iteratively rate and discuss five or six assignments and make minor adjustments to the rubric language until they came to a shared understanding of the rubric categories and criteria. Each assignment was then independently scored on both the written communication and critical thinking rubrics by two different raters. The raters agreed closely with one another, providing scores that were identical or one category apart at least 90 percent of the time. At the end of this process, the raters met again to discuss scoring disagreements and were permitted, but not compelled, to change their ratings following the discussion.
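As an illustration only, the agreement figure described above (scores identical or one category apart) can be computed as in the following sketch. The function name, the 0 to 4 numeric coding of the categories, and the scores themselves are assumptions made for this example; they are not data or code from the project.

```python
from typing import Sequence, Tuple


def agreement_rates(rater_a: Sequence[int], rater_b: Sequence[int]) -> Tuple[float, float]:
    """Return (exact, within_one) agreement proportions for paired ratings."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty rating lists of equal length")
    # Absolute gap, in rubric categories, between the two raters on each paper.
    gaps = [abs(a - b) for a, b in zip(rater_a, rater_b)]
    exact = sum(g == 0 for g in gaps) / len(gaps)
    within_one = sum(g <= 1 for g in gaps) / len(gaps)
    return exact, within_one


# Hypothetical scores on one rubric dimension, coded 0 = not met ... 4 = capstone.
rater_a = [1, 2, 2, 3, 1, 0, 2, 3, 2, 1]
rater_b = [1, 2, 3, 3, 2, 0, 2, 1, 2, 1]

exact, within_one = agreement_rates(rater_a, rater_b)
print(f"identical: {exact:.0%}; identical or one category apart: {within_one:.0%}")
```

Reporting exact agreement alongside adjacent agreement shows how much of the headline figure comes from one-category differences.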




Once the scoring was complete, our first step was to look for overlap across the individual dimensions of the rubrics as a possible way of simplifying our inspection and representation of student performance. Scores on all five dimensions of the written communication rubric were strongly associated with one another so that students who scored high on one dimension generally scored high on all dimensions. Therefore, we pooled the scores across all of the written communication dimensions in our evaluation of students’ skills. In contrast, the critical thinking scores clustered together in two groups: the explanation of issues, student’s position, and conclusions/related outcomes dimensions were highly correlated with one another, and the evidence and context/assumptions dimensions were highly correlated, but the two groups were only weakly related to each other. For this reason, we aggregated the critical thinking scores into two sets, one that we refer to as “Issues, Analysis, and Conclusions,” and the other that we call “Evaluation of Sources and Evidence.”
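The kind of overlap check described above can be sketched as a table of pairwise correlations between rubric dimensions. The article does not say which measure of association was used; Pearson correlation is assumed here purely for illustration, and the scores below are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a dimension has no variance")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores (0 = not met ... 4 = capstone) for eight papers.
scores = {
    "explanation of issues": [1, 2, 2, 3, 1, 2, 3, 1],
    "student's position": [1, 2, 3, 3, 1, 2, 3, 2],
    "evidence": [2, 1, 0, 1, 1, 2, 1, 0],
    "context/assumptions": [2, 1, 1, 1, 0, 2, 1, 0],
}

# Dimensions that correlate strongly with one another are candidates for pooling.
for left, right in combinations(scores, 2):
    print(f"{left} vs {right}: r = {pearson(scores[left], scores[right]):.2f}")
```

Dimensions whose pairwise correlations are uniformly high can reasonably be pooled, while weakly related groups are better reported separately.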

We represented students’ skill attainment by looking at the proportion of ratings that fell into each performance category (i.e., not met, benchmark, milestone 1, milestone 2, and capstone) for students in the team-designed and traditional courses. Figure 1 presents the distributions of scores, generated by all raters, pooled across all five written communication dimensions. This convention allows us to see clearly and quickly the variability of skill among our students and also reveals meaningful differences between the two types of courses. Very few of the scores are at the bottom or top ends of the distribution, but the distribution is shifted to the right (i.e., higher scores) for students in the team-designed courses—and this shift is statistically significant.
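A minimal sketch of this way of summarizing ratings, as proportions per performance category rather than averages, follows. The category names come from the text above; the function name and the ratings are hypothetical and do not reproduce the data behind figure 1.

```python
from collections import Counter

# Performance categories named in the article, ordered from lowest to highest.
CATEGORIES = ["not met", "benchmark", "milestone 1", "milestone 2", "capstone"]


def category_proportions(ratings):
    """Return the share of ratings that fall into each performance category."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    return {cat: counts[cat] / len(ratings) for cat in CATEGORIES}


# Hypothetical pooled ratings for the two course types (illustrative only).
team_designed = (["benchmark"] * 4 + ["milestone 1"] * 10
                 + ["milestone 2"] * 5 + ["capstone"] * 1)
traditional = (["not met"] * 2 + ["benchmark"] * 9
               + ["milestone 1"] * 7 + ["milestone 2"] * 2)

for label, ratings in [("team-designed", team_designed), ("traditional", traditional)]:
    print(label)
    for cat, share in category_proportions(ratings).items():
        bar = "#" * round(share * 40)  # crude text histogram
        print(f"  {cat:<12} {share:5.0%} {bar}")
```

Printing the full distribution for each course type makes a shift toward higher categories visible without collapsing the ratings into a single mean.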

We constructed two sets of distributions of critical thinking ratings, one representing the Issues, Analysis, and Conclusions scores (see the top chart in fig. 2) and the other representing the Evaluation of Sources and Evidence scores (see the bottom chart in fig. 2). One clear take-away point is that regardless of course type, almost none of our students are reaching the highest level in either set of critical thinking skills. Nevertheless, the Issues, Analysis, and Conclusions ratings show a significant advantage for students in the team-designed courses. In the traditional courses, 20 percent of the ratings were below the minimal level of skill attainment. In the redesigned courses, virtually all of the ratings showed benchmark or higher skill attainment, and there were many more scores in the milestone 1 and 2 categories than in the traditional courses.





Compared to the other skills we evaluated, a high proportion of the Evaluation of Sources and Evidence ratings for all classes were below benchmark, suggesting a weakness in our students’ skill sets. There was also a surprising pattern: more than half of the ratings of student work in the redesigned courses did not meet the benchmark criterion, compared to about one-third in the traditional courses. We noted that the assignment instructions in both of the traditional courses called on students to evaluate their sources and evidence, whereas in the team-designed courses these skills were taught and already evaluated in an earlier stage of the assignment sequence. It appears that some skills will not be visible in student work unless the assignment explicitly calls upon students to use them. That difference in assignment requirements may also explain why our sample showed two unrelated subsets of dimensions within critical thinking.

We were also interested in how well our evaluations of student work via the VALUE rubrics corresponded with the two other measures of written communication and critical thinking in our project (i.e., grades on course-specific rubrics and CLA scores). Interestingly, the patterns that were visible in the VALUE rubric scores were not mirrored in the CLA scores; students in the team-designed and traditional courses performed no differently on the CLA. Students’ performance on the CLA, moreover, was generally unrelated both to the VALUE rubric ratings of their coursework and to the instructors’ grading of the same coursework. In contrast, the latter two measures were highly correlated, suggesting that the VALUE rubrics capture qualities of critical thinking and writing that fit well with what faculty members value in their students’ work.


We represent the results of our student learning assessment as distributions of performance across categories, not as arithmetic averages of numerical values assigned to levels of skill. These graphs display the variability among our students and show the magnitude of the challenges we face in increasing student achievement. Although averages would illustrate the same changes associated with our course design intervention, the means mask much valuable information about the actual levels of skill that students are exhibiting. We suggest that graphical distributions of VALUE rubric scores are also an excellent way to track student growth across the first four years of their education. To illustrate, in figure 3 we present hypothetical distributions of VALUE rubric scores for first-year and fourth-year students in the same graph. Such distributions could represent either cross-sectional comparisons of samples of first- and fourth-year students gathered in the same year, or longitudinal comparisons that represent distributions from a large sample of the same students as they change over their college careers. This is a very transparent and direct form of evidence for any conversation about the value added by higher education. It speaks to the most important form of institutional evaluation: how varied are our students and how are they transformed by their time with us? Institutional self-assessment can be exceptionally well-served by careful tracking of performance distributions on the VALUE rubrics.



Based on the VALUE rubric data and changes in course-specific measures of student skill, we will continue to expand the number of courses that take advantage of collaborative course design. It is essential that our analysis include an independent reading that identifies the level of intellectual work based on criteria that come from outside our university. When we used the AAC&U category definitions, which were generated by two years of national conversation on the rubrics, we confirmed what our local grades suggested about the benefits of collaborative course design. We also learned that relatively few of the students in this initial set of four courses appeared to meet the definitions for the best two rating categories on several dimensions of their work. Because we had expectations and goals derived by a consensus among members of a higher education organization, we were able to learn how well our students met those broader expectations of intellectual work. We look forward to a time when comparable institutions will share their distributions on these rubrics to provide useful benchmarks of performance.

Our analysis of the data from the VALUE rubrics affirmed that a team approach to course design can improve some forms of student writing and thinking more than traditional solo design. We also demonstrated that graduate teaching assistants could be readily prepared to serve as informed consultants on assignment design, so we have a model for scaling up the entire enterprise to a wider range of large-enrollment courses, especially if they have teaching assistants. Success with writing center and library consultants suggests that we can expand to include instructional technology and student support specialists as options in building a course design team.

We also saw that the rubrics work best when there is close alignment between the nature of the assignment and the dimensions of intellectual skill described in the rubric. One of our surprise results is probably attributable to differences in how assignments were framed for students. One course presumed a level of skill in source evaluation, perhaps acquired from earlier courses, and students were expected to demonstrate it in their final assignment. Another course explicitly taught that skill in the early stage assignments, and the final assignment did not call for students to demonstrate it. In order for an independent reading to be effective as institutional feedback, we need to sample student work carefully, being sure that the assignment instructions are well aligned with the goals used by readers.

Finally, at a practical level, we are very encouraged that this process is manageable and sustainable. Faculty members who have used the VALUE rubrics to rate student work typically find the dimensions and category definitions sensible and meaningfully related to their existing views of educational goals. This affirms the lengthy process AAC&U followed in developing and refining these rubrics. We also believe the development of graduate students as partners in the enterprise is a useful model for campuses that have both teaching assistants and large populations of students. We see no reason that this form of evaluation should be limited only to smaller campuses with missions more focused on teaching. The team-designed model of teaching and evaluating teaching provides a good framework, and it fits well with the development and support of graduate students in those institutions with doctoral programs. It is an additional plus that those future college teachers will have the advantage of having participated in a very thoughtful exercise in instructional design and evaluation.

Andrea Greenhoot is an associate professor of psychology and faculty fellow at the Center for Teaching Excellence at the University of Kansas; Dan Bernstein is a professor of psychology and director of the Center for Teaching Excellence at the University of Kansas.
