Peer Review

Our Primitive Art of Measurement

Every week brings news of another state legislature enacting "reforms" of K–12 education, based on students' test scores. Standardized test scores are being used to evaluate, compare, and fail teachers and schools. Will the "press to assess with a test" be focused next on postsecondary education? Will test scores be used to compare colleges and universities? Will public criticism, loss of public funds, and loss of students follow for some institutions? Will faculty feel pressure to raise students' test scores, perhaps narrowing the curriculum to focus on the tested skills?

Certainly the 2006 report of the Commission on the Future of Higher Education, A Test of Leadership: Charting the Future of U.S. Higher Education, prompted moves in this direction with comments about the need for a simple way to compare institutions and public reporting of the results of learning assessments, including value-added measures. Today some one thousand colleges and universities are using one of three standardized measures of generic skills like writing and critical thinking to test first-year students and seniors; now value added can be measured, reported publicly, and compared among institutions.

Unfortunately, very little is being written about what these tests of generic skills are actually measuring and with what accuracy. Virtually nothing is coming out about the validity of the value-added measure. We do know that the institution-level correlation between students’ scores on the tests of generic skills and their entering SAT/ACT scores is so high that prior learning accounts for at least two thirds of the variance in institutions’ scores. Out of that other one third, we must subtract the effects of age, gender, socioeconomic status, race/ethnicity, college major, sampling error, measurement error, test anxiety, and students’ motivation to perform conscientiously before we can examine the effects on learning of attending a particular college.
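The two-thirds figure follows from basic correlation arithmetic: the share of variance one measure explains in another is the square of their correlation coefficient. A minimal sketch of that calculation (the correlation value used here is hypothetical, chosen only to illustrate how a high institution-level correlation translates into the variance claim):

```python
# Illustrative arithmetic only. The column reports that prior learning
# accounts for at least two thirds of the variance in institutions' scores.
# Variance explained equals the squared correlation (r**2), so a
# hypothetical institution-level correlation of about 0.82 between
# entering SAT/ACT scores and test scores would imply roughly 67 percent.
r = 0.82  # assumed correlation, not a figure from the column
variance_explained = r ** 2
print(f"variance explained: {variance_explained:.0%}")  # prints: variance explained: 67%
```

Any correlation at or above roughly 0.82 would leave a third or less of the variance for all the remaining factors the column lists, before the college's own contribution can even be examined.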

Institutional comparisons inevitably will be made on the basis of the negligible amount of variance (perhaps as little as 1–2 percent) that can be attributed to the contribution of any given college to students' scores on these standardized tests of generic skills. We must argue for multiple measures of institutional effectiveness. Instead of relying on one assessment tool with serious limitations, we must also argue for the use of measures of learning that will provide specific guidance for improving curriculum and instruction.

Authentic assessment—using actual student work products that reveal how students respond to the learning opportunities they are experiencing—is the best type of measurement for suggesting directions for improvement. For a decade I have argued that student electronic portfolios evaluated with rubrics provide our best hope for assessing what students actually know and can do (validity). Portfolios can provide dramatic and convincing evidence of learning over time (value added), and multiple evaluators have achieved levels of agreement (reliability) exceeding 0.90.
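An agreement level of this kind is typically computed as a correlation between two raters' rubric scores across a set of portfolios. A minimal sketch, using entirely hypothetical scores and a plain Pearson correlation (one of several reliability statistics that could be used):

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two raters' scores on the same portfolios."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores on a 1-4 rubric from two trained raters, six portfolios.
rater_a = [3, 4, 2, 3, 4, 1]
rater_b = [3, 4, 2, 4, 4, 1]
print(round(pearson(rater_a, rater_b), 2))  # prints 0.95
```

With well-trained raters and a clear rubric, agreement at or above 0.90 of the sort described here is attainable; the point of the sketch is only to show what such a number measures.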

We must continue to develop AAC&U's VALUE rubrics, and other rubrics that measure additional outcomes, to increase their reliability (through extensive training for raters) and to demonstrate their worth—or validity—in improving both individual students' learning and the effectiveness of academic programs. But the very same concerns about using standardized test scores to compare institutions also apply to these authentic measures when they focus on generic skills. The current state of the art of measurement apparently is just too primitive to enable us to construct instruments that are valid for comparing the effectiveness of institutions in increasing student learning of generic skills. After all, the venerable SAT has undergone more than eighty years of continuous development, yet questions about its validity persist. Measurement scholars, please don't make us wait another eighty years for the ability to demonstrate convincingly to ourselves and to the public the variety and complexity of the student learning that is occurring in colleges and universities!

Trudy Banta is a professor of higher education and senior advisor to the chancellor at Indiana University–Purdue University Indianapolis.
