Peer Review

"Going Naked"

Education--from preschool through college--is the primary means of improving human capital and is therefore understood to be the single most important factor in the ability of America to compete in the global economy. But there is a growing unease about what now passes for higher education--a vocal concern led not by angry students, as in the sixties, but by parents and business, political, and academic leaders who sense a dangerous hollowing of an increasingly precarious ivory tower.

Virtually every study within and outside the academy acknowledges that we need to significantly improve our undergraduate colleges, not only to compete globally, but also to enrich an active democracy here at home, a public life marked by liberty, dissent, and robust civic engagement. The critics, in essence, have declared, "The academy has no clothes!"

The Spellings Commission

Joining the critics and jumping into the vacuum created by higher education leaders perceived to be unwilling to take on the necessary reform agenda to substantially improve quality, Secretary Spellings's Commission on the Future of Higher Education identified accountability as the fundamental issue--an issue that can only be resolved through the assessment of value-added learning. The commission's logic is as follows: (1) undergraduate educational quality is inadequate, given the challenges we face in the twenty-first century; (2) quality improvement requires a more transparent accountability; (3) assessment, especially value-added learning assessment, is fundamental to the improvement of quality and accountability. The commission's report states that

We believe that improved accountability is vital to ensuring the success of all the other reforms we propose. Colleges and universities must become more transparent about cost, price, and student success outcomes, and must willingly share this information with students and families. Student achievement, which is inextricably connected to institutional success, must be measured by institutions on a "value-added" basis that takes into account students' academic baseline when assessing their results. (U.S. Department of Education 2006, 4)

Assessment as a Force for Accountability and Excellence

The Spellings Commission got it right--quality needs to improve, accountability must become far more transparent, and assessing learning is crucial to both. This is not to say, however, that one single test must be imposed on all institutions or that we know how to measure all that is worth learning. But it is to say that transparent, systematic learning assessment can be a powerful force for improvement and that such assessment is necessary for regaining public trust in the public good served by higher education.

There is, of course, the apparent conflict between assessment for improvement and assessment for accountability. I say "apparent" because I do not think this is an either/or situation; assessment for improvement and accountability are inextricably related. The public has every right to expect that it is higher education's educational and professional duty to systematically assess its impact on student learning as an essential condition for improvement and transparent accountability.

From an improvement perspective, student learning is higher education's raison d'etre, and we know that appropriate and timely feedback to students and faculty increases student learning and informs institutional change. From an accountability perspective, rigorous, specialized professional training and the status it confers obligate the academy to be transparent in its endeavors, something expected of all professions. Moreover, colleges and universities are subsidized by the public, directly through tax revenues and/or through tax exemptions, and thus do have responsibility for rigorous student and institutional assessment and public accountability. The challenge is to make sure appropriate assessments are being used for each function and that the "stakes" attached to each are fair.

In light of the commission's recommendations, the academy is rightly worried about the imposition of federal and state mandates and the resultant loss of institutional autonomy. In terms of learning outcomes, we do not have--and it is not possible to have--one measure that does sufficient justice to the outcomes promised by colleges and universities. And certainly we know better than to defend U.S. News and World Report criteria as being worthy of anything other than our contempt as measures of quality--their variables of reputation, retention and graduation rates, and alumni giving, for example, are predicted mostly by admissions selectivity and endowment per student.

So how might the conversation about learning assessment and institutional accountability be reconciled in the name of institutional and student learning improvement without becoming politicized, as happened in the K–12 sector? The best answer from my perspective is for higher education, both institutionally and via its accreditation agencies, to take the professional lead on issues of learning assessment and public accountability.

"Going Naked"

There is a useful analog in medicine, summarized in the December 12, 2004, New Yorker article "The Bell Curve" by Atul Gawande, which centers on the treatment of cystic fibrosis. The outcomes of various treatments across the very best hospitals, Gawande notes, are distributed on a bell curve. For example, in 1997, patients at an average center lived to be just over thirty years old; similarly situated patients at the most effective center typically lived to be forty-six. Clearly this is a difference that matters! But what causes that difference? As it turns out, perceived reputation and rankings of hospitals and clinics do not predict excellence in this case. What matters is a caring and demanding institutional culture that also requires rigorous and transparent measurement of outcomes. Shared assessment data in the best clinics informs prescriptive compliance by patients and aids doctors constantly trying to improve treatment.

Making data about their outcomes public leaves centers with no alternative but to do everything possible to help patients survive. Significantly, the ability to compare results across similarly situated institutions lays bare (pun intended) the advantage of being candid and the opportunity to be challenged; there is no place to hide. And with it comes the ability to benchmark excellence and establish a culture of continuous improvement. As one doctor said, this is like "going naked."

The Collegiate Learning Assessment Project

The academy is populated with "doctors," and while we are not literally brain surgeons, the quality of life of the mind and heart is very much in our hands. Assessing outcomes to inform improvement should be just as important to colleges and universities as it is to the medical profession. Yet higher education has neither developed adequate metrics nor demonstrated a willingness to make such results public; instead, it is content to rely on, even while condemning, college guides and reputation rankings. And it is not uncommon to hear faculty and administrators across the country protest that most of what we teach is too complex and cannot be measured, that the diversity of college and university missions precludes one-size-fits-all assessment, and that the marketplace is the only required arbiter of quality. This implicit "trust us" attitude is now confronted by stakeholders who are questioning quality and no longer willing to accept higher education's sense of "faith-based" entitlement.

Seven years before the Spellings Commission, the Collegiate Learning Assessment project (CLA) began as an approach to assessing core outcomes espoused by all of higher education--critical thinking, analytical reasoning, problem solving, and writing. (Fig. 1 provides a small sample of questions used in developing our scoring rubrics.) These outcomes cannot be taught sufficiently in any one course or major but rather are the collective and cumulative result of what takes place or does not take place over the four to six years of undergraduate education in and out of the classroom.

The CLA is an institutional measure of value-added rather than an assessment of an individual student or course. It has now been used by more than two hundred institutions and over 80,000 students in cross-sectional and longitudinal studies to signal where an institution stands with regard to its own standards and to other similar institutions:

One of the most important features of the CLA program is its policy of reporting results in terms of whether an institution's students are doing better, worse or about the same as would be expected given the level of their entering competencies. . . . [It] also examines whether the improvement in average student performance between entry and graduation at a school is in line with the gains of comparable students at other colleges. The program is therefore able to inform schools about whether the progress their students are making is consistent with the gains at other institutions. Thus, the CLA program adheres to the principle that post-secondary assessment programs should focus on measuring and contributing to improvement in student learning. (Klein et al. forthcoming)
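The "better, worse or about the same as would be expected" logic can be illustrated with a minimal sketch. This is not the CLA's actual scoring model: it assumes a simple straight-line relationship between institutions' mean entering scores and mean exit scores, and every institution name and score below is invented for illustration. An institution's value-added is taken here to be the gap between its observed exit mean and the exit mean predicted from its entering mean.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def value_added(entry_means, exit_means):
    """Observed exit mean minus the exit mean expected from the entry mean."""
    names = list(entry_means)
    xs = [entry_means[n] for n in names]
    ys = [exit_means[n] for n in names]
    slope, intercept = fit_line(xs, ys)
    return {n: exit_means[n] - (intercept + slope * entry_means[n]) for n in names}

# Hypothetical institution-level mean scores (invented, not CLA data).
entry = {"A": 1000, "B": 1100, "C": 1200, "D": 1300}
exit_ = {"A": 1120, "B": 1150, "C": 1290, "D": 1340}

residuals = value_added(entry, exit_)
for name, r in residuals.items():
    label = "above expected" if r > 0 else "at or below expected"
    print(f"{name}: {r:+.1f} ({label})")
```

In this invented data set, institution D posts the highest exit score yet falls slightly below what its entering students' scores would predict, while institution A, with the lowest exit score, lands above expectation. That is the sense in which a value-added comparison differs from a ranking by raw results.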

The purpose of comparison is to stimulate benchmarking and standard-setting discussions that can inform changes in institutional culture, pedagogy, and curriculum to improve student learning. And, as in the medical example above, CLA institutional comparisons result in a bell curve and bear no correlation with rankings such as those reported in U.S. News and World Report.

Does It Matter Where One Goes to College?

While the CLA's institutional comparison feature is important, measuring value-added is a necessary but not sufficient condition for improvement; defining standards of excellence must also be part of the improvement process that comparable learning assessment data afford. For example, over the past five years we have found that simply going to college makes a difference--no matter where they go to college, students do show statistically significant gains in the learning of critical thinking, analytical reasoning, problem solving, and writing. Yet virtually all colleges and universities claim that "coming here" versus going elsewhere makes a difference.

Does it matter, then, where one goes to college? In our sample of colleges and universities, we have found that twenty percent provide substantially greater value-added than other similarly situated schools. We are currently looking at these one-in-five schools to begin to identify what in their cultures, curricula, and pedagogy might explain such significantly better learning gains.

Questioning the CLA

As the CLA has captured greater public attention, a number of fundamental issues have been raised. Trudy Banta, for example, has raised questions about the appropriateness of the value-added approach to learning assessment:

For nearly 50 years measurement scholars have warned against pursuing the blind alley of value added assessment. . . . Moreover, we see no virtue in attempting to compare institutions, since by design they are pursuing diverse missions and thus attracting students with different interests, abilities, levels of motivation, and career aspirations. (Banta 2007)

Steve Klein and his colleagues rebut that conclusion by pointing out that prior to the CLA, attempts at value-added assessment focused on the individual student level and did not effectively control for student entry characteristics. This problem is remedied by the CLA, which aggregates student-level data to the institutional level. And while Banta asserts that higher education's mission and student diversity make valid comparisons across institutions difficult, it is precisely for this reason that the CLA assesses core outcomes transcending diverse missions and is designed to permit comparisons between similarly situated students and institutions.

George Kuh has been critical of aggregating individual student scores up to the institution level. Specifically, he suggests that when this is done, "the amount of error in student scores compounds and introduces additional error into the results, which makes meaningful interpretation difficult" (NSSE 2006, 9). Actually, measurement theory predicts just the opposite--results should become much more rather than less reliable when results are aggregated to the school level, especially if there is reasonable variability in scores among campuses, as there is in the CLA. Our further analysis confirms this prediction.
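The measurement-theory point can be made concrete with a small simulation. This is a textbook illustration rather than the analysis referred to above, and it rests on a stated assumption: each student's score is the school's true mean plus independent random error. Under that assumption the error in a school's average shrinks with the square root of the number of students tested. The score scale, sample size, and function names are invented for the example.

```python
import math
import random
from statistics import mean, pstdev

def standard_error_of_mean(student_sd, n_students):
    """Theoretical spread of a school mean built from independent scores."""
    return student_sd / math.sqrt(n_students)

def simulate_school_means(true_mean, student_sd, n_students, n_trials, rng):
    """Repeatedly test n_students and record the school-level mean each time."""
    return [
        mean([rng.gauss(true_mean, student_sd) for _ in range(n_students)])
        for _ in range(n_trials)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical scale: individual scores vary by about 150 points around 1200.
student_sd = 150.0
n_students = 100

school_means = simulate_school_means(1200.0, student_sd, n_students, 2000, rng)
theoretical = standard_error_of_mean(student_sd, n_students)
empirical = pstdev(school_means)

print(f"spread of individual scores:    {student_sd:.1f}")
print(f"theoretical spread of the mean: {theoretical:.1f}")
print(f"simulated spread of the mean:   {empirical:.1f}")
```

With one hundred students per school, the school average is roughly ten times less noisy than any single student's score. The caveat is the independence assumption: error that is shared by all students at a school, such as a biased sample of test takers, would not average away.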

Some believe comparing campuses is invalid because the amount of measurable value-added would be especially limited in highly selective institutions. President Amy Gutmann, for example, said that if such tests were implemented at the University of Pennsylvania, "students would do superbly when they came in, and superbly when they left, and it would be no measure of what they learned at Penn" (Lifshin 2006). Surely President Gutmann does not mean to suggest that Penn's students learn so little in four years that the value-added would be negligible. What she is suggesting, however, is that measures like the CLA cannot detect such learning gains at highly selective schools. Yet no such "ceiling effect" has been found in the CLA national data sample, which includes schools as selective as Penn.

A major concern also has been raised about the potentially brutish purposes for which the CLA or any single assessment might be used. The CLA is not meant to be used as a new ranking tool or as a tool for state or federal agencies to use when deciding how to distribute funding, and this is why CLA data are not made public. If useful learning assessment is the goal, multiple kinds of assessment are required, such as portfolios, comprehensive exams covering both general education and majors, thesis requirements (with and without oral examinations), and capstone courses, although in combination they are rarely utilized in a comprehensive, coherent, or cumulative way within any single institution.


The CLA's purpose is improvement of teaching and learning. The assessment measures core outcomes shared by all institutions and complements more local and specific assessment techniques with important comparative and value-added data. It communicates that specific higher-order learning is valued, enables institutional improvement by utilizing institutional comparisons to benchmark quality, and emphasizes that such outcomes are accomplished collectively across the entire curriculum.

Higher education has been reluctant to measure and share what students are learning, although institutions using the CLA and working in consortia are more willing to take on this transparent task of comparison knowing that others are engaging in the same self-critical analysis. Improvement requires far more substantial and transparent learning assessment, a process that requires going institutionally naked.

Figure 1. Sample questions used in developing CLA scoring rubrics.

The CLA measures critical thinking, analytic reasoning, problem solving, and writing skills. These skills include the ability to evaluate and analyze source information, draw conclusions, and present an argument based upon that analysis. Below are some of the many factors that may be included in a task's scoring guide.

How well does the student

  • determine what information is or is not pertinent;
  • distinguish between rational claims and emotional ones;
  • separate fact from opinion;
  • recognize the ways in which evidence might be limited or compromised;
  • spot deception and holes in the arguments of others;
  • present his/her own analysis of the data or information;
  • recognize logical flaws in arguments;
  • draw connections between discrete sources of data and information;
  • attend to contradictory, inadequate, or ambiguous information;
  • construct cogent arguments rooted in data rather than opinion;
  • select the strongest set of supporting data;
  • avoid overstated conclusions;
  • identify holes in the evidence and suggest additional information to collect;
  • recognize that a problem may have no clear answer or single solution;
  • propose other options and weigh them in the decision;
  • consider all stakeholders or affected parties in suggesting a course of action;
  • articulate the argument and the context for that argument;
  • correctly and precisely use evidence to defend the argument;
  • logically and cohesively organize the argument;
  • avoid extraneous elements in an argument's development;
  • present evidence in an order that contributes to a persuasive argument?


Banta, T. W. 2007. A warning on measuring learning outcomes. Inside Higher Education (January 26). 2007/01/26/banta.

Gawande, A. 2004. The Bell Curve. The New Yorker (December 12): 82–91.

Klein, S., R. Shavelson, R. Bolus, and R. Benjamin. Forthcoming. The Collegiate Learning Assessment: Facts and fantasies. Evaluation Review.

Lifshin, I. 2006. Gutmann: Standard tests a waste at Penn. The Daily Pennsylvanian.

National Survey of Student Engagement. 2006. Engaged learning: Fostering success of all students. Bloomington, IN: Indiana University Center for Postsecondary Research.

U.S. Department of Education. 2006. A test of leadership: Charting the future of U.S. higher education. Washington, DC: U.S. Department of Education.

Richard H. Hersh is the codirector of the Collegiate Learning Assessment project and former president of Hobart and William Smith Colleges and Trinity College (Hartford, Connecticut).
