Peer Review

Moving Beyond a National Habit in the Call for Accountability

During and after the heated exchanges that led to the 2008 reauthorization of the Higher Education Act, standardized tests were, not surprisingly, proposed as the “first response” for documenting college and university accountability for student learning. Policy makers and decision makers were especially responsive to this effort, believing they could then efficiently compare institutions and fill out the report card for state-by-state educational performance based on a sample of students’ performance on national instruments. Efficient discussions about student scores could, in turn, spawn state and federal policies. With some exceptions, tests—aptitude, achievement, placement, certification, entry-level, for example—have dotted our educational and professional lives from the moment we entered formal education. Let’s face it: testing is a national, even international, habit—a first-response solution to questions about student achievement—because it enables quick and efficient judgments, including policy makers’ decisions.

Demonstrating Student Learning through Standardized Tests

Some institutions immediately chose to conform to the first-response solution, perhaps even to remove the accountability issue as a central institutional concern, by using existing standardized instruments to demonstrate their students’ general education learning, such as the Collegiate Assessment of Academic Proficiency (CAAP), the Measure of Academic Proficiency and Progress (MAPP), or the Collegiate Learning Assessment (CLA), a newer standardized test that takes a performance-based approach in its critical and analytical thinking and writing instruments. (Although the developer of the CLA did not originally design its tests as national instruments for comparing institutions’ performance, but rather as instruments an institution could use, along with other sources of student learning evidence, to take stock of its students’ learning, the CLA nonetheless sought to become the national gold standard for comparing institutions’ educational quality.)

Among higher education organizations, the Association of Public and Land-Grant Universities (APLU), formerly the National Association of State Universities and Land-Grant Colleges (NASULGC), and the American Association of State Colleges and Universities (AASCU) proposed that their member institutions voluntarily post scores from students’ performance on the MAPP, CAAP, or CLA under their Voluntary System of Accountability (VSA). These scores are included on the VSA’s College Portrait of Undergraduate Education Web site as one of the informational components about participating institutions. These components are presented in a way that helps parents and prospective students learn more about institutions they are considering (www.collegeportraits.org/guide.aspx). Although the site offers institutions the option to post other student achievement information, the College Portrait site is designed for comparisons among its schools. Thus, for example, data that lend themselves to numerical representation are presented: retention rates, graduation rates, results of student surveys, and now standardized test scores for students’ general education learning as a primary indicator of educational effectiveness. Measured at the institution level only, average test scores of seniors (for example, the CLA recommends sampling one hundred freshmen and one hundred seniors at four-year institutions) are compared with those of freshmen to represent students’ learning gains at an institution. A score based on a sample of students’ achievement is becoming, then, a way to brand institutions’ educational effectiveness as well as a way to compare institutions. In defense of the current use of standardized tests among institutions participating in the VSA, David E. Shulenburger, vice president for academic affairs at APLU, argues that use of an “outcomes test creates a rough comparability among participating universities and enhances public accountability” (Lederman 2009). “Rough comparability,” based solely on a sampling of students, unfortunately leads to public constituencies’ crystallized judgments and decisions about the overall educational quality of our colleges and universities. Informed decisions and judgments come from more than one source of evidence.

To date, approximately three hundred of the five hundred member public institutions have joined the VSA or have been mandated to join. Others have not made up their minds; still others are resisting. These institutional decisions are important. If campuses rely solely on scores from standardized tests that focus on only two or three outcomes and are taken by a sample of students, and if those scores then become the national standard upon which students, parents, legislators, policy makers, and other stakeholders make judgments and final decisions about the educational effectiveness of our colleges and universities, that straitjacket will lead to misrepresentations, oversimplifications, and overgeneralizations about our institutions’ educational effectiveness, aims, and expectations. In turn, state and federal policy makers and legislatures will deliberate and act based on the limitations of a single source of evidence, again based on a sample of students taking a ninety-minute test.

Standardized Testing Limitations

What are some of those major misrepresentations, oversimplifications, and overgeneralizations? For starters, how well a sample of students taking these tests truly represents the diverse learners in an institution is questionable. There really is not a “typical” learner at most of our institutions; there are various kinds of learners who enter with different life experiences, learning histories, cognitive levels, levels of understanding, misunderstanding, and misconception, and motivation. At no time in the history of higher education have we seen such diversity in our student population: the demographics of our community colleges and four-year institutions include, for example, developmental, transfer, first-generation, international, full-time, full-time and working, part-time and working, non-native-English-speaking, distance, nontraditional-aged, traditional-aged, learning-disabled, and honors students, to name a few. Realistically, there are different levels of student performance across an institution based on students’ readiness, motivation, abilities, and levels of understanding and misunderstanding.

Not all students progress at the same rate, learn the same things at the same time, or represent their learning in the same way; yet standardized tests demand that test takers perform within an instrument’s closed universe of “measurable responses.” How can these test results capture the levels of achievement and progress of our diverse learners, given that each student has a different entry point of learning, a different learning history, and different motivation, readiness, and sets of abilities? Where are the accompanying data about our diverse learners? How and when did they enter our institutions? How well did they progress, or not, and why? How well did they achieve by the time they graduated, given where they began their journey? Unlike Olympic competitors, who have gone through numerous elimination trials to reach the starting line for the final challenge of their careers, students at the higher education starting line do not all have comparable abilities, educational histories, expectations, or motivations. Thus, generalizations about “students’ performance” based on a small sample of students misrepresent our diverse learners’ range of achievement and levels of progress, leading to overgeneralizations about the educational quality or effectiveness of our institutions.

Further, how institutions manage to round up students to take an externally designed test, besides mandating the test in a course or at a designated time, leads to questions about just how representative those test takers are and about how seriously the final set of test takers is committed to performing on an instrument that is not personally relevant to them. If institutions use different strategies for recruiting their student samples, then how well do those students’ performances represent a comparable set of students across our institutions? If an institution presents a high achievement score the first time around but the score declines after a second round of tests years later, does that score mean that the institution’s educational quality has declined even though the educational practices that accounted for the first round are essentially unchanged? Do variations in scores over cycles represent how a particular sample of students happened to perform on a given day rather than the range of achievement levels at an institution? Oversimplifications will abound.
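To make the sampling concern concrete, consider a minimal simulation; the score distribution below is purely hypothetical and is not drawn from any actual instrument. Two testing cycles each sample one hundred students (the sample size the CLA recommends) from the same unchanged population, yet the institution-level averages differ simply because different students happened to be tested.

```python
import random
import statistics

# Minimal sketch: one unchanged population of student scores, sampled in two
# separate testing cycles of one hundred students each. The score distribution
# below is purely hypothetical.
random.seed(42)
population = [random.gauss(1150, 120) for _ in range(20_000)]

def cycle_mean(pool, sample_size=100):
    """Average score of one randomly drawn sample of test takers."""
    return statistics.mean(random.sample(pool, sample_size))

cycle_1 = cycle_mean(population)
cycle_2 = cycle_mean(population)

print(f"Cycle 1 sample mean: {cycle_1:.0f}")
print(f"Cycle 2 sample mean: {cycle_2:.0f}")
print(f"Apparent 'change' in quality: {cycle_2 - cycle_1:+.0f} points")
# Any difference here reflects only who happened to be sampled, not any change
# in the institution's educational practices.
```

Under these hypothetical assumptions (a standard deviation of 120 points and samples of one hundred students), the standard error of each cycle’s average is about twelve points, so swings of twenty to thirty points between cycles are unremarkable and say nothing about changes in educational quality.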

Developing the practice of using standardized test results based on a sample of students to reach quick conclusions about educational effectiveness, as could easily occur in the world of federal and state policy makers, also misrepresents the aims, expectations, and educational processes and practices of our colleges and universities. Reducing learning to what can be easily measured or easily simplified to a metric diverts external audiences’ focus away from the authentic work that students produce along the continuum of their learning as well as at the end of their careers. Capstone projects, research projects and papers, teamwork, lab reports, proposals, creative products in the visual and performing arts, practica, internships, and community service projects, for example, demonstrate not only how students apply their learning, but also how they integrate knowledge, abilities, habits of mind, ways of thinking and problem solving, and ways of behaving or responding. Presented with results that are “easy” to measure, public constituencies lose sight of these more complex representations of learning that include or integrate the general education outcomes that current tests measure in isolation, such as critical thinking, writing, quantitative reasoning, and ethical reasoning. Deconstructing learning into skill sets does not realistically represent how students actually think, act, solve problems, engage questions, take risks, propose new ways of looking at a problem, create an original work, or design research. That is, students draw from a repertoire of strategies and ways of knowing represented in their work.

Realistic Evidence of Student Learning

We need more robust and realistic evidence of our diverse students’ learning achievements and levels of achievement than standardized instruments can provide. Can colleges and universities publicly account for their students’ learning in a way that represents their diverse learners and their level of achievement while respecting their various missions, student demographics, educational practices, and philosophies? Is there an alternative to the national testing habit that can also humanize student achievement within our institutions’ contexts?

Advancing a new model that can respond to accountability demands and yet recognize our diverse college and university missions and purposes, their students, and educators’ practices is the Association of American Colleges and Universities (AAC&U). Building on institutions’ growing commitment to assess students’ learning, required by national, regional, and specialized accreditors, AAC&U’s Valid Assessment of Learning in Undergraduate Education (VALUE) project is now changing the national conversation about how institutions can present their students’ achievement levels (www.aacu.org/value). Part of a FIPSE grant awarded to AAC&U, APLU, and AASCU, this national project provides external audiences with another lens through which they can gain a realistic and robust representation of student achievement. The project builds on the ongoing work across our campuses: using collaboratively designed criteria and standards of judgment (scoring rubrics) to assess student work. By analyzing the results of scoring rubrics applied to students’ work, educators can track how well students are achieving or do achieve program- or institution-level expectations for learning in general education and in students’ major programs of study. Results are collaboratively discussed in terms of students’ patterns of strength and weakness. Patterns of weakness in student achievement become the basis for determining ways to improve performance levels through changing pedagogy, curricular design, or educational practices or policies. AAC&U has now taken this assessment practice to a national level. Faculty teams at twelve leadership and fifty partner campuses across the United States, ranging from community colleges to four-year institutions, have developed fourteen national scoring rubrics for general education outcomes identified by AAC&U as the “essential learning outcomes” of contemporary undergraduate liberal education, described in its publication College Learning for the New Global Century. These essential learning outcomes and a set of “Principles of Excellence” provide a new framework to guide students’ cumulative progress through college. Other organizations, such as the Partnership for 21st Century Skills, have likewise found the need for a much broader set of outcomes than existing tests currently measure (Partnership for 21st Century Skills). Within the AAC&U framework, students should be able to demonstrate the following fourteen essential learning outcomes, in increasingly complex ways, in the work they produce:

  • inquiry and analysis
  • critical thinking
  • creative thinking
  • written communication
  • oral communication
  • quantitative literacy
  • information literacy
  • teamwork
  • problem solving
  • civic knowledge and engagement—local and global
  • intercultural knowledge and competence
  • ethical reasoning and action
  • foundations and skills for lifelong learning
  • integrative learning

Assessment through Rubrics

Each of these outcomes is further broken down into criteria descriptors that list the attributes, qualities, or abilities students are required to demonstrate in work that focuses on or incorporates the outcome. Students’ demonstration of these qualities is scored against descriptive performance levels. In contrast to norm-based tests, whose results are used to compare or rank students (as would be the ultimate aim of using standardized tests to compare institutions’ educational quality), the results of scoring rubrics enable a faculty member or other educator at the institution to view students’ performance against criteria, revealing students’ patterns of strength and weakness. Scoring rubrics represent the dimensions of learning characteristic of a learning outcome such as writing, problem solving, or critical thinking, and they provide educators with evidence of how well students execute the components of an outcome. For example, in the current draft of criteria for critical thinking, students need to demonstrate in their work how well they (1) explain an issue or problem; (2) investigate evidence; (3) qualify the influence of context and assumptions in an issue or problem; (4) present their own perspective, hypothesis, or position on an issue or problem; and (5) present their conclusions, as well as the implications and consequences of those conclusions.
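To illustrate how criterion-based scoring differs from a single norm-referenced score, here is a minimal sketch in Python. The criterion names paraphrase the draft critical-thinking criteria above, while the four performance levels, their labels, and the sample scores are illustrative assumptions rather than the VALUE rubric itself.

```python
from statistics import mean

# Minimal sketch of applying a criterion-based scoring rubric to one piece of
# student work. Criterion names paraphrase the draft critical-thinking criteria
# described above; the performance levels, labels, and scores are illustrative
# assumptions, not the VALUE rubric itself.
PERFORMANCE_LEVELS = {
    1: "benchmark",
    2: "milestone (emerging)",
    3: "milestone (developed)",
    4: "capstone",
}

CRITICAL_THINKING_CRITERIA = [
    "explanation of the issue or problem",
    "investigation of evidence",
    "influence of context and assumptions",
    "student's own perspective, hypothesis, or position",
    "conclusions, implications, and consequences",
]

# Hypothetical ratings for one student's capstone paper.
paper_scores = {
    "explanation of the issue or problem": 4,
    "investigation of evidence": 3,
    "influence of context and assumptions": 2,
    "student's own perspective, hypothesis, or position": 3,
    "conclusions, implications, and consequences": 2,
}

# Unlike a single norm-referenced score, the criterion-level profile points to
# specific patterns of strength and weakness.
for criterion in CRITICAL_THINKING_CRITERIA:
    print(f"{criterion}: {PERFORMANCE_LEVELS[paper_scores[criterion]]}")
print(f"average level (for reporting only): {mean(paper_scores.values()):.1f}")
```

The point of such a profile is diagnostic: the hypothetical scores above would prompt conversation about how students handle context, assumptions, and conclusions, rather than yielding only a single number for ranking.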

In addition, students are representing their achievement levels not only in general education or core courses, but also at higher levels of performance in their major programs of study in culminating work such as capstone or senior products, research projects, and other representative professional contexts.

The first version of these fourteen national scoring rubrics is currently being piloted across the twelve lead institutions, as well as other institutions that are joining the project. The first pilot results will lead to a second draft of the national rubrics; the second draft will undergo the same pilot process, leading to a third and final draft of the fourteen rubrics that will be nationally available for colleges and universities to apply to their students’ work. Representing accountability for student achievement through the VALUE alternative recognizes current institutional efforts to identify patterns of strength and weakness in student work against nationally agreed-upon criteria and standards of judgment. This project also respects the diverse ways in which students represent or demonstrate their learning along the continuum of their studies leading to graduation. Lower-than-expected patterns of performance promote dialogue across a campus to identify ways to improve student achievement, leading to advances in pedagogy, curricular design, and educational practices and policies.

VALUE represents a humanizing alternative to the national habit of tests, demonstrating the ways in which students represent their learning through their work and through an open universe that permits diverse ways of representing learning and the levels at which diverse learners achieve. Therein lies the essential difference between the national habit of standardized instruments and VALUE: representation of the dimensions of students’ general education learning within the context of educational practices and the work that students produce. Learning, after all, is not simply a process of pouring information into individuals. It is a process through which students construct their own meaning. Learners learn differently, use different strategies, and represent their learning in different ways.

The VALUE project alternative is not consonant with the way most decision makers and policy makers think, know, and act. In fact, it challenges them to change the evidence they are most comfortable using and to change the ways in which they view that evidence. For example, those pressing for national tests, such as Charles Miller, former head of the Spellings Commission, who view this alternative as flawed (Lederman 2009), argue that we rely on standardized tests in our educational system; therefore, higher education should continue to use them to represent our own educational results. Perhaps many of those individuals have not followed an emerging pattern across the United States: currently approximately 775 colleges and universities, including highly selective ones, no longer require the SAT or ACT to make admission judgments about their applicants (www.fairtest.org). That is, they have come to the conclusion that tests are an incomplete way of representing individuals and predicting their success. Perhaps, as well, those who wish to establish national testing as a means to make decisions about institutional quality have not been spending time on our campuses. If they did, they would readily see that tests are no longer our sole means of evaluating students. A wide range of assessment methods is used across our campuses, such as virtual simulations, case studies, wikis, online journals, lab reports, and internships, to name just a few. These methods become the basis of grading. In addition, as VALUE recommends, e-portfolios will become the means for students across the country to store and build on their work. E-portfolios will also then contain work that can be systematically assessed using agreed-upon scoring rubrics.

Charles Miller also recently argued that the AAC&U project, though praiseworthy, does not provide the “quantitative and comparable” evidence of student learning that would serve the “public purposes” he sees (Lederman 2009). One of those “public purposes” is preparing students to enter the workforce. Yet the results of a 2008 employer survey commissioned by AAC&U revealed that employers prefer evidence of student achievement based on the authentic work students produce, as opposed to standardized test scores. Specifically, the survey concluded:

When it comes to the assessment practices that employers trust to indicate a graduate’s level of knowledge and potential to succeed in the job world, employers dismiss tests of general content knowledge in favor of assessments of real-world and applied-learning approaches. Multiple-choice tests specifically are seen as ineffective. On the other hand, assessments that employers hold in high regard include evaluations of supervised internships, community-based projects, and comprehensive senior projects.

Employers’ emphasis on integrative, applied learning is reflected in their recommendations to colleges and universities about how to assess student learning in college. Again, multiple-choice testing ranks lowest among the options presented, just below an institutional score that shows how a college compares to other colleges in advancing critical thinking skills. Faculty-evaluated internships and community-learning experiences emerge on top. Employers also endorse individual student essay tests, electronic portfolios of student work, and comprehensive senior projects as valuable tools both for students to enhance their knowledge and develop important real-world skills, as well as for employers to evaluate graduates’ readiness for the workplace (Peter D. Hart Research Associates 2008).

Moving from a Testing Model to the VALUE Model

If we shift our focus on accountability from standardized test scores to performance against national scoring rubrics applied to student-generated work, we can open a national dialogue about “what counts” as evidence of our students’ achievement. We can provide evidence of levels of achievement in writing, creativity, and problem solving, for example, across the different ways in which students represent those outcomes, from written work to visual work to virtual work. With an agreed-upon set of essential outcomes for general education in higher education, accompanied by nationally shared and agreed-upon criteria and standards of judgment, we can work together with our public constituencies to identify ways to present our results within the context of our institutions, their missions, and their learners.

Moving from a testing model to the alternative VALUE model may, in fact, lead higher education and our public constituencies into new modes of inquiry about student learning and new ways to make judgments and decisions about educational quality across the country. Consider two possible scenarios that could emerge from the VALUE project:

  • Creation of a coalition of representatives from business, government, industry, accreditation, students, parents, and educators from P-20 and two-year and four-year institutions, charged with designing a way to represent our students’ general education learning based on the VALUE scoring rubrics. To this end, consideration should be given to the potential of the commercial assessment software systems that institutions are already using. These systems already store accrediting standards and could now also store the VALUE rubrics. Questions about the objectivity of internal scoring could be addressed by uploading student work for external reviewers to score. In addition, these software packages are able to represent assessment results at many different levels: course, program, and institution. And they can aggregate and disaggregate results for various audiences and purposes, as sketched after this list.
  • Creation of regional and national learning communities that share results of the application of VALUE scoring rubrics in the national interest of learning about pedagogy, educational practices and policies, and curricular design that foster high-level student achievement in general education outcomes. The aim of these learning communities would be to position institutions to learn from each other about practices that foster high achievement among diverse learners as well as disseminate that knowledge.
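The first scenario imagines assessment software that can aggregate and disaggregate rubric results by course, program, and institution. The following minimal sketch shows what such a roll-up might look like in Python; the programs, courses, criteria, and scores are hypothetical, and no particular commercial system is assumed.

```python
from collections import defaultdict
from statistics import mean

# Minimal sketch of rolling rubric results up to course, program, and
# institution levels. Programs, courses, criteria, and scores are hypothetical;
# no particular commercial assessment system is assumed.
records = [
    # (program, course, criterion, performance level 1-4)
    ("Biology", "BIO 490", "investigation of evidence", 3),
    ("Biology", "BIO 490", "conclusions, implications, and consequences", 2),
    ("History", "HIS 450", "investigation of evidence", 4),
    ("History", "HIS 450", "conclusions, implications, and consequences", 3),
]

def aggregate(rows, key_index):
    """Average performance level grouped by one field (0=program, 1=course, 2=criterion)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[3])
    return {key: round(mean(levels), 2) for key, levels in groups.items()}

print("By program:  ", aggregate(records, 0))
print("By course:   ", aggregate(records, 1))
print("By criterion:", aggregate(records, 2))
print("Institution: ", round(mean(row[3] for row in records), 2))
```

The same records can thus serve a department reviewing a single course, a program assessing its majors, or an institution reporting to external audiences, without reducing student learning to one test score.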

AAC&U has worked to set the national agenda for a new conversation about educational quality among our institutions and our public constituencies. VALUE represents a way to view accountability that realistically represents the strengths and weaknesses that educators see in their own students. Higher education is not refusing to provide evidence; it wants to present it within a context that prevents misunderstanding, misrepresentation, and oversimplification. It now remains for external constituencies to join a collaborative conversation about ways in which higher education can realistically represent its students’ achievement.

References

Association of American Colleges and Universities. 2007. College learning for the new global century: A report from the National Leadership Council for Liberal Education and America’s Promise. Washington, DC: Association of American Colleges and Universities.

FairTest. www.fairtest.org.

Lederman, D. 2009. A call for assessment—of the right kind. Inside Higher Ed. January 8. www.insidehighered.com/news/2009/01/08/aacu.

Partnership for 21st Century Skills. 2007. Beyond the 3 Rs: Voter attitudes toward 21st century skills. www.21stcenturyskills.org/documents/p.21_pollreport_2pg.pdf.

Peter D. Hart Research Associates. 2008. How should colleges assess and improve student learning? Employers’ views on the accountability challenge. January 9. Washington, DC: Peter D. Hart Research Associates.


Peggy L. Maki is an education consultant and assessment and accreditation specialist.
