The Tyranny of Metrics!

Dear Commons Community,

John Wallach, a colleague at Hunter College, posted the article below on a faculty LISTSERV. Written by Catholic University history professor Jerry Z. Muller, this essay critically examines the emphasis on metrics in higher education.

Tony

===============================================

The quest to quantify everything undermines higher education!

By Jerry Z. Muller

A cultural pattern has become ubiquitous in recent decades, engulfing an ever-widening range of institutions. Now it has come for the university. Call it a meme, a discourse, a paradigm, or a fashion. I call it metric fixation. It affects the way people talk about the world, and thus how they think and how they act. The key components of metric fixation are:

the belief that it is possible and desirable to replace judgment, acquired by experience and talent, with numerical indicators based upon standardized data.
the belief that making such metrics public assures that institutions are carrying out their purposes.
the belief that the best way to motivate people is by attaching rewards and penalties to their measured performance.

These assumptions have been on the march for several decades, and their assumed truth goes marching on.

The pernicious spillover effects became clear to me during my time as chair of the history department at the Catholic University of America. Such a job has many facets: mentoring and hiring; ensuring that necessary courses get taught; maintaining relations with the university administration. Those responsibilities were in addition to my roles as a faculty member: teaching, researching, and keeping up with my field. I was quite satisfied.

Then, things began to change. Like all colleges, Catholic gets evaluated every decade by an accrediting body. For my university, that body is the Middle States Commission on Higher Education. It issued a report that included demands for more metrics on which to base future “assessment” — a buzzword in higher education that usually means more measurement of performance. Soon, I found my time increasingly devoted to answering requests for more and more statistics about the activities of the department, which diverted my time from research, teaching, and mentoring faculty members. New scales for evaluating the achievements of our graduating majors added no useful insights to our previous measuring instrument: grades.

Gathering and processing all this data required the university to hire ever more specialists. Some of their reports were useful; for example, spreadsheets that showed the average grade awarded in each course. But much of the information was of no real use, and read by no one. Yet once the culture of performance-documentation caught on, department chairs found themselves in a data arms-race. I led a required yearlong departmental self-assessment — a useful exercise, as it turned out. But before sending it up the bureaucratic chain, I was urged to add more statistical appendices — because if I didn’t, the report would look less rigorous than that of other departments.

My experience left me wondering about the forces fueling this waste of time and effort. The Middle States Commission operates with a mandate from the Department of Education. Under the leadership of Margaret Spellings, the department had convened a Commission on the Future of Higher Education, which published a report in 2006 emphasizing the need for greater accountability and the gathering of more data, and directing the regional accrediting agencies to make “performance outcomes” the core of their assessment. That mandate filtered down to the Middle States Commission, and from there, ultimately, to me.

Metric fixation, which seems immune to evidence that it frequently doesn’t work, has elements of a cult. Studies that demonstrate its lack of effectiveness are either ignored or met with the claim that what is needed are more data. Metric fixation, which aspires to imitate science, resembles faith.

Not that metrics are always useless or intrinsically pernicious. They can be genuinely useful. But not everything that is important is measurable, and much that is measurable is unimportant. (Or, in the words of the familiar dictum, “Not everything that can be counted counts, and not everything that counts can be counted.”) Universities, like most organizations, have multiple purposes, and those which are measured and rewarded tend to become the focus of attention, at the expense of other essential goals. Similarly, many jobs have multiple facets, and measuring only a few of them creates incentives to neglect the rest. When universities wake up to this fact, they typically add more performance measures. That creates a cascade of data — information that becomes ever less useful — while gathering it sucks up more and more time and resources.

In the process, the nature of academic work is transformed in ways that are often harmful. Like most professionals, academics resent the imposition of goals that may conflict with their professional ethos and judgment, thus lowering morale. And they inevitably become adept at manipulating performance indicators through a variety of methods, many of which are ultimately harmful to the health of a university.

In the attempt to replace judgments of quality with standardized measurement, some rankings, government institutions, and university administrators have adopted as a standard the number of scholarly publications produced by a college’s faculty, and determined these publications using commercial databases. Here is a case where standardizing information can degrade its quality.

The first problem is that these databases are frequently unreliable: Having been designed to measure production in the natural sciences, they often provide distorted information in the humanities and social sciences. In the natural sciences and some of the behavioral sciences, new research is disseminated primarily in the form of articles in peer-reviewed journals. But that is not the case in fields such as history, in which books remain the pre-eminent form of publication, so a measurement of the number of published articles presents a distorted picture. But that is only the beginning of the problem.

When individual faculty members, or whole departments, are judged by the number of publications, whether in the form of articles or books, the incentive is to produce more publications, rather than better ones. Really important books may take many years to research and write. But if the system rewards speed and volume, the result is likely to be a decline in truly significant scholarship. That is what seems to have happened in Britain as a result of its Research Assessment Exercise: a great stream of publications that are both uninteresting and unread. Nor is the problem confined to the humanities. In the sciences as well, evaluation by measured performance favors short-term publication over long-term research capacity.

In academe, as elsewhere, that which gets measured gets gamed. Take impact factors. Once developers recognized that not all articles were of equal significance, they created techniques to measure each article’s impact. That took two forms: counting the number of times the article was cited, and considering the prestige — or impact factor — of the journal in which it was published, a factor determined in turn by the frequency with which articles in the journal are cited. (This method, mind you, cannot distinguish between the following citations: “Jerry Z. Muller’s illuminating and wide-ranging article on the tyranny of metrics effectively slaughters the sacred cows of so many organizations” and “Jerry Z. Muller’s poorly conceived screed deserves to be ignored by all managers and social scientists.” From the point of view of tabulated impact, the two statements are equivalent.)

Moreover, in an attempt to raise their citation scores, some scholars formed informal citation circles, the members of which made a point of citing one another’s work as much as possible. Some lower-ranked journals requested that authors include additional citations to articles in the journal, in an attempt to improve its “impact factor.”

What, you might ask, is the alternative to tallying up the number of publications, the times they were cited, and the reach of the journals in which articles are published? Professional judgment. In a department, evaluation of faculty productivity can be done by the chair or by a small committee of colleagues, who, consulting with other faculty members when necessary, draw upon their knowledge of what constitutes significance. In the case of major decisions, such as tenure and promotion, scholars in the candidate’s area of expertise are called upon to provide confidential evaluations, a more elaborate form of peer review.

Citation databases may be of some use in that process, but numbers also require judgment grounded in experience to evaluate their worth. That judgment is precisely what is eliminated by too great a reliance on metrics. As Carl T. Bergstrom, a biologist at the University of Washington, puts it, “All too often, ranking systems are used as a cheap and ineffective method of assessing the productivity of individual scientists. Not only does this practice lead to inaccurate assessment, it lures scientists into pursuing high rankings first and good science second. There is a better way to evaluate the importance of a paper or the research output of an individual scholar: read it.”

Among the strongholds of metrics is the Department of Education, under a succession of presidents, Republican and Democratic. During President Obama’s second term, his Department set out to develop an elaborate “Postsecondary Institution Ratings System.” It was intended to grade all colleges, to disaggregate its data by “gender, race-ethnicity and other variables,” and eventually to tie federal funds to the ratings, which were to focus on access, affordability, and outcomes, including expected earnings upon graduation. The plan ran into opposition from colleges and Congress. In the end, the Department settled on a stripped-down version, the College Scorecard, unveiled in September 2015.

It was the product of good intentions, meant to address real problems in the provision of higher education, especially the extremely spotty record of for-profit institutions offering career-oriented education in fields like automotive repair, culinary arts, or health aides, which had been expanding by leaps and bounds. But in reaction to a genuine problem at the low end of the for-profit sector, the department responded with far-reaching demands that had consequences for all colleges.

What the advocates of greater accountability metrics overlook is how the increasing cost of college is due in part to the expanding cadres of administrators, many of whom are required to comply with government mandates. Reward for measured performance in higher education is touted by its boosters as making universities “more like a business.” But businesses have a built-in restraint on devoting too much time and money to measurement — at some point, it cuts into profits. Ironically, since universities have no such bottom line, government or accrediting agencies or the university’s administrative leadership can extend metrics endlessly. The effect is to increase costs or to divert spending from the doers to the administrators — which usually suits the latter just fine. It is hard to find a university where the ratio of administrators to professors and of administrators to students has not risen astronomically in recent decades. Metric fixation contributes to the mushrooming of administrators.

In the case of the College Scorecard, some of the suggested objectives of the original plan (the Postsecondary Institution Ratings System) were mutually exclusive, while others were simply absurd. The goal of increasing college graduation rates, for example, is at odds with increasing access, since less-advantaged students tend to be not only financially poorer but also worse prepared. The better prepared the student, the more likely she is to graduate on time. It might be possible to admit more economically and academically ill-prepared students and to ensure that more of them graduate; but only at great expense, which is at odds with another goal of the Department of Education: holding down costs.

Another metric that colleges were to supply was the average earnings of students after graduation. Not only is this information expensive to gather and highly unreliable — it is downright distortive. Many of the best students will go on to one or another form of professional education, ensuring that their earnings will be low for at least the time they remain in school. Thus a graduate who proceeds immediately to become a greeter at Walmart would show a higher score than her fellow student who goes on to medical school. But there would be numbers to show, and hence “accountability.”

Even if you leave aside the accuracy and reliability of these metrics, consider the message they convey. Initiatives like the College Scorecard treat higher education in purely economic terms: Its sole concern is return on investment, understood as the relationship between the monetary costs of college and the increase in earnings that a degree will ultimately provide. Those are, of course, legitimate considerations. College costs eat up an increasing percentage of family income or require the student to take on debt; and making a living is among the most important tasks in life.

But it is not the only task in life, and it is an impoverished conception of college that regards it purely in terms of its ability to enhance earnings. If we distinguish training, which is oriented to production and survival, from education, which is oriented to making survival meaningful, then metrics are only about the former.

The sort of lifelong satisfaction that comes from an art-history course that allows you to understand a work of art; or a music course that trains you to listen for the theme and variations of a symphony; or a literature course that heightens your appreciation of poetry; or a biology course that opens your eyes to the wonders of the human body — none of these is captured by the metrics of return on investment. Nor is the fact that college is a place where lifelong friendships are made, often including that most important of friendships, marriage. All of these benefits should be factored in when considering “return on investment”: but because they can’t be quantified, they are ignored.

The hazard of metrics so purely focused on monetary considerations is that, like so many metrics, they influence behavior. Universities at the very top of the rankings already send a huge portion of their graduates into investment banking, consulting, and high-end law firms. Those are honorable professions, but is it really in the best interests of the nation to encourage universities to direct their best and brightest to choose those careers?

A capitalist society depends on a variety of institutions to provide a counterweight to the market and its focus on monetary gain. To prepare students for their roles as citizens, as friends, and above all to equip them for a life of intellectual richness — those are among the proper roles of college. Conveying marketable skills is a proper role as well. But to subordinate higher education to what can be quantified is to measure with a dangerously crooked yardstick.
