Of all the various Great Ideas launched at education in the past couple of decades, none has done more damage than the Big Standardized Test, a practice that has been in place now for a generation. So on top of the other harms done by test-driven accountability, the cherry on top is that a whole crop of newbie teachers has emerged thinking that test-centric schooling is natural and normal and how the U.S. education system has always worked. Meanwhile, we are just about to enter the season in which school staffs start creating cutesy videos and holding noisy pep rallies in an attempt to convince students that these tests are Important and that they should Do Their Best. Yuck.
The BS Tests have been a source of toxic waste in schools for years and years, and they have created this toxic effect in three distinct ways.
High stakes for a narrow measure
A single test is used as a broad measure of educational achievement. It claims to measure reading and math and nothing else, and yet it is repeatedly used as a measure of educational quality, student achievement, and teacher/school effectiveness. States have used BS Test results to label schools as "failing," which can have consequences ranging from a loss of funding to charterization to plain old reputational damage.
Attaching high stakes to the test has led to a twisting and warping of curriculum, with course content and even courses themselves judged by just one metric-- is it on the Test? Science, history, the arts, even recess are cut from schools so that extra work can be put into getting students to raise those scores, because the BS Test turns schools upside down. The school doesn't exist to serve students by giving them an education; students exist to serve the school by generating test scores. The upside down school effect is particularly notable in many charter schools, where the scores are an important marketing tool and so students who don't help make good numbers have to be "counseled out."
Meanwhile, test scores make an easy reference point for journalists, especially when combined with such prestidigitation as "days/months/years of learning," which is just a fun mask to slap on the increase or decrease in test scores. Or soaking test scores in VAM sauce to make them seem as if they Really Mean Something. Or the transformation of scores into a kind of stock market, rising and falling as if they are waves of data flowing through a single medium, rather than the scores of different students from year to year.
But, hey. If the scores represent real measures of reading and math skills, isn't all of this justified? Isn't it?
Lousy tests
Have the Big Standardized Tests been checked for validity and reliability? Do they measure what they purport to measure? Will they produce consistent results (in other words, if the same student takes the test multiple times, will he get pretty much the same score every time)?
The most likely answer is "Nobody knows for sure, but probably not."
Multiple choice questions are about the weakest measure of knowledge and skill we have. But written answers create an assessment challenge that is almost insurmountable at that scale (and certainly insurmountable by any bots currently available). Also, a test needs to be created for a particular purpose, while the BS Tests are sold as being useful for multiple purposes. "We will sell you," say testing companies, "a piece of string that can be used to measure the circumference of a cloud and the amount of water in a swimming pool."
If we start with the number of skills that the BS Test claims to measure and multiply it by the number of items that it would probably take to measure those skills, we arrive at a test much larger than the actual tests.
All of this gives us ample reason to suspect that the BS Tests are less-than-awesome assessment tools, suspicions that might be quelled by extensive test testing to show validity and reliability. Except that there doesn't seem to be any such test testing out there. Meanwhile, folks keep arguing that if teachers just teach the standards, the test results will take care of themselves, despite the fact that test results vary wildly from year to year for the same teacher.
But, hey. It generates some data, and even that sketchy data should be useful for something. Shouldn't it?
Tortured data
When a classroom teacher uses an assessment to evaluate learning and instruction, she can dig down to a granular level. Go question by question, checking student responses against the test items to see exactly where students are going wrong (or right).
But the BS Tests are black boxes. Policy makers have accepted the notion that a test manufacturer's proprietary material is more important than useful data for schools, so teachers are forbidden to so much as look at the questions on the test, and the results that come back to schools (in too many cases, still after too many months) are rough summaries. For years, my students' results on the BS Test were broken down into "reading fiction" and "reading nonfiction," and that was it.
Imagine you are a parent whose child brought home a C on a major reading test, and the teacher wouldn't let you see the test, wouldn't tell you what areas your child needed help with, and wouldn't tell you what areas were your child's strengths. In response to the question, "What can we do to help him?" the teacher replied, "Just, you know, work on his reading." That is where teachers are with BS Test results.
This tiny sliver of data is one of the reasons that schools take to carpet-bombing students with a host of broad, unfocused "interventions." It's also why we've seen the booming cottage industry of pre-test testing, with schools giving multiple tests throughout the year in an attempt to identify students who can be dragged to a higher score and to identify the areas in which interventions for these students might help. The actual BS Test doesn't give us the information we need, so maybe a few rounds of NWEA MAP testing will tell us what the BS Test won't (spoiler alert: it won't, in part because it's hard to predict how students will do on a test that isn't very reliable or valid).
So very little useful data gets back to teachers and schools. It is almost as if policy makers are only interested in generating pass-fail labels for schools and not in providing data that would actually help improve performance.
Solutions?
Policy makers could fix any one of these three factors. They could reduce the stakes attached to the BS Test, or combine test results with other measures of education. They could simply require the tests to be better, and they could certainly require test manufacturers to provide more useful data in a more timely fashion. In fact, in some states, policy makers have taken some baby steps. But it's not nearly enough.
Underneath all of this, there are philosophical questions to be answered: How does one distinguish between good schools and bad? Can you measure the difference? And if you can, is there any benefit to trying to slap "good" and "bad" labels on schools or teachers? But I don't recommend holding your breath while waiting for policy makers to have serious philosophical conversations about education in this country.
But in the meantime, high-stakes large-scale standardized testing continues to be one of the single most destructive factors in U.S. education. If you handed me a magic wand, it is the very first thing I would disappear. Barring that, it would be great if we could just do better.