More Problems with Progress 8
My article, ‘Why marketisation invalidates Progress 8’, addresses the concern that high-stakes performance measures for schools are susceptible to gaming, or even to something closer to cheating, or worse.
In October 2010 Perry Beeches School, an 11-16 community comprehensive in Birmingham controlled by the Local Authority, was widely featured in the national media as the ‘most improved school in the UK – Ever’. By 2016 Perry Beeches School, by then part of a Multi Academy Trust, was in the news for very different reasons.
During 2010 and 2011 I carried out a detailed study of the curriculum and the approaches to teaching and learning that lay behind the ‘Most Improved School Ever’ claim. This resulted in an article in the education journal ‘Forum’.
In the article I wrote as follows.
The stakes for schools are very high indeed so no one can blame heads and governors for opting for a formula that produces success in the system that schools are forced to be part of. This fact would have been especially pressing in the recent past at Perry Beeches where the school’s former attainment of only 21% good GCSEs including English and maths led to changes brought about by the new head. Vital issues, however, relate not just to league table status but to progression to high-quality vocational education and training and to access to university for children attending comprehensive schools threatened by the ‘failure’ label and by an Ofsted system driven by the same narrow focus on floor targets (raw results levels below which are deemed by the Government to be unacceptably poor).
The article raised questions about whether the changes to the curriculum and to approaches to teaching and learning, brought about by the need to succeed in the market, were in the best interests of students and of the education system. ‘Progress 8’ is an attempt to ‘nudge the market’ in a direction designed to address such concerns. My conclusion five years on is that unforeseen, perverse outcomes inevitably emerge, as described in Part 3 of my book, ‘Learning Matters’.
In this article I will raise doubts as to whether the Progress 8 measure is fit for purpose even on its own terms. It is produced by combining two sets of high-stakes national ‘test results’ taken five years apart: KS2 SATs at the end of primary school and GCSEs at the end of secondary school.
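To make that aggregation concrete, here is a minimal sketch assuming the measure works in outline as follows: each pupil's Attainment 8 score is compared with the national average Attainment 8 for pupils in the same KS2 prior attainment group, the difference is divided by 10, and the school's score is the mean over its pupils. The prior attainment groups and the expected scores in `EXPECTED_A8` are hypothetical placeholders, not DfE figures.

```python
from statistics import mean

# Hypothetical lookup: KS2 prior attainment group -> national average
# Attainment 8 for pupils in that group. Placeholder values, not DfE data.
EXPECTED_A8 = {"low": 30.0, "middle": 48.0, "high": 65.0}


def pupil_progress8(actual_a8: float, prior_group: str) -> float:
    """Pupil score: gap between actual and expected Attainment 8, divided by 10."""
    return (actual_a8 - EXPECTED_A8[prior_group]) / 10


def school_progress8(pupils: list[tuple[float, str]]) -> float:
    """School score: the mean of its pupils' Progress 8 scores."""
    return mean(pupil_progress8(a8, group) for a8, group in pupils)


# Hypothetical cohort: (GCSE Attainment 8 score, KS2 prior attainment group).
cohort = [(52.0, "middle"), (33.0, "low"), (61.0, "high")]
print(round(school_progress8(cohort), 2))
```

Every pupil's score depends on two measurements five years apart, so any change to the KS2 tests in between also changes what the school score means.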
On past record, by the time this year’s SATs results are processed alongside the same pupils’ GCSE results in five years’ time, the government is likely once again to have substantially changed the entire basis of the KS2 National Curriculum and of the SATs tests used to measure attainment in it.
Tom Sherrington has raised further concerns about Progress 8 in a blog article. The issues he raised do not appear to have been addressed. I quote from his article as follows.
It’s all so convoluted; so removed from what learning looks like, turning ‘Progress’ into some kind of absolute metric.
To begin with, there is an input measure – a fine sublevel – that is derived from the raw scores on two tests in different subjects. If you read my posts The Data Delusion or The Assessment Uncertainty Principle, you will see how far we move away from understanding learning even with raw marks. However, it appears that raw marks in different subjects are to be put through a convoluted mincing machine where 74 and 77 become 5.1. One number representing EVERYTHING a student has learned at KS2. On average.
But then we get to the crux. Despite all the four sig fig nonsense, we actually end up with an outcome, in the worked example, where Progress 8 is 0.3 +/- 0.2. In other words; 95% certain to fall somewhere between 0.1 and 0.5. (Coincidentally, these are the same numbers for my school.). What we end up with is a super-crude 1 significant figure number falling somewhere within a range that is bigger than the number itself. Essentially, the whole palaver divides the Progress measure into three categories: Significantly above; average; significantly below. That’s it. The numbers actually don’t tell us anything significant at all.
I suppose that, as long as we recognise this, we’ll be OK. However, I worry that people will not really understand much about this and they will assume that scores of 0.5, 0.4 and 0.3 are really different; people will assume that schools will have performed better than others even though, within the limits of confidence, that assumption doesn’t hold up. If the error bars overlap – essentially we have to assume that the data doesn’t tell us enough to tell the schools apart. Similarly, if one school ‘improves’ from Prog 8 0.1 to 0.2 from one year to the next, actually they’re kidding themselves. The error bars will overlap to the point that there’s actually a chance they did worse.
Will people listen? Of course not. We’ll get league tables of Progress 8 measures ranking schools; Governors and prospective parents across the land will be fretting about the school next door having a higher score – all based on the most convoluted algorithm founded on the data validity equivalent of thin air; a number that says nothing of substance about how much learning has taken place over the course of five years. Nothing.
“Is the progress ‘boost’ due to the pupils themselves or due to something the school has done? The honest answer is that we can’t tell from the data we have available. And because we can’t resolve this conundrum it means that Progress 8 is not a measure of school effectiveness.”
Take note people; take note. It is NOT a measure of school effectiveness. [my bold]
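Sherrington's point about error bars can be sketched in code. This assumes a 95% confidence interval of the usual form, the school score plus or minus 1.96 times the national standard deviation of pupil scores divided by the square root of the number of pupils; the standard deviation of 1.2 and cohort of 150 are illustrative assumptions, not published figures. The sketch sorts scores into his three categories and checks whether two intervals overlap.

```python
import math


def progress8_interval(score: float, n_pupils: int, national_sd: float) -> tuple[float, float]:
    """95% confidence interval: score +/- 1.96 * sd / sqrt(n)."""
    half_width = 1.96 * national_sd / math.sqrt(n_pupils)
    return score - half_width, score + half_width


def category(interval: tuple[float, float]) -> str:
    """Only the position of the interval relative to zero matters."""
    low, high = interval
    if low > 0:
        return "significantly above average"
    if high < 0:
        return "significantly below average"
    return "average"


def overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two intervals share any values, so the data cannot separate them."""
    return a[0] <= b[1] and b[0] <= a[1]


# Illustrative assumptions: 150 pupils, national pupil-level SD of 1.2.
SD, N = 1.2, 150
for score in (0.5, 0.4, 0.3, 0.1):
    ci = progress8_interval(score, N, SD)
    print(score, [round(x, 2) for x in ci], category(ci))

# A year-on-year "improvement" from 0.1 to 0.2 at the same school:
print(overlap(progress8_interval(0.1, N, SD), progress8_interval(0.2, N, SD)))
```

Under these assumptions the interval is roughly 0.2 either side of the score, so 0.5, 0.4 and 0.3 all land in the same category, and the move from 0.1 to 0.2 cannot be told apart from no change.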
Tom Sherrington’s article produced a large number of comments. Here is an example.
Schools are already playing the Progress 8 game. As 3 of the 8 slots are reserved for EBacc subjects SLTs have recognised that SEN students who may not ordinarily opt for EBacc may adversely affect their Progress8 score. Therefore these students have been encouraged to take EBacc subjects. A school near me has the whole of Year 10 taking Triple Science. It might not be in the interest of the students but it makes the data look good!
POSTED BY MG | MAY 4, 2015, 3:43 PM
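The comment turns on how Attainment 8 fills its slots. The sketch below uses a simplified, assumed version of the rules: English and maths counted double, three slots for the best EBacc grades, three open slots for the best of whatever remains, and unfilled slots scoring zero. The real rules contain further conditions (for example on English language and literature), and the subject list, grades and pupils here are hypothetical.

```python
# Simplified, assumed Attainment 8 rules; hypothetical subject keys and grades.
EBACC = {"biology", "chemistry", "physics", "history", "geography", "french"}


def attainment8(english: int, maths: int, grades: dict[str, int]) -> int:
    """English and maths double-weighted, best 3 EBacc, best 3 of the rest."""
    ebacc_grades = sorted((g for s, g in grades.items() if s in EBACC), reverse=True)
    ebacc_slots = ebacc_grades[:3]  # missing EBacc subjects leave slots empty (zero)
    leftover = ebacc_grades[3:] + [g for s, g in grades.items() if s not in EBACC]
    open_slots = sorted(leftover, reverse=True)[:3]
    return 2 * english + 2 * maths + sum(ebacc_slots) + sum(open_slots)


# Hypothetical pupil A: no EBacc subjects, grade 3 in three non-EBacc options.
a = attainment8(3, 3, {"btec_sport": 4, "art": 4, "music": 3,
                       "dt": 3, "drama": 3, "food": 3})
# Hypothetical pupil B: the same pupil steered into three EBacc subjects,
# achieving only grade 2 in each of them.
b = attainment8(3, 3, {"btec_sport": 4, "art": 4, "music": 3,
                       "history": 2, "geography": 2, "biology": 2})
print(a, b)
```

In this simplified model pupil B gets lower grades in the swapped subjects yet a higher Attainment 8 score, because pupil A's three empty EBacc slots count as zero. That is the incentive the comment describes.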
It would be hard to find a clearer example of a perverse outcome driven by the pressure of marketisation.
Over a decade ago I came up with a better approach to judging the effectiveness of secondary schools. In my scheme the input data are Cognitive Ability Test (CATs) scores, not SATs scores. This has two advantages.
- CATs are IQ-type tests, not based on the study and recall of any syllabus or body of knowledge. They are therefore not susceptible to the high-pressure, past-paper-based rote learning and pre-test cramming that so degrade the Y6 experience of our pupils and their teachers.
- Using CATs usefully challenges the assumption that, because SATs results correlate well with GCSE performance, value added is best judged on the basis of prior attainment as measured by SATs. The alternative, correct view is that all academic attainment at all ages is actually mediated by general intelligence (cognitive ability), which is currently measured reliably by the CATs taken by thousands of pupils in Y6 for ‘Fair Banding’ admissions systems. It is because cognitive ability drives both SATs and GCSE results that the two correlate, and this is how the fallacy arose and came to dominate the English education system.
This flawed assumption also distracts attention from the fact, and the educational potential, of plastic intelligence. If plastic intelligence is accepted, the whole purpose of education at all Key Stages changes: from ‘attainment’ measured by SATs and GCSEs, mainly to produce school performance indicators that drive an artificially imposed market choice by parents, to developing to the full the cognitive and other abilities of all students.
My school effectiveness approach requires the production of scatter diagrams of GCSE results plotted against intake CATs scores, as shown below. The example is for Cumbria schools, where in the 1990s such diagrams were produced by the LEA every year but largely ignored. But that is another story.
As Tom Sherrington points out, it is just not possible to produce any simple numerical parameter that captures school effectiveness. My regression diagram identifies what look like the most effective schools as those lying above the regression line. It would be difficult to translate such diagrams into a performance measure for driving league tables, which is another advantage. Apparent school effectiveness (or the lack of it) should be the starting point for school inspection and evaluation, not an automatically produced conclusion.
It is interesting to note that the school (not its real name) with the worst GCSE performance (below the floor target) and by far the lowest intake ability, looks to be more effective (above the regression line) than the highest performing school (on the regression line).
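The idea behind reading a school as "above the regression line" can be sketched briefly: fit an ordinary least squares line of school GCSE outcome against mean intake CATs score, then take each school's residual, its vertical distance from the line. The school names and numbers below are invented for illustration and are not the Cumbria data; using the percentage of good GCSEs as the outcome is an assumption.

```python
from statistics import mean

# Invented illustrative data: (school, mean intake CATs score, % good GCSEs).
schools = [
    ("School A", 88, 30),
    ("School B", 95, 38),
    ("School C", 100, 42),
    ("School D", 105, 55),
    ("School E", 112, 64),
]


def ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residuals(data: list[tuple[str, float, float]]) -> dict[str, float]:
    """Positive residual: results better than the intake would predict."""
    slope, intercept = ols([c for _, c, _ in data], [g for _, _, g in data])
    return {name: g - (slope * c + intercept) for name, c, g in data}


for name, r in residuals(schools).items():
    side = "above" if r > 0 else "on or below"
    print(f"{name}: residual {r:+.1f} ({side} the line)")
```

In this invented data School A has the lowest raw results but a positive residual, while School C, with better raw results, sits below the line. The residuals are a prompt for closer inspection rather than a ranking.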
Using single numbers to drive parental choice and Ofsted judgements of schools is massive folly.