Ofsted 2014 Dashboard: still using the wrong data

Henry Stewart
Last June the Chief Inspector, Sir Michael Wilshaw, complained that too many schools "failed to challenge the brightest" and talked of a "culture of low expectations". The basis of his criticism was that many students who achieved a Level 5 in SATs at age 11 failed to get an A or A* at GCSE at age 16.

This month Ofsted has again published a "data dashboard" for every state school in England, seeking to help parents and governors understand the performance of the school. Last year I explained how the dashboard used the wrong data, in a post that is (slightly surprisingly) one of LSN's dozen most visited ever. The key fault is well illustrated in the light of Wilshaw's comments:

Whether level 5 students get B, A or A* makes no difference to any of the figures in the Ofsted data dashboard.

In the dashboard a school getting all its level 5 students (even all its 5a students) to a B would appear to be doing just as well as a school getting all its level 5 students to an A*. (The measures used in the dashboard are the % getting 5 A-Cs, with English and Maths, with no credit for any students getting above a C, and % achieving “expected progress”, with no credit for getting above a B.) If Sir Michael wants to raise expectations, he should perhaps ensure the dashboard uses data that encourages those expectations.

The Problem: “Expected Progress”



The DfE has set "expected progress" for English students of 3 levels from age 11 to age 16. This means going from a level 3 to a D, a level 4 to a C and a level 5 to a B. I actually agree with Sir Michael that schools should expect students on a 5b to get to an A at GCSE and those on a 5a should aim for an A*. However, according to the DfE measure, a student that goes from 5a at age 11 to a B at GCSE has made “expected progress”.
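To make the arithmetic concrete, here is a minimal Python sketch of the flat measure. It is an illustration, not the DfE's own calculation. The mapping of D, C, B, A and A* onto the level scale follows the figures in this post (level 3 to a D is 3 levels, level 5 to an A* is 5 levels); extending the scale down to G is an assumption, and the names `GRADE_AS_LEVEL`, `levels_of_progress` and `pct_expected_progress`, like both example schools, are hypothetical.

```python
# GCSE grades placed on the same scale as KS2 levels, so that
# level 3 -> D, level 4 -> C and level 5 -> B are each 3 levels of progress.
# Grades below D are an assumed extension of the same scale.
GRADE_AS_LEVEL = {"G": 3, "F": 4, "E": 5, "D": 6, "C": 7, "B": 8, "A": 9, "A*": 10}


def levels_of_progress(ks2_level, gcse_grade):
    """Levels gained between the KS2 level at 11 and the GCSE grade at 16."""
    return GRADE_AS_LEVEL[gcse_grade] - ks2_level


def made_expected_progress(ks2_level, gcse_grade, expected=3):
    """True if the pupil gained at least the flat 3 levels."""
    return levels_of_progress(ks2_level, gcse_grade) >= expected


def pct_expected_progress(pupils):
    """Percentage of (ks2_level, gcse_grade) pairs meeting the flat measure."""
    made = sum(made_expected_progress(level, grade) for level, grade in pupils)
    return 100 * made / len(pupils)


# Two hypothetical schools, each with 20 level 5 pupils.
school_all_b = [(5, "B")] * 20
school_all_a_star = [(5, "A*")] * 20

print(pct_expected_progress(school_all_b))       # 100.0
print(pct_expected_progress(school_all_a_star))  # 100.0
```

Both schools score 100% on the measure, although every pupil in the second gained two more levels than every pupil in the first.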

The national figures for the % making 3 levels of progress show the variation:



The % achieving 3 levels varies according to the age 11 starting point. While 82% of students arriving with a Level 5 in Maths went on to make 3 levels of progress, just 21% of those on Level 2 and 46% of those on Level 3 made that amount of progress (2013 GCSE transition matrices). There is more variation among sub-levels: While 61% of 3a students achieve 3 levels of progress, just 21% of 3c students do so. (The reason is simple: both have to get a D to achieve 3 levels of progress and this is a much bigger jump from 3c than from 3a.)

It may be the case that we should be challenging schools to have higher expectations of their level 2 and level 3 students. However having a flat 3 levels as the “expected progress” for all students fails to provide any challenge for level 5 students. For 5b Maths students, 59% make 4 levels of progress (to an A) and for 5a Maths students, 53% make 5 levels of progress (to an A*). Clearly the number of expected levels of progress should vary with the starting point of the student.
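One way to picture an expectation that varies with the starting point is sketched below in Python. The thresholds for 5b (4 levels, to an A) and 5a (5 levels, to an A*) follow the expectations suggested in this post. Keeping 3 levels for every other starting point is only a placeholder assumption, and the names `EXPECTED_BY_START`, `meets_flat` and `meets_varied` are hypothetical, not anything used by the DfE or Ofsted.

```python
# GCSE grades on the KS2 level scale (D = 6 ... A* = 10, as implied by
# "level 5 to a B is 3 levels"); grades below D are an assumed extension.
GRADE_AS_LEVEL = {"G": 3, "F": 4, "E": 5, "D": 6, "C": 7, "B": 8, "A": 9, "A*": 10}

FLAT_EXPECTED = 3
# Hypothetical starting-point-specific expectations, from the post's suggestion
# that 5b pupils should reach an A and 5a pupils an A*.
EXPECTED_BY_START = {"5b": 4, "5a": 5}


def progress(start, grade):
    """Levels gained from a KS2 sub-level such as '5a' to a GCSE grade."""
    return GRADE_AS_LEVEL[grade] - int(start[0])


def meets_flat(start, grade):
    """The current measure: 3 levels, whatever the starting point."""
    return progress(start, grade) >= FLAT_EXPECTED


def meets_varied(start, grade):
    """An expectation that rises for the highest starting points."""
    return progress(start, grade) >= EXPECTED_BY_START.get(start, FLAT_EXPECTED)


print(meets_flat("5a", "B"))     # True
print(meets_varied("5a", "B"))   # False
print(meets_varied("5a", "A*"))  # True
```

A 5a pupil finishing on a B counts as a success on the flat measure but not on the varied one, which is the challenge the flat measure fails to provide.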

Ofsted does not use the dashboard data in inspections



Last December I sent a tweet which suggested Ofsted used these flat measures of "expected progress" in its inspections as well as in its dashboard. HMI inspector David Brown (national lead for ICT) tweeted back that I was wrong. Ofsted's Raise Online includes tables comparing progress for students at a school, based on their KS2 starting point, and the guidelines for inspectors are very clear on this. He directed me to p35 and 36 of the School Inspection Handbook.

He is absolutely right on the guidance. For instance, for the outstanding judgement, the handbook states "From each different starting point (my emphasis), the proportions of pupils making expected progress and the proportions exceeding expected progress in English and in mathematics are high compared with national figures." (I am not sure this is observed by all inspectors. If your school has more students starting from low KS2 levels, do make sure your inspector is comparing their levels of progress with national figures for those starting points and not for the overall average.)
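The comparison the handbook describes can be sketched in a few lines of Python. The national percentages are the 2013 Maths figures quoted earlier in this post (21% from Level 2, 46% from Level 3, 82% from Level 5). Both schools and all their pupil counts are invented for illustration, and the function names are hypothetical.

```python
# National % making 3 levels of progress in Maths, by KS2 starting level,
# as quoted in this post from the 2013 GCSE transition matrices.
NATIONAL_PCT = {2: 21.0, 3: 46.0, 5: 82.0}


def overall_pct(cohort):
    """Single overall figure, as used in the dashboard.

    cohort maps KS2 level -> (number of pupils, number making expected progress).
    """
    pupils = sum(n for n, _ in cohort.values())
    made = sum(m for _, m in cohort.values())
    return 100 * made / pupils


def gap_to_national(cohort):
    """Percentage-point gap to the national figure for each starting level."""
    return {
        level: 100 * made / pupils - NATIONAL_PCT[level]
        for level, (pupils, made) in cohort.items()
    }


# Hypothetical school A: mostly low starters, above national at each level.
school_a = {3: (40, 24), 5: (10, 9)}
# Hypothetical school B: mostly high starters, below national at each level.
school_b = {3: (10, 4), 5: (40, 32)}

print(overall_pct(school_a), gap_to_national(school_a))  # 66.0 {3: 14.0, 5: 8.0}
print(overall_pct(school_b), gap_to_national(school_b))  # 72.0 {3: -6.0, 5: -2.0}
```

School B looks stronger on the overall figure (72% against 66%), yet it is below the national figure from every starting point, while school A is above it from every starting point.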

This is curious. For inspections, Ofsted is very clear that a school should not be judged on an overall figure for the proportion making expected progress, but on the range of progress according to their starting point. But this overall figure is, along with the % A-Cs figure, the main measure used in the dashboard.

Ofsted, please change your dashboard



The idea of providing a simple set of graphs which show the strengths and weaknesses of a school, and help inform governors, is good. However, to be useful, it must use data based on real student progress. Fortunately Ofsted has this in abundance in its Raise Online data in the form of value added figures, for GCSEs only and for GCSEs with equivalents.

If the data dashboard was based on the value added figure, for all students and for “disadvantaged” students, it would give a fair and clear picture of the progress being made by students in the school.

It would also prepare schools for the changes to be introduced next year. In 2015 the key measure will change dramatically with the introduction of Best8, a measure similar to the one Ofsted displays in Raise Online. It will be based on the average progress made by all students, relative to their starting points, and will no longer encourage the threshold effect of focusing all effort on getting students across the C/D borderline.

Changing from focusing on “expected progress” to value added would turn the charts in the dashboard into useful measures. Ofsted, please include these next year.

 

(Also in 2014 the method of calculating the key measure, 5 GCSE A-Cs including English and Maths, changes, with many GCSE equivalents (such as Btecs) no longer being included. Ofsted knows what every school would have achieved in 2013 on the new 2014 measure and has included it in Raise Online. It would have been very useful to include it in the dashboard so governors, and others, were aware of what to expect in 2014.)

Comments

rogertitcombe's picture
Mon, 21/04/2014 - 09:35

Henry - Your post highlights many logical inconsistencies in the Ofsted Dashboard. This is quite a serious problem given the mounting evidence that inspectors make their minds up about inspection outcomes from such data before they set foot in the school. It appears that judgements about lesson quality and possibly even pupil behaviour are 'triangulated' to fit 'performance' information.

You advocate changing the Dashboard, but this would not address the fundamental weakness that it is all based on KS2 SATs. My research on the subject results of 'improved' secondary schools shows a common pattern for the distribution of maths and English GCSE grades. Put simply, the greater the improvement, the more the tendency for D and E grades to decline in number in direct proportion to C grades increasing.

I have not investigated KS2 but these tests are just as high stakes for the school, so it is likely that the same effect applies. This means that for secondary schools fed by 'much improved' primaries (ie schools threatened by floor targets and academisation), many lower Level 4s will really be Level 3s.

There has been much debate about this effect in KS1 on Rebecca's post. It is a universal perverse incentive arising from high stakes testing. In my view, this alone seriously undermines the validity of all Dashboard based judgements and consequently, all Ofsted school gradings. This is in addition to the logical shortcomings you point out.

However in Hackney, you have the data to check this out. The DfE Performance Tables give average GCSE grades for each of SATs L3, L4 and L5+ pupils. The tables describe these as low, average and high KS2 attainers. They are nothing of the sort in terms of the full bell curve cognitive ability of Y7 pupils. It is cognitive ability that counts, as is recognised in Hackney as the basis of the borough wide fair banding based admissions system.

The Hackney Learning Trust and all the secondary schools have the intake CATs data and the output GCSE results. You can therefore put my assertions to the test.

In terms of CATs scores the lowest third of the distribution is below 94, the middle is 94 - 106, and the top third is about 106. You can find the average GCSE grades for each group for your school and see how these relate to the SATs based figures in the Performance Tables. The Hackney Learning Trust could do this for all its schools.
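The check proposed here could be sketched in Python roughly as follows. The band boundaries are the ones given in this comment (below 94, 94 to 106, above 106). The grade-to-points scale (G = 1 up to A* = 8) is an assumption chosen only so that grades can be averaged, the pupil records are invented, and the names `cats_band` and `mean_points_by_band` are hypothetical.

```python
from statistics import mean

# Assumed numeric scale so that GCSE grades can be averaged.
GRADE_POINTS = {"G": 1, "F": 2, "E": 3, "D": 4, "C": 5, "B": 6, "A": 7, "A*": 8}


def cats_band(score):
    """Place a CATs score in the lower, middle or upper third."""
    if score < 94:
        return "lower"
    if score <= 106:
        return "middle"
    return "upper"


def mean_points_by_band(pupils):
    """Average GCSE points for each CATs band.

    pupils is a list of (cats_score, gcse_grade) pairs.
    """
    by_band = {}
    for score, grade in pupils:
        by_band.setdefault(cats_band(score), []).append(GRADE_POINTS[grade])
    return {band: mean(points) for band, points in by_band.items()}


# Invented records, for illustration only.
pupils = [(88, "D"), (91, "E"), (99, "C"), (104, "B"), (110, "A"), (118, "A*")]

print(mean_points_by_band(pupils))
# {'lower': 3.5, 'middle': 5.5, 'upper': 7.5}
```

The resulting averages for each band could then be set beside the SATs based low, average and high attainer figures in the Performance Tables.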

This needs doing. It may be that the result is little different to that produced by the Performance Tables and Dashboard. But then maybe not. We need to know. What do you think?

rogertitcombe's picture
Mon, 21/04/2014 - 10:17

Sorry, mistake - The upper third is above 106 - not about 106.


PiqueABoo's picture
Tue, 22/04/2014 - 21:11

I recall an FFT claim on one of their blogs somewhere that KS2 SATs are better predictors of GCSE outcomes than CATs, and predictions based on SATs+CATs are more reliable than either alone. Then in an interview last year behavioural geneticist Plomin claimed a correlation of 0.5 between IQ and GCSE outcomes. Are CATs alone in Hackney really that good?

Meanwhile a year or so ago I used Excel to make some pretty graphs from the Raise Online transition data and the distribution for KS2 SATs was significantly distorted, suggesting "boosting" to both level 4c and 5c. Or some accidental (political) fiddling in the placement of thresholds.

What makes this even more fun now is KS2 SATs L6, specifically maths where I suspect we might get close to 10% passing in 2014 given that the percentages will be reported at school level. Where do they fit?

rogertitcombe's picture
Wed, 23/04/2014 - 08:31

PiqueABoo - I don't know the source of your information but the CATs correlation with GCSE results is 0.8 - 0.9. It is at the higher end for maths and science and lower for other subjects. In Cumbria in 2002 according to data provided by the LEA to all secondary schools for all GCSE subjects taken together it was 0.84. All Cumbria pupils took CATs in October of Y7. Detailed statistical data on CATs is available from GL Assessment, who market the tests.

If CATs did not perform significantly better than SATs in predicting GCSE outcomes then why would Hackney LA and large numbers of Academies and chains use them for banded admissions purposes when SATs are free and all schools have to do them? CATs are a form of general intelligence test so I think Plomin's reported claim is too modest.

The great advantage of CATs over SATs is that, like PISA, they do not test a specific knowledge based curriculum, therefore they cannot be crammed for. They also take up no curriculum time, requiring only one half day of testing for pupils.

The last word however can be left to secondary teachers, who are well aware of the unreliability of the SATs scores that are attached to individual Y7 pupils. As behaviourist cramming and revision teaching methods work for SATs (Otherwise why would junior schools take up so much of Y6 doing just that?) it is likely that by the end of the summer holidays before September of Y7 pupils forget much of what they have learned by rote for SATs.

The only way that teachers could prepare pupils for CATs or PISA would be to use methods that develop cognitive ability (ie teach their children to be more intelligent), which is exactly what they should be doing.

This is possible with the right kind of teaching and is how KS1 and KS2 should and could be reformed if CATs replaced SATs in all schools as I advocate.

PiqueABoo's picture
Wed, 23/04/2014 - 13:46

"I don’t know the source of your information"
--
The correlation was essentially an aside in the Spectator article about Plomin, written last summer in advance of the several papers that group published last year on heritability of passing SATs, GCSEs etc.

Spectator: http://www.spectator.co.uk/features/8970941/sorry-but-intelligence-reall...
Paper on heritability & GCSEs: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0080341

I found three of those papers and it might be there, but I didn't spot any specific discussion of that correlation. However they do claim that educational achievement is significantly more heritable than cognitive ability and discuss other heritable qualities that can contribute towards educational achievement e.g. motivation. Anecdotally my Y6 daughter is a high achiever but she was definitely *born* self-motivated and stubbornly determined, so I'm not sure where she'd sit in pure IQ/CATs terms.

The difficulty with any predictive system is the incentive to deliver those predictions especially when they're pinned to accountability, essentially the raison d'etre for Henry's article. This isn't my turf and I can't prove it, but those correlations instinctively feel too high and I suspect the predictions may have been causing some of the results that in turn make the predictions look shiny.

I don't understand the PISA reference. Which PISA? For instance I think PISA maths competency is something we could improve by devoting more time and practice to multi-step problems.

I've grown quite cynical about the reasons for secondary views, which when expressed by 1001 teachers in public rarely tend to rise above their out-group (primary) being rubbish and responsible for all their woes. Blame-culture. They very, very rarely mention things like the built-in discontinuities between say the L5 maths curriculum on each side. Since my Y6 daughter will be taking all the L6 SATs soon I read the DfE commissioned (Sheffield) report on that and it makes me want to scream then go bang some heads together on both sides of transition, but especially secondary.

PiqueABoo's picture
Wed, 23/04/2014 - 13:56

I forgot this: the relative merits of 'KS2 data' for predicting KS4 outcomes are briefly mentioned in "An analysis of Key Stage 2 reliability and validity" found on this page: http://www.fft.org.uk/News/FFT-Research.aspx


Janet Downs's picture
Tue, 22/04/2014 - 08:33

Although "expected progress" is a better measure than raw exam results, I still have reservations.

First: children are not uniform, they develop and progress at different rates. A one-size-fits-all notion of development used for measuring is flawed.

Second: learning doesn't happen in an unbroken upward line. There are rapid rises, peaks, plateaus and troughs.

Third: it may well be true that teachers don't have high expectations of pupils who enter secondary school at levels 2 and 3. But at what stage do high expectations become unrealistic? I spent most of my teaching career* teaching lower sets. These were mostly boys, mostly bolshie. But they were articulate (which is why their oral coursework marks were usually higher than their written work). Nevertheless, very few achieved GCSE D. For some of them, getting them to stay in school until the end of Year 11 was an achievement. Getting them all to sit an exam (especially in Eng Lit) was an achievement.

Did they make progress? Possibly. The most important thing I did was read to them in the hope they would realise that books contained treasures; poetry could speak to them (and allow them to speak) and the themes in literature are universal.


*I know this is an anecdote. I have no evidence this applies to other teachers who teach bottom sets. I may just have been incompetent.
