The claims made by HMCI at the launch of the Dashboard
(Ofsted Press Release, 27.03.13) seem to me to be an attempt to oversell a product which is of dubious validity and reliability. The homepage of the Data Dashboard
states that it ‘complements the Ofsted school inspection report by providing an analysis of school performance over a three-year period …’. This is an example of Ofsted overselling the product. The Dashboard contains a representation of some school and national data for a three-year period, but is it not stretching a point to call this an analysis? It really only provides any ‘analysis’ of a single year’s performance, in the form of the so-called ‘quintile’ comparisons.
Far from being a ‘powerful new … tool’, this is a crudely fashioned blunt instrument with the potential to further denigrate the work of schools and teachers. In the current febrile, high-stakes school performance climate it reduces the rich and varied complexity of schools and education to a set of simplistic statistics and spurious ‘analysis’. The embarrassing fact that the methodology had to be reviewed and the Dashboards updated within days of their release does not suggest a powerful tool that had been carefully thought through and quality assured.
How a school is performing in test and exam results is not the same as how effective it is. The School Data Dashboard is simplistic and misleading, and lacks rigour whilst trying to create the impression of precision and clarity. It is an insult to both governors and parents to suggest that it is of any real value in helping them to understand and challenge their school effectively. Given that governors already have access to the far more comprehensive and sophisticated analyses in RAISEonline, and that both governors and parents have access to the annual performance tables, this is hardly a ‘powerful … new tool’. It is, however, a dangerously powerful blunt instrument.
To my knowledge this is the first time that any individual school level KS1 data has been placed in the public domain. Many schools, particularly infant schools, will feel frustrated and disappointed with the nature of their debut into the national arena of data comparisons. How does comparing attainment at Level 2+ with national averages allow governors and parents to judge how effectively a school is performing? It is nothing more than a simplistic snapshot of one measure, stripped of any informative context. Without knowing the starting points of the pupils or the context of the school, it is impossible to make any judgements about how effective it is. A small ‘advantaged’ school which reported 100% of pupils at L2c would very likely be in a higher grouping (the Highest) than a large ‘disadvantaged’ school which reported 90% at L3 and 10% at L1. Which is the more effective school?
A brief explanation of the methodology underlying the Dashboard is presented in the guidance (Ofsted School Data Dashboard Guidance, February 2013, No. 130062). Incidentally, this guidance was updated in March, after the errors in methodology were corrected and the Dashboards updated, but the publication date is still given as February 2013. Unfortunately the methodology is far from fully explained, which makes it difficult to properly assess the validity of the dashboards. However, there is sufficient information and data in the public domain to question their robustness, as I have attempted below, focussing on Ofsted ‘quintiles’ and ‘similar’ schools.
The use of ‘quintiles’ for both national and ‘similar’ schools comparisons is an example of how the dashboard tries to create an impression of statistical precision. The quintiles appear to make a robust discrimination between the performance of schools, but I suspect it may not be appropriate to apply a valid quintile analysis to the primary attainment and progress data: the results are concentrated in very narrow bands and the distribution appears to me to be far too skewed to make such an analysis meaningful. In a robust statistical analysis, quintiles divide a ranked data set into five equal groups, with the median falling in the middle (3rd) group. It is very difficult to see how these conditions have been achieved by Ofsted.
Another example of imprecision is where the guidance refers to schools with 97% being ‘in’ the second quintile. Strictly, quintiles are cut points, not intervals to score ‘in’. A score can be at a quintile, above or below a quintile, or between two quintiles, but not ‘in’ a quintile.
Within the guidance for the ‘all schools’ comparison the quintiles are explained as having been calculated by taking all of the data of interest for all of the schools and allocating 20% of schools to each quintile using the following process:
a) The data for the specific measure and the group of interest, for example the percentage of pupils achieving expected progress in Key Stage 2 English in ‘all schools’, are selected.
b) The scores for all schools are then ranked.
c) The ranks are split into five sub-groups, each group representing 20% of the ranks in the whole group.
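As I understand it, this process amounts to something like the sketch below (Python, with invented scores; this is my reading of the guidance, not Ofsted's actual code):

```python
def allocate_groups(scores):
    """Rank scores highest-first (step b) and split the ranking into
    five sub-groups of 20% each (step c), ignoring ties for now."""
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    return [ranked[i * n // 5:(i + 1) * n // 5] for i in range(5)]

# Invented data: percentage of pupils making expected progress in
# KS2 English at ten imaginary schools.
scores = [100, 100, 98, 95, 93, 93, 91, 88, 85, 72]
for label, group in zip(["Highest", "2nd", "3rd", "4th", "Lowest"],
                        allocate_groups(scores)):
    print(label, group)
```

A split like this only works cleanly when no tied score straddles a group boundary, which is exactly the condition the real data fails to meet.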
I find it difficult to work out how ‘five sub-groups, each group representing 20%’ can have been achieved given the distribution of the data and the constraint that ‘schools with the same attainment cannot be in different quintiles’.
Using the 2 Levels Progress in English data, as per the example in the guidance, 2886 of the 14291 primary schools listed in the 2012 KS2 Performance Tables achieved 100%. This is in fact the most frequently occurring score in the distribution (the mode). All 2886 schools, 20.19%, must therefore be above the top national quintile, as schools with the same attainment cannot be in different quintiles. The 2nd national ‘quintile’ therefore has to be placed at either:
- 18.45%: the 2637 schools that achieved 95-99%; or
- 23.22%: the 3318 schools that achieved 94-99% (6204 schools in total scored 94-100%).
As the median is 93%, and therefore 50% of schools are in the range 93-100%, determining the range of the 3rd group is also problematic. Strictly speaking, a school placed in the 3rd group should have scored at or around the median, yet a quick search of the dashboards reveals schools with 91% in the 3rd grouping for all schools. So the Ofsted ‘quintiles’ cannot possibly be five equal-sized divisions with the median falling in the 3rd group.
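The shares quoted above can be checked directly; the school counts below are the figures I extracted from the 2012 KS2 Performance Tables:

```python
# Checking the quoted shares against the 20% quintile boundaries.
total_schools = 14291   # primary schools in the 2012 KS2 tables
at_100 = 2886           # schools with 100% making 2 levels progress (the mode)
at_95_to_99 = 2637      # schools scoring 95-99%

share_top = at_100 / total_schools
share_second = at_95_to_99 / total_schools

print(f"schools at 100%:  {share_top:.2%}")    # 20.19%, already over 20%
print(f"schools 95-99%:   {share_second:.2%}") # 18.45%, under 20%
# Because all 2886 schools on 100% must sit together, the top group
# holds 20.19% of schools and an exact 20% split is impossible.
```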
If my assumptions and calculations are correct, these are not ‘quintiles’. The use of the term is intended to give an impression of statistical rigour, yet the groupings are more likely five groups of roughly 20% each than groups determined by true quintiles. It may be that Ofsted uses data of greater precision to determine the quintiles, e.g. percentages to one or more decimal places. If so, this contradicts the Dashboard FAQs, which state that ‘The figures in the School data dashboard are drawn from RAISEonline and Department for Education (DfE) performance tables’. The relevant data in both of these sources is given to 0 decimal places.
If the underlying dashboard data only uses whole number percentages the distribution of scores for 2 Levels Progress in English must be something similar to the ranges below if the groups each contain roughly 20% of the total number of schools:
Highest - 100%
2nd group - 95% - 99%
3rd group - 91% - 94%
4th group - 86% - 90%
Lowest - 0% - 85%
The most common cohort size (the mode) in the KS2 Performance Tables data is 30 pupils. In a school of this size each pupil therefore contributes 3.33% to the total. Thus, an average-sized primary school would be in the:
- Highest group if all pupils made 2 levels progress (100%);
- 2nd group if all but one did (96.7%);
- 3rd group if all but two (93.3%);
- 4th group if all but three (90.0%) or four (86.7%); and
- Lowest group if all but five (83.3%).
If my assumptions are correct, the differences in quintile performance for a large proportion of primary schools will represent the difference in performance of one, or at best two, pupils. The parameters are even more extreme should you be one of the 270+ schools nationally in 2012 with only 20 pupils in the cohort. If three of your twenty pupils did not make 2 levels progress (85.0%) then you will be at or below the bottom quintile. It is not even mathematically possible for you to be between the 2nd and 3rd quintiles, as a whole number of pupils out of twenty cannot possibly score between 91% and 94%. Similarly, a school with only 10 pupils in the cohort (170+ such schools in 2012) can only possibly be in the top grouping, the 4th grouping or the lowest, as it is not possible to score within the ranges of the 2nd and 3rd groups. Even with a cohort of 60 pupils (twice the national average) the difference between being placed in the Highest and the Lowest groupings depends on the performance of just nine pupils.
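This granularity argument can be made concrete with a short sketch. The band boundaries below are my conjectured ranges from above, not published Ofsted values:

```python
# Which of the conjectured bands a school can actually reach, given
# that its score must come from a whole number of pupils in the cohort.
BANDS = [("Highest", 100, 100), ("2nd", 95, 99), ("3rd", 91, 94),
         ("4th", 86, 90), ("Lowest", 0, 85)]

def band(progressing, cohort):
    """Return the band name for a score, or None if the score falls
    between two bands (e.g. 85.7% with a cohort of 7)."""
    score = 100 * progressing / cohort
    for name, low, high in BANDS:
        if low <= score <= high:
            return name
    return None

def reachable(cohort):
    """All bands a cohort of this size can possibly land in."""
    return sorted({band(k, cohort) for k in range(cohort + 1)} - {None})

print(reachable(10))       # a 10-pupil cohort misses the 2nd and 3rd bands
print(reachable(20))       # a 20-pupil cohort misses the 3rd band
print(band(60, 60), band(51, 60))  # 60-pupil cohort: 100% vs 85%
```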
Ofsted have decreed that for dashboard purposes each school has its own group of ‘similar schools’, defined as schools whose pupils have a similar average level of prior attainment. Only prior attainment is used in the model; contextual factors are not. These are not similar schools in any sense in which most people would use the term. They are very diverse and significantly different schools that happen to share a single characteristic at a single point in time.
Primary schools are compared with the most similar 110 schools on the basis of the overall Key Stage 1 Average Points Score (APS) of the cohort drawn from RAISEonline and Department for Education (DfE) performance tables.
However, APS data in both of these sources is only given to 1 decimal place. As there are 400+ primary schools in the 2012 Performance Tables with the same KS1 APS of 15.9, I asked Ofsted how they narrow the field down to groups of just 110 schools, given that no other criteria are used in the model. The reply was that in some cases they use APS scores to up to 3 decimal places to determine what counts as the same score. One thousandth of an APS point appears to take precision to a new level of spurious validity, or even over the threshold of absurdity.
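For illustration only, the selection might work something like the sketch below; the school population is entirely invented, since Ofsted has not published its actual method or data:

```python
import random

# Invented population of 2000 schools with KS1 APS values; the extra
# decimal places stand in for the unpublished precision Ofsted
# described in their reply.
random.seed(0)
schools = [(f"school_{i}", round(random.uniform(12.0, 19.0), 3))
           for i in range(2000)]

def similar_schools(target_aps, schools, n=110):
    """Take the n schools whose APS is nearest the target school's."""
    return sorted(schools, key=lambda s: abs(s[1] - target_aps))[:n]

peers = similar_schools(15.9, schools)
print(len(peers))                                 # 110
print(max(abs(aps - 15.9) for _, aps in peers))   # spread of the peer group
```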
I happened to notice that the day after the Dashboard was published Ofsted updated their Subsidiary Guidance (Subsidiary guidance supporting the inspection of maintained schools and academies). Within this document there is a very helpful ‘ready reckoner’ for inspectors, intended to guide them to ‘suitable ways of expressing gaps in average points scores using plain language and simple fractions.’ Thus, for example, 1 point is the equivalent of one term, one-third of a year, or 4 months.
Using the equivalents in the table I have been able to calculate that, by using APS to 3 decimal places, Ofsted can now determine differences in attainment between primary schools to within 3 hours. To be more precise, differences of 2.880 hours (taking a month as 30 days), or approximately 2 hours, 52 minutes and 48 seconds.
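The arithmetic can be reproduced as follows. Converting months into hours requires assumptions Ofsted does not state; a 30-day month and a 24-hour day are what the 2.880 figure implies:

```python
# 1 APS point = 4 months of progress, per the ready reckoner.
MONTHS_PER_POINT = 4
DAYS_PER_MONTH = 30   # assumption
HOURS_PER_DAY = 24    # assumption

def aps_diff_to_hours(aps_diff):
    return aps_diff * MONTHS_PER_POINT * DAYS_PER_MONTH * HOURS_PER_DAY

hours = aps_diff_to_hours(0.001)   # smallest step at 3 decimal places
print(round(hours, 3))             # 2.88 hours
minutes, seconds = divmod(round(hours * 3600) % 3600, 60)
print(int(hours), minutes, seconds)  # 2 h 52 min 48 s
```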
Interestingly, the Ofsted Subsidiary Guidance instructs school inspectors that when judging the quality of leadership and management they should consider whether governors ‘understand and take sufficient account of pupil data, or whether they are misled by ‘headlines’’. The Data Dashboard hardly makes an auspicious contribution to the avoidance of the latter.
In shaping my observations I have only used sources of information and data that are in the public domain and drawn what I feel are logical conclusions. But I am not a statistician and I may have made errors in my assumptions, interpretations and calculations. I apologise for any such mistakes.
The only way that the reliability and validity of the dashboard comparisons can be established is if Ofsted release full descriptions of their methodology and give public access to the full data they have used in the construction of the dashboards. To ensure fairness and transparency Ofsted should publish these as soon as possible, in line with the aims of the DfE Open Data Strategy (June 2012).