An 18-month battle to get CEM* to release the raw marks from their 11+ tests drew to a close on Tuesday with a ruling from the Information Tribunal. These decisions are nearly always unanimous, but in this instance the three-person panel was split. Whilst the majority felt CEM’s commercial interests might be damaged if the information was released, “The minority strongly doubted that the exemption was engaged and even if it was that the appellant had provided sufficient material and evidence for the public interest to tilt in favour of disclosure.” The court noted that CEM had failed to send a representative to the hearing who could address the incomprehensibility of their explanation and answer questions from the Tribunal, and that “the appellant had provided evidence that the claimed USP of tutor-proofing was highly questionable and that the public interest warranted close examination of this claim which could only be achieved through the disclosure of the disputed material.”
The public already get to see standardised scores, so where is the problem in releasing the underlying raw marks used to calculate them? CEM told the Information Commissioner that the unique selling point of their tests is that they are ‘tutor proof’ and that merely disclosing the raw marks would undermine this. The court didn’t require any evidence that the tests are tutor proof, merely confirmation that CEM were profiting financially from the claim.
A standardised score provides a measurement relative to a population, and that word ‘population’ is important in this context. Any authority on statistics will explain that comparison should be made against the most representative data available. Given that CEM, and for that matter GL**, have test data for hundreds of thousands of students going back decades, and that results are referred to as standardised, it is reasonable for the public to believe they have been produced following normal statistical conventions and provide a representative benchmark. That is, after all, what the word standard means in common usage.
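As an illustration of the convention the article appeals to: a standardised score is usually a linear rescaling of a raw mark against the reference population’s mean and standard deviation. The mean-100/SD-15 scale below is the common convention, not something CEM have confirmed, and the sample raw marks are entirely hypothetical.

```python
from statistics import mean, stdev

def standardise(raw, pop_mean, pop_sd, new_mean=100.0, new_sd=15.0):
    """Rescale a raw mark onto the conventional mean-100 / SD-15 scale."""
    return new_mean + new_sd * (raw - pop_mean) / pop_sd

# Hypothetical reference population of raw marks (illustrative only).
population = [18, 22, 25, 27, 28, 30, 31, 33, 36, 40]
mu, sigma = mean(population), stdev(population)

# A raw mark one population SD above the mean maps to a standardised 115.
print(round(standardise(mu + sigma, mu, sigma)))  # → 115
```

The point of the convention is that the standardised score only means something if the reference population is representative, which is exactly the article’s complaint.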
11+ applicants are only compared against those applying for the same school(s) in the same year. This works if the only objective is to select the highest-scoring applicants to maximise the school’s league table ranking, but it masks any longitudinal changes, prevents comparison between schools and means that the definition of ‘grammar school standard’ is, quite literally, made up on the fly.
Take, for example, a school with 140 places which usually receives around 1,000 applications and sets what appears, on the face of it, to be an objective entry standard of 114. There is a direct mapping between standardised scores and percentiles, so a score of 114 will always select 17.62% of the population. But because ‘population’ here has been redefined, in a very unorthodox way, to mean just the current year’s cohort, this approach is guaranteed to select 176 candidates, ensuring all the places are filled and the waiting list is manageable. The corollary is that the much-vaunted ‘grammar school standard’ is simply a product of supply and demand: if a school can stimulate more demand, attracting families from further afield, then ‘grammar school standard’ rises. Equally, if you tested a thousand monkeys using this approach you would find that 176 of them are of ‘grammar school standard’.
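The score-to-percentile mapping is easy to check. A sketch, assuming the conventional mean-100/SD-15 normal standardisation (the exact fraction depends on the rounding conventions of whichever standardisation is actually used, which is part of what CEM won’t disclose):

```python
from statistics import NormalDist

# Assumed convention: standardised scores ~ Normal(mean=100, sd=15).
scores = NormalDist(mu=100, sigma=15)

cutoff = 114
share_selected = 1 - scores.cdf(cutoff)   # fraction scoring above the cut-off
cohort = 1000

print(f"{share_selected:.1%} of the cohort clears {cutoff}")
print(f"≈ {round(share_selected * cohort)} candidates from {cohort} applicants")
```

Whatever ‘population’ the scores are normed against, the cut-off carves off the same fixed fraction of it, which is why redefining the population to mean the current cohort fixes the number of qualifiers regardless of how able that cohort actually is.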
The Sutton Trust found that about 4% of Y7 pupils attend grammar schools nationally, rising to over 25% in fully selective authorities. Having established that grammar school standard is based on supply and demand, if we assume that standalone selective schools, where demand is at its highest, are only able to admit the top 9% of pupils by ability, then using properly standardised scores provides the following comparison:
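One way to sketch such a comparison, assuming a proper national standardisation on the conventional mean-100/SD-15 scale: each admission rate then implies a fixed national cut-off score, so schools of very different selectivity can be read off a single honest scale.

```python
from statistics import NormalDist

# Assumed national reference: standardised scores ~ Normal(mean=100, sd=15).
national = NormalDist(mu=100, sigma=15)

def national_cutoff(top_fraction):
    """Standardised score needed to sit in the top `top_fraction` nationally."""
    return national.inv_cdf(1 - top_fraction)

# Selectivity levels taken from the article's figures (illustrative mapping).
for label, frac in [("fully selective authority (top 25%)", 0.25),
                    ("standalone selective school (top 9%)", 0.09),
                    ("national grammar intake (top 4%)", 0.04)]:
    print(f"{label}: national cut-off ≈ {national_cutoff(frac):.0f}")
```

On this scale the gap between schools is visible at a glance, which is precisely what cohort-only standardisation conceals.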
Providing properly standardised scores would have no impact on the admissions process and would be easy for CEM and GL to implement. If the true level of attainment needed to win a place at one school is 10 marks higher than at another, shouldn’t the public have a right to know this in a way which is honest, open, transparent and, most importantly, easy for them to understand?
There is another reason CEM might want to conceal the raw scores, although this one is harder to explain, even with a thousand monkeys providing metaphorical assistance. The limited information they have released shows that the applicants for one school were awarded over a thousand different scores from what must have been at most an hour’s worth of multiple-choice questions. I have my theories about how CEM appear to be able to conjure data from nowhere, but the court has ruled out any public scrutiny of their methodology in favour of their being able to profit from it, so instead I have to speculate about motives.
In a test of, say, 50 multiple-choice questions with one mark for each correct answer there are 51 possible scores, but in practice, given that a strategy of random guessing should yield around 10 marks on average, the usable range is much smaller. Worse still, due to the way the marks are distributed there are always multiple candidates tied near the pass mark. CEM’s methods make it possible for schools to academically distinguish (sic) between candidates whose scores differ by as little as 0.01 of a standardised mark. Professor Steve Strand of Oxford University, a much-respected authority on testing, described recording these results to two decimal places as ‘wildly over precise’, which leads to my third point: why don’t we see confidence intervals in 11+ tests?
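The pigeonhole problem is easy to demonstrate. A sketch, assuming 1,000 candidates of varying ability sitting 50 five-option questions (every parameter here is illustrative, not CEM’s actual design): integer raw marks inevitably pile up, so several candidates tie on exactly the mark that separates the last place from the waiting list.

```python
import random
from collections import Counter

random.seed(0)  # deterministic for the sake of the example

QUESTIONS, CANDIDATES, PLACES = 50, 1000, 140

def sit_test(p_correct):
    """Integer raw mark: each question answered correctly with probability p_correct."""
    return sum(random.random() < p_correct for _ in range(QUESTIONS))

# Ability varies across the cohort; 0.2 is the five-option random-guessing floor.
marks = [sit_test(random.uniform(0.2, 0.9)) for _ in range(CANDIDATES)]
tally = Counter(marks)

cutoff = sorted(marks, reverse=True)[PLACES - 1]  # mark of the 140th-best candidate
print(f"pass mark {cutoff}: {tally[cutoff]} candidates tied on exactly that mark")
```

With 1,000 candidates squeezed into at most 51 possible marks, ties at the cut-off are a mathematical certainty, and no amount of decimal places added after the fact puts real information back into the measurement.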
A test score is only ever going to be a rough estimate of potential, and confidence intervals provide a way of expressing how good that estimate is. Any rigorous, objective research involving quantitative data reports confidence intervals, so why not 11+ tests?
I asked CEM to provide confidence intervals for the 11+ tests, which elicited a terse “information not held”. This is interesting because CEM also produce the Middle Years Information System (MidYIS) tests which, just like the 11+, “measures students' underlying learning potential – rather than achievement”, but unlike the seemingly infallible 11+, MidYIS comes with a big disclaimer and confidence intervals. CEM explain that there is a 95% chance that, if a student who had scored 114 in the MidYIS test were to take it again, their score would fall somewhere between 105 and 123.
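That MidYIS band can be reverse-engineered with the standard classical-test-theory formula: the standard error of measurement is SD × √(1 − reliability), and a 95% interval is ±1.96 SEM. A reliability of about 0.91 is an assumed figure (CEM publish nothing equivalent for the 11+), but it reproduces the 105–123 band they quote for a score of 114:

```python
import math

SD = 15.0            # conventional standardised-score SD (assumption)
reliability = 0.91   # assumed test reliability; not a published CEM figure

# Standard error of measurement, per classical test theory.
sem = SD * math.sqrt(1 - reliability)          # ≈ 4.5 standardised marks

score = 114
lo, hi = score - 1.96 * sem, score + 1.96 * sem

print(f"95% interval for a score of {score}: {lo:.0f} to {hi:.0f}")
```

An interval nine marks wide either side of the score sits awkwardly alongside rankings that claim to separate candidates by 0.01 of a standardised mark.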
CEM are earning over £1m per year from these tests, and GL presumably earn similar amounts, but they are really just providing what their paying customers ask for. This leaves three questions which need to be put to the grammar schools:
1) Why are scores not provided in a standardised format which the public can readily understand?
2) Why are the results recorded to a level of precision which is, in Professor Strand’s words, ‘wildly over precise’?
3) Why are confidence intervals not provided, as they would be with any other quantitative data presented in an objective way?
*Centre for Evaluation and Monitoring, based at Durham University.