The world of administrative data offers two contrasting pictures in contemporary debate: one enticing, the other alarming. The enticing picture is one where high-powered algorithms trawl massive data sets to reveal new and unexpected patterns; where links between different datasets can tell us more than we’ve ever known before about society; where statisticians may consider dispensing with the time-consuming business of surveying people and businesses – they can get all they need (and more) from data that are already collected mutely within digitised public and private administrative systems. This is the seamless world of Big Data, open data – more data.
The other picture is more negative. It is one where information collected in and by bureaucracies can never be relied on – it’s too subject to the interests of the organisation; too contingent on the impact of internal performance targets; too limited by the administrative capacity of large organisations to organise themselves efficiently. This is the world of data scandals – about loss of data, about manipulation of data to meet targets, and so on.
For the most part these two worlds have existed separately – the one advocating ever more use of administrative data, the other sounding a defiantly sceptical note. And in these two separate world views, there’s been little discussion of a key issue underlying them both – the quality of the administrative data.
The UK Statistics Authority has just published a standard for the Quality Assurance of Administrative Data for official statistics. It sets out the principles the Authority will deploy as it forms its judgements about whether official statistics merit the National Statistics status, which indicates that the statistics are trustworthy and meet the highest standards of quality and value-added.
The Authority embarked on this exercise in the wake of its assessment of statistics on police recorded crime in England and Wales. This assessment highlighted concerns about the assurance available to statisticians on the reliability of the operational data provided by police forces. There was an accumulation of evidence pointing to possible concerns, and a lack of clear audit evidence to give confidence that these concerns were isolated or mitigated. As a result, the Authority concluded that statistics on police recorded crime did not meet the standards of trustworthiness, and therefore could not be designated as National Statistics.
And the subsequent crime audit conducted by Her Majesty’s Inspectorate of Constabulary (HMIC) showed that the national level of compliance was 81% – so that roughly one in every five crimes reported to the police by the public was not recorded correctly – and that there was a very wide distribution of performance between different forces. The HMIC audit revealed substantial concerns that were consistent with the Authority’s decision to remove National Statistics status. And at around the same time as the publication of the Authority’s Assessment Report, a National Audit Office report identified concerns with the recording of waiting list information in the NHS.
The Authority concluded that these concerns were systemic in nature, and should be addressed systemically. And we wanted to provide clarity to statisticians, some of whom told us that they were concerned that the Authority had ‘raised the bar’ without providing a clear indication of what good practice looks like. We felt that the Code of Practice should remain the touchstone, but that some analysis and interpretation would nevertheless be helpful. We were also aware of the risk of a chilling effect – that statisticians in government departments and agencies would become inhibited about using administrative data for fear of subsequent criticism from the Authority over data integrity.
So to address this systemic issue we embarked on a major exercise to review current practice and set out how we will judge the quality of official statistics based on administrative data in future. We ran a comprehensive process, with widespread discussions with Government and other statisticians around the UK; we issued our initial findings as an exposure draft; and we held a range of subsequent feedback sessions with producers of statistics. We were impressed by what we found: many statisticians are already thinking hard about the quality of administrative data, and are adopting a range of practices to address the risks in the data. But we were also struck by the taboo nature of the subject – there seemed to be no common framework for thinking about or acknowledging these risks, and little sharing of the good practice that we identified, which as a result tended to exist in isolated islands or archipelagos.
Our standard – what we will look for when we decide whether statistics merit National Statistics status – is clear and simple. We will look for statisticians to provide:
- a provisional determination of the appropriate level of assurance required for a set of statistics;
- an explanation of their judgements of the required level of assurance for a set of statistics;
- evidence to support the rationale for these decisions;
- evidence of the actions they have taken to achieve compliance with the chosen level of assurance, and the rationale for deciding upon those actions; and
- evidence which demonstrates that they have embedded practices for keeping their quality assurance arrangements under review.
The defining features of this standard are judgement, proportionality and curiosity. The Authority is encouraging statisticians to deploy their judgement – to decide how risky the administrative data they receive are, to seek assurance that the risks have not materialised and to provide evidence to users about any inherent weaknesses in the data. The approach is inherently proportionate, too – it expects high levels of scrutiny and audit only for the riskiest data sets: those subject to a wide range of operational influences, like police recording of crime. And running through it all is curiosity and scepticism – the expectation that statisticians should not accept the data they receive at face value.
We also recognise that to expect statisticians to demonstrate scepticism and curiosity – to challenge the data they receive – requires them in some situations to ask difficult questions of their data suppliers. This in turn may require greater understanding on the part of data suppliers themselves of the use to which their operational data will be put and the value of the official statistics they inform. To help foster this understanding, the Authority will supplement its standard with case studies of good practice and a guide for policymakers and operational managers which will have a simple message: care about your data!
Taken together, this standard and its accompanying materials will, we are confident, help statisticians and Government continue to produce statistics that the public can trust. And we believe they can also bring the two pictures – of mistrust of official data and utopian hopes for the opportunities from Big Data – into much closer alignment. If we help foster this result, the Authority will have fulfilled its core purpose: to ensure that official statistics serve the public good.