Government departments and agencies build up routine information about all of us as part of their everyday activities. Who should have access to these data?
Were it not for Brexit, it’s likely that the last few weeks would have seen far more discussion about this topic. On 5 July the Cabinet Office published the response to its consultation on the ‘Better Use of Data in Government’. The document’s proposals form the basis of a new Digital Economy Bill, which includes legislation to help researchers access data. The next day Dame Fiona Caldicott, the National Data Guardian for Health and Care, published her review of data security standards and proposed a new consent model for data sharing in the NHS and social care.
Researchers are interested in this so-called ‘administrative data’ because its volume and detail can vastly exceed what it’s possible to collect through other routes such as surveys. As a result, bodies like the Economic and Social Research Council have set up special centres such as the Administrative Data Research Network to facilitate and promote its use.
In this blog I want to focus on a specific use of admin data that can get forgotten. It concerns the value that emerges when we link the rich and extensive information collected about individuals through their participation in social and biomedical surveys with the detail that comes from administrative data held about the same people.
This kind of linkage, done with the consent of the person concerned, overcomes one of the major problems with admin data: it isn’t collected with research in mind, and so will never cover everything of relevance to a particular research question. Of course, survey data isn’t perfect either. It rarely has the enviable detail and frequency found in an admin dataset, and it requires input and effort from survey participants. So although both types of data are important and have considerable value, the picture gained by combining them is far richer than the one that emerges when each is analysed on its own.
Here are a couple of examples of research that would not be possible without linking survey and admin data. Both are from cohort studies, which regularly interview a sample of people born in the same period to get a clear record of how their lives are changing over time and why.
Setting and streaming
The first example uses survey data from the Millennium Cohort Study linked to Key Stage 1 results from the National Pupil Database (NPD). This allowed Tammy Campbell to explore the impact of streaming on primary school children. She found evidence suggesting that streaming influences teachers’ perceptions in a way that advantages pupils in higher groups and penalises children in lower placements. In this example, the survey data provided access to information about teachers’ assessments as well as an array of information about children – but the Key Stage 1 data from the NPD was vital in showing whether the stream a child is placed in influences their official and recorded ‘achievement’.
Participation in higher education
Administrative data collected by the Department for Education and the Higher Education Statistics Agency have been helpful in showing the relationship between young people’s backgrounds and their participation in higher education. However, these datasets do not allow researchers to explore the full range of factors that underpin these differences. To do this, Lindsey Bowes and colleagues used individual administrative records linked to survey data collected as part of the Avon Longitudinal Study of Parents and Children and the Longitudinal Survey of Young People in England. This linkage, accompanied by qualitative research, yielded important insights about the interplay between gender, ethnicity and socioeconomic status and wider social, cultural, personal and economic factors.
The main problem with linking survey and admin data is gaining access to the relevant administrative records.
As a result of the Bill, the next six months will see considerable discussion about how and when researchers should be able to use administrative records. Attention will rightly be paid to the ethical considerations that should govern who has access to this kind of personal information and under what conditions. It is vital that the research community engages with these debates and provides compelling examples of how and why we use admin data, and the public benefit this generates.
Prof Alison Park is the Director of CLOSER, a partnership of eight leading longitudinal studies, the UK Data Service and the British Library. Information about CLOSER’s work on data linkage can be found here. CLOSER is funded by the Economic and Social Research Council and the Medical Research Council.