The majority of information used in health and care is confidential. Patients provide information in the expectation that the organisations to which they have disclosed it will keep it confidential, and will only use or disclose it where the patient would reasonably expect: in relation to the delivery of their care, or where there is another legal basis, such as an overriding public interest or a legal obligation. Any use that identifies an individual patient, where the patient would not reasonably expect the disclosure and it is not justified by an overriding public interest or legal obligation, is likely to be an actionable breach of confidence. To reduce this risk when information is used for secondary purposes, the information must be de-identified.

‘Confidential patient information’ will also be ‘special category personal data’ under the GDPR/Data Protection Act 2018. This legislation places obligations on how ‘data’ should be treated that are similar, but not identical, to those the common law duty of confidence imposes on ‘information’. De-identification is regarded as good practice under the GDPR, even where the data is still classed as ‘personal data’ because the de-identification process can be reversed (e.g. pseudonymisation). De-identification helps to ensure the minimum use of identifiable data (data minimisation) and improves data security.

De-identification can be either permanent (anonymised) or reversible (pseudonymised). If data is being used in pseudonymised form, it is vital that the key to re-identify the data is strictly controlled and only accessible to organisations with a legal basis to re-identify and use the data. The general rule is that organisations commissioning services do not need to re-identify data. Those providing care, however, may need to re-identify specific patients identified through overall analytical activities so that care can be appropriately delivered to those individuals.
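
As an illustration only, the idea of a strictly controlled re-identification key can be sketched in Python. Everything here is an assumption made for the example: the `PseudonymKeyStore` class, the organisation names and the identifier `ID-001` are hypothetical and are not part of the SUDGT or any national service. A real key store would sit behind proper access controls and audit, not in memory.

```python
import secrets
from typing import Dict, Set


class PseudonymKeyStore:
    """Hypothetical in-memory key table linking pseudonyms to identifiers."""

    def __init__(self, authorised_organisations: Set[str]) -> None:
        # Organisations assumed to have a legal basis to re-identify.
        self._authorised = set(authorised_organisations)
        self._by_identifier: Dict[str, str] = {}
        self._by_pseudonym: Dict[str, str] = {}

    def pseudonymise(self, identifier: str) -> str:
        """Return a stable random pseudonym for the identifier."""
        if identifier not in self._by_identifier:
            pseudonym = secrets.token_hex(16)
            self._by_identifier[identifier] = pseudonym
            self._by_pseudonym[pseudonym] = identifier
        return self._by_identifier[identifier]

    def re_identify(self, pseudonym: str, organisation: str) -> str:
        """Reverse a pseudonym, but only for an authorised organisation."""
        if organisation not in self._authorised:
            raise PermissionError(f"{organisation} may not re-identify data")
        return self._by_pseudonym[pseudonym]


# Only the (hypothetical) care provider is authorised to re-identify.
store = PseudonymKeyStore({"care-provider"})
pseudonym = store.pseudonymise("ID-001")
print(store.re_identify(pseudonym, "care-provider"))
```

In this sketch a commissioning organisation can work with the pseudonym but a call to `re_identify` on its behalf raises `PermissionError`, which mirrors the general rule described above.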

The approach taken to de-identification will depend on what approaches and systems are in place within the local health and care community. As the care system develops the activities for which it wants to use data, there may also be a hybrid of approaches to de-identification. By using the SUDGT, the care system will be able to identify the data sources required to support the processing for activities they are looking to undertake. This is done in detail later in the tool, but prior to this the care system should ensure it is aware of the de-identification options that are available to it and the varying levels of support and effectiveness they lend to the overall secondary uses approach.


The table below illustrates the ‘data flow from source to usage’, noting the main options at each stage and their impact on data degradation:

  • Where data comes from (i.e. the source provider)
  • Options for de-identification (i.e. pseudonymisation in this case)
  • Options for storage of data
  • Options for usage by new care system partners

Personal data that has undergone pseudonymisation:

It is important to note that the options for pseudonymisation vary in their effectiveness.

Pseudonymise ‘at source’: data that is pseudonymised ‘at source’, using a method shared across organisations, is limited in its effectiveness. Patient A in system A may be the same person as a patient in system B, but the two records will not produce the same pseudonym if there are even minor variances in the key data used to generate it. As a result, the percentage of records that are effectively pseudonymised and linked may not be sufficient for the intended purposes.
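
The effect of minor variances in key data can be shown with a minimal Python sketch. It assumes, purely for illustration, that the shared method is a keyed hash (HMAC-SHA256) of a patient identifier; the key, the identifier values and the `normalise` helper are hypothetical and do not describe any particular organisation's method.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real key would be securely managed, never hard-coded.
SHARED_KEY = b"example-shared-key"


def pseudonymise(identifier: str) -> str:
    """Return a keyed SHA-256 pseudonym for the identifier exactly as supplied."""
    return hmac.new(SHARED_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def normalise(identifier: str) -> str:
    """Drop spaces and punctuation so formatting differences do not change the pseudonym."""
    return "".join(ch for ch in identifier if ch.isalnum()).upper()


# The same illustrative identifier as it might be held in two source systems.
system_a_value = "943 476 5919"
system_b_value = "9434765919"

# Without normalisation the two systems produce different pseudonyms.
raw_match = pseudonymise(system_a_value) == pseudonymise(system_b_value)

# With an agreed normalisation step the pseudonyms line up.
normalised_match = (
    pseudonymise(normalise(system_a_value)) == pseudonymise(normalise(system_b_value))
)

print(raw_match, normalised_match)
```

Normalisation only removes formatting differences. It does not help where the underlying value is missing or wrong in one system, which is one reason record linkage on data pseudonymised at source can fall short.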

NHS Digital/DSCRO: this method significantly improves data quality and the matching of individuals across datasets. However, it has some limitations, as not all of the data that a new care system may wish to use is necessarily available via this method. Social Care data is an example of a dataset not widely available via this route, although there is a pilot in Greater Manchester and the DHSC is currently drafting Directions to include all Social Care data.

Local Health and Care Record: where a mature shared care record is in place and the appropriate agreements to pseudonymise and utilise data are either set up or can be established, this can be a high quality/high integration method. It may suit the integration of local flows, but it may not contain all the detail of the national datasets available via NHSD/DSCRO.

Data integration and de-identification models:

For each of the models below, the key IG requirements are set out, along with key considerations related to data quality and supporting architecture.


It is key to identify early on the method(s) by which your care system will be able to pseudonymise personal data. Without this understanding, it will be difficult to map your data requirements and flows, and your final solution is more likely to experience limitations and unintended data quality issues.


You should now determine and record:

  • which organisations data will be sourced from
  • where data will be pseudonymised
  • what process of pseudonymisation will be implemented
  • where personal data that has undergone pseudonymisation for integrated care will be stored

This information can be recorded in the SUDGT input tool.


  • There are at least five options for data storage, including CSUs, CCGs, Local Authorities, Local Health and Care Records and non-NHS providers

  • An organisational checklist is available in the tool to support assessment of the requirements of any proposed organisation or to measure the compliance of organisations already providing similar services

  • Remember, data controller organisations are responsible for managing the risk of re-identification and must ensure they have the appropriate checks and controls in place to mitigate this risk, both within their own organisation and within any organisations processing data on their behalf