OpenSAFELY Codelist Audit

How to interpret this report

This report summarises automated checks applied to the codelists used in this study. The checks are intended to highlight potential issues that may require review, but they do not assess clinical appropriateness or study validity.

An ERROR indicates a codelist that is very unlikely to behave as intended (for example, matching no events at all in the data) and should be reviewed before results are published.

WARNINGs highlight situations where a codelist may be out of date or incomplete. These are not necessarily problems, but you may want to review the suggested actions to determine whether updating the codelist would be appropriate for your study.

INFOs highlight things you might be interested to know, but that aren't necessarily problems that require action.

Codelists shown as GOOD have passed all the automated checks. This does not guarantee that they are correct, only that the codelist does not have any of the specific issues that we check for. You should still review all codelists to ensure they are appropriate for your study.

Ultimately, responsibility for codelist selection and interpretation rests with the study authors. This report should be used as a guide to support review, not as a substitute for clinical or methodological judgement.

Codelist summary

This project contains codelists in total.

ERROR: No events

Yearly counts of actual usage of SNOMED, ICD10 and OPCS4 codes are made available by the NHS. None of the codes in this codelist appear in that data for any year. This strongly suggests a problem with the codelist. However, there are situations where this could legitimately occur:

The codelist is for a sensitive clinical concept. In this case the data may be present in the OpenSAFELY backend, but redacted from the public data.
The codelist is for a new concept and is not in the published data because of the lag between entry and publication.
The codelist is specifically designed to capture codes that you do not expect to appear in the data.

Codelist	System	Variables using this codelist

WARNING: Newer version available

Newer versions of these codelists are available on OpenCodelists. There may be valid reasons for using an older version, but it is probably worth reviewing the changes in the newer version to check whether they are relevant to your study and whether you should consider updating.

Codelist	Currently used version	Newer version	Diff link	System

WARNING: Potentially missing codes

These codelists were created using the OpenCodelists builder tool, which identifies codes based on keyword searches and hierarchical relationships. Every code returned must be explicitly included or excluded - none can remain unresolved. This was the case when these codelists were created, but rerunning the searches against the most recent version of the coding system would now return unresolved codes.

You should consider reviewing recent additions to the coding system and decide whether any should be incorporated into the codelist.

Codelist	System	Version built with	Latest version

WARNING: Uploaded codelists

These codelists were uploaded as a CSV file to OpenCodelists, rather than being built using the OpenCodelists builder tool. This means they may not have been built using systematic searches of the coding system and may be more likely to be missing relevant codes.

Codelist	System	Variables using this codelist

WARNING: Hardcoded codelists

These codelists are defined directly in the study code rather than being imported from OpenCodelists. This makes them harder to review and update, and we can't automatically check them for issues.

You should consider moving these codelists to OpenCodelists and importing them into the study code instead.

Codes	File location	Variables using this codelist

WARNING: Old ethnicity codelist 1

You are using the older version (2e641f61) of the SNOMED ethnicity codelist. This codelist maps ethnicity codes to two groupings - a broad 6 category grouping, and a more granular 16 category grouping. However, in that version of the codelist, the groupings are only given as numbers (1-6 and 1-16). The newer version of the codelist (22911876) includes additional columns with human-readable labels. We recommend switching to the newer version, to avoid introducing errors when mapping the numbers to labels.

NB - there are no differences in the list of clinical codes between the two versions.

WARNING: Unintentional local codelists

These codelists live locally in the study repo, rather than being imported from OpenCodelists. This makes them harder to review and update. They are located inside the codelists/ directory, which is against our best practice advice and will cause the GitHub action checks to fail whenever you update your code.

You should consider moving these codelists to OpenCodelists. If you need to keep them as local codelists then you should move them to a local_codelists directory as detailed here.

File location	Variables using this codelist

INFO: Local codelists

These codelists live locally in the study repo, rather than being imported from OpenCodelists. This makes them harder to review and update. However, they are not placed directly in the codelists/ directory so will not interfere with the opensafely codelist update command as detailed here.

File location	Variables using this codelist

INFO: Unused codelists

These codelists are defined in your codelists.json file but do not appear in any ehrQL variables. They have not been assessed for problems.

Codelist	System

GOOD: Good codelists

These codelists passed all automated checks. This does not mean they are correct, only that they don't have any of the specific issues that we check for.

Codelist	System
bmi-measurements	SNOMED
sex	CTV3

How it works

The OpenSAFELY Codelist Audit Tool works in several stages which pull in data about your study and its codelists. It is currently run as a batch process - the date of the last run is shown at the top of this report.

The first stage pulls in all the code for your study from its GitHub repository, finds which codelists are included in your study's codelists.txt file, then interprets the ehrQL to find out if and where they are used in variables. This process works well with the vast majority of ehrQL code, but there may be certain studies for which it fails to correctly detect codelists and how they are used in dataset definitions.
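As a rough illustration of the "find which codelists are included" step, the sketch below parses a list of codelist references. It assumes each non-blank line has the form owner/slug/version, where the owner may itself contain a slash; the function name, the CodelistRef type and the example owners are hypothetical, and this is not the audit tool's own code.

```python
from typing import NamedTuple


class CodelistRef(NamedTuple):
    owner: str    # organisation or user that published the codelist (assumed)
    slug: str     # codelist name
    version: str  # version tag or hash


def parse_codelist_refs(text: str) -> list[CodelistRef]:
    """Parse lines of the assumed form owner/slug/version, skipping blank lines."""
    refs = []
    for raw in text.splitlines():
        line = raw.strip().strip("/")
        if not line:
            continue
        parts = line.split("/")
        if len(parts) < 3:
            raise ValueError(f"unrecognised codelist reference: {raw!r}")
        # The last two segments are the slug and version; the rest is the owner.
        *owner, slug, version = parts
        refs.append(CodelistRef("/".join(owner), slug, version))
    return refs
```

Working out where each codelist is used in ehrQL variables is a separate and harder step that this sketch does not attempt.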

Next, it looks for clinical events coded with the codes contained in your study's codelists in the SNOMED CT and ICD-10/OPCS4 Open Data from NHS England. This is used to identify codelists whose codes match no clinical events under exact code matching - any found will be shown in the "No events" section of this report.
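The "No events" check can be pictured as a set intersection. The sketch below is illustrative only: it assumes the published usage data has already been loaded into a dictionary keyed by year, and that a code "appears" if it is present in any year's data. The function name and data shapes are hypothetical.

```python
def codelists_with_no_events(codelists, usage_by_year):
    """Return the names of codelists none of whose codes appear in any year.

    codelists:     dict mapping codelist name -> iterable of codes
    usage_by_year: dict mapping year -> dict of code -> published count
    """
    # Collect every code that appears in the published data for any year.
    seen = set()
    for counts in usage_by_year.values():
        seen.update(counts)

    # Exact matching only: a codelist is flagged when it shares no code with `seen`.
    return sorted(
        name for name, codes in codelists.items() if seen.isdisjoint(codes)
    )
```

Because matching is exact, a codelist can be flagged simply because its codes are formatted differently from the published data, which is one reason a flagged codelist needs human review.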

Next, it pulls metadata from OpenCodelists about the codelists in your study: which versions of the codelist are available besides the one referenced in your study, which release of the underlying coding system was used to create it, and which releases of the coding system it is compatible with.

If the tool finds that there are newer versions of a codelist in your study published on OpenCodelists, this is included in the "Newer version available" section of this report. Newer versions that are "Draft" or "Under Review" are not included here.
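A minimal sketch of this filtering follows. It assumes each version carries a status string and a creation date, and that "newer" means created later than the version the study uses; the Version class and its field names are hypothetical rather than the OpenCodelists data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Version:
    tag: str       # version identifier as referenced by the study
    status: str    # e.g. "Published", "Draft", "Under Review" (assumed labels)
    created: date  # assumed basis for deciding which version is newer


# Statuses the report says are not shown in "Newer version available".
EXCLUDED_STATUSES = {"draft", "under review"}


def newer_published_versions(versions, current_tag):
    """Return versions created after the current one, oldest first,
    leaving out any that are still Draft or Under Review."""
    current = next(v for v in versions if v.tag == current_tag)
    newer = [
        v
        for v in versions
        if v.created > current.created
        and v.status.lower() not in EXCLUDED_STATUSES
    ]
    return sorted(newer, key=lambda v: v.created)
```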

OpenCodelists calculates what versions of codelists are compatible with which releases of coding systems. It does this firstly by looking to see if a new release of a coding system contains new codes that would be returned as new search results for searches used to build a codelist version. Secondly, it looks to see if the new coding system release contains any new codes that are descendants of codes included in the codelist version.

If a codelist is not compatible with the latest release of a coding system, this means that there are new candidate codes for the searches or hierarchy of the codelist that are missing from the codelist. This is highlighted in the "Potentially missing codes" section of this report.
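The two compatibility checks described above can be sketched as follows. This is a simplified illustration under stated assumptions: searches are treated as case-insensitive substring matches on code descriptions, and the hierarchy is given as a child-to-parents mapping. The real builder's search behaviour may differ, and all names here are hypothetical.

```python
def missing_candidates(new_codes, search_terms, included, parents):
    """Return codes from a new coding system release that a codelist
    version has not resolved.

    new_codes:    dict of code -> description, for codes added in the release
    search_terms: keyword searches used to build the codelist version
    included:     set of codes included in the codelist version
    parents:      dict of code -> set of parent codes in the new release
    """

    def descends_from_included(code):
        # Walk up the hierarchy until an included ancestor is found.
        stack, visited = [code], set()
        while stack:
            current = stack.pop()
            for parent in parents.get(current, ()):
                if parent in included:
                    return True
                if parent not in visited:
                    visited.add(parent)
                    stack.append(parent)
        return False

    candidates = set()
    for code, description in new_codes.items():
        text = description.lower()
        # Check 1: the new code would be returned by one of the searches.
        matches_search = any(term.lower() in text for term in search_terms)
        # Check 2: the new code is a descendant of an included code.
        if matches_search or descends_from_included(code):
            candidates.add(code)
    return candidates


def is_compatible(new_codes, search_terms, included, parents):
    """A codelist version is compatible when no candidate codes are missing."""
    return not missing_candidates(new_codes, search_terms, included, parents)
```

Under these assumptions, any code returned by missing_candidates is one that a reviewer would need to include or exclude before the codelist is fully resolved against the new release.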
