Computable phenotypes and codelists - British Heart Foundation

Robust, reproducible, and openly available phenotype definitions for health data research

High-quality research using health data depends on clear, consistent definitions of clinical concepts. These definitions – often referred to as computable phenotypes – translate real-world clinical conditions into computational code that can be applied to electronic health records (EHR), enabling researchers to identify and study patient cohorts at scale.
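
As a rough illustration of what "computational code applied to EHR" can look like, the sketch below flags patients who have at least one record matching a codelist. It is a minimal example under stated assumptions, not one of our published phenotype algorithms: the Record type, the identify_cohort function, and the sample records are hypothetical, and the two-code list (ICD-10 I21 and I22) is for illustration only and is not a validated codelist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One coded event in a (hypothetical) EHR extract."""
    patient_id: str
    code: str  # clinical code as recorded, e.g. an ICD-10 code


# Illustrative only: ICD-10 I21 (acute myocardial infarction) and
# I22 (subsequent myocardial infarction). Not a validated codelist.
MI_CODELIST = {"I21", "I22"}


def identify_cohort(records, codelist):
    """Return the IDs of patients with at least one matching record.

    Codes are normalised (dots removed, upper-cased) and matched by
    prefix, so a subcode such as I21.9 matches the listed code I21.
    """
    cohort = set()
    for record in records:
        normalised = record.code.replace(".", "").upper()
        if any(normalised.startswith(code) for code in codelist):
            cohort.add(record.patient_id)
    return cohort


# Made-up records for demonstration.
records = [
    Record("p1", "I21.9"),
    Record("p2", "E11.9"),
    Record("p3", "i22.0"),
    Record("p1", "N18.3"),
]

cohort = identify_cohort(records, MI_CODELIST)
print(sorted(cohort))
```

Real phenotype definitions usually go further than a single code match, for example by combining several data sources or applying date and ordering rules, which is why each definition is shared with its algorithm and documentation rather than as a codelist alone.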

We support researchers to develop, use, and share computable phenotypes through practical guidance, tools, and pipelines. We also provide access to a growing library of openly available, quality-assured phenotype definitions to promote transparency, reproducibility, and reuse across the research community.

Access computable phenotypes

All phenotype definitions that we have developed, or that have been used in research we support, are shared via the BHF Data Science Centre collection in the HDR UK Phenotype Library and in linked project GitHub repositories. This ensures that they are FAIR (Findable, Accessible, Interoperable, and Reusable).

Expertly curated computable phenotypes

We develop community-agreed, validated phenotype definitions for key cardiovascular and related conditions. Each phenotype definition includes the code needed to run the algorithm, the codelists, a detailed README, and an algorithm flowchart.

Expertly curated computable phenotypes are currently available for diabetes, chronic kidney disease and myocardial infarction, with more coming soon.

Code terminology resources

We offer a practical collection of resources to support the use of frequently used clinical coding terminologies in research. This includes lookup tables for finding and checking codes, mapping tables for translating codelists between terminologies, and additional tools and packages to support terminology exploration and codelist development. These resources are intended to save time, improve consistency, and make working with clinical codes more straightforward across projects.
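
To show how a mapping table can be used to translate a codelist between terminologies, here is a minimal sketch. The function names (build_mapping, translate_codelist) and all codes are hypothetical placeholders, not entries from any real terminology or from our mapping tables. The sketch assumes the mapping table is a list of (source code, target code) pairs and that one source code may map to several target codes.

```python
from collections import defaultdict


def build_mapping(rows):
    """Build a source -> set-of-targets lookup from (source, target) pairs.

    One-to-many mappings are kept, because a single code in one
    terminology can correspond to several codes in another.
    """
    mapping = defaultdict(set)
    for source, target in rows:
        mapping[source].add(target)
    return mapping


def translate_codelist(codelist, mapping):
    """Translate a codelist and report codes that have no mapping.

    Returns (translated, unmapped) so that unmapped codes can be
    reviewed manually instead of being dropped silently.
    """
    translated, unmapped = set(), set()
    for code in codelist:
        targets = mapping.get(code)
        if targets:
            translated |= targets
        else:
            unmapped.add(code)
    return translated, unmapped


# Hypothetical mapping rows and source codelist.
rows = [("A1", "X10"), ("A1", "X11"), ("A2", "X20")]
mapping = build_mapping(rows)

translated, unmapped = translate_codelist({"A1", "A2", "A3"}, mapping)
print(sorted(translated))  # target codes found via the mapping table
print(sorted(unmapped))    # source codes that need manual review
```

Returning the unmapped codes separately is a deliberate choice in this sketch: mappings between terminologies are rarely complete, so a translated codelist generally needs clinical review before use.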

Codelist comparison tool

Our Codelist Comparison Tool supports the comparison and creation of codelists, with features including:

mapping of medical codes to descriptions
comparison to the GDPPR refset
mapping codes to the number of people in the NHS England SDE with each code recorded, offering insights into prevalence
building and export of codelists for use in project pipelines
support for codelists from external libraries (e.g. HDR UK Phenotype Library and OpenCodelists) and user-generated codelists
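
The core idea behind comparing two codelists can be sketched with set operations. This is an illustration only and does not reflect how the Codelist Comparison Tool is implemented; the function names (compare_codelists, describe), the codes, and the descriptions are all hypothetical.

```python
def compare_codelists(a, b):
    """Split two codelists into shared codes and codes unique to each."""
    a, b = set(a), set(b)
    return {
        "shared": a & b,
        "only_in_a": a - b,
        "only_in_b": b - a,
    }


def describe(codes, descriptions):
    """Pair each code with its description, flagging codes without one."""
    return [
        (code, descriptions.get(code, "(no description found)"))
        for code in sorted(codes)
    ]


# Hypothetical codelists and lookup table.
library_codelist = {"C001", "C002", "C003"}
user_codelist = {"C002", "C003", "C004"}
descriptions = {
    "C001": "Hypothetical concept 1",
    "C002": "Hypothetical concept 2",
    "C003": "Hypothetical concept 3",
}

result = compare_codelists(library_codelist, user_codelist)
for name, codes in result.items():
    print(name, describe(codes, descriptions))
```

Codes that appear in only one list are usually the ones worth reviewing, since they show where two definitions of the same condition diverge.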

For an overview of how to use the Codelist Comparison Tool, please watch the video below: [Video placeholder]

Benchmarking

We are developing approaches to benchmark phenotype definitions across datasets and methods. This will support users in understanding variation in performance, improving transparency, and selecting appropriate definitions for their research questions.

Sharing

We encourage all developers and users of computable phenotypes to share them openly and freely, e.g. via the HDR UK Phenotype Library, with metadata and code to support reuse.

To support FAIR sharing of phenotype definitions, we have developed guidelines and tools.

Citing

Computable phenotypes should be appropriately cited in publications, with sufficient information provided to support reproducibility. The digital object identifier (DOI) or accession identifier (ID) for each phenotype should be included. We recommend listing all computable phenotypes in a table, including the phenotype name, repository, and DOI or accession ID.

Custom services

We offer support for developing, reviewing, and implementing computable phenotypes, including bespoke codelist development, validation, and methodological advice tailored to specific research needs.