Robust, reproducible, and openly available phenotype definitions for health data research
High-quality research using health data depends on clear, consistent definitions of clinical concepts. These definitions – often referred to as computable phenotypes – translate real-world clinical conditions into computational code that can be applied to electronic health records (EHRs), enabling researchers to identify and study patient cohorts at scale.
We support researchers to develop, use, and share computable phenotypes through practical guidance, tools, and pipelines. We also provide access to a growing library of openly available, quality-assured phenotype definitions to promote transparency, reproducibility, and reuse across the research community.
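To make the idea of a computable phenotype concrete, the sketch below applies a small codelist to a handful of coded events and returns each matching patient's earliest qualifying date. It is an illustration only, not one of our curated definitions: the `Event` structure, the function name, and the sample records are assumptions, and the codelist is limited to the ICD-10 categories I21 and I22 (acute and subsequent myocardial infarction).

```python
from dataclasses import dataclass
from datetime import date

# Illustrative codelist only (assumption): ICD-10 categories for acute and
# subsequent myocardial infarction. Not a validated phenotype definition.
MI_CODELIST = {"I21", "I22"}


@dataclass(frozen=True)
class Event:
    """One coded record for a patient (hypothetical structure)."""
    patient_id: str
    code: str          # ICD-10 code as recorded, e.g. "I21.9"
    event_date: date


def first_matching_event(events, codelist):
    """Return {patient_id: earliest date with a code in the codelist}."""
    first_seen = {}
    for ev in events:
        # Match on the three-character ICD-10 category ("I21.9" -> "I21").
        category = ev.code.replace(".", "")[:3].upper()
        if category in codelist:
            prev = first_seen.get(ev.patient_id)
            if prev is None or ev.event_date < prev:
                first_seen[ev.patient_id] = ev.event_date
    return first_seen


# Made-up example records.
events = [
    Event("p1", "I21.9", date(2020, 3, 1)),
    Event("p1", "I21.0", date(2019, 7, 15)),
    Event("p2", "E11.9", date(2021, 1, 5)),   # diabetes code, not in codelist
    Event("p3", "I22.0", date(2018, 11, 30)),
]
cohort = first_matching_event(events, MI_CODELIST)
```

A real phenotype definition would typically combine several data sources and terminologies and apply further rules, which is why each curated definition is shared with its full code, codelists, and documentation.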
Access computable phenotypes
All phenotype definitions we have developed or used in research we support are shared via the BHF Data Science Centre collection in the HDR UK Phenotype Library and linked project GitHub repositories. This ensures that they are FAIR (Findable, Accessible, Interoperable, and Reusable).
Expertly curated computable phenotypes
We develop community-agreed, validated phenotype definitions for key cardiovascular and related conditions. Each phenotype definition includes the code needed to run the algorithm, the codelists, a detailed README, and an algorithm flowchart.
Expertly curated computable phenotypes are currently available for diabetes, chronic kidney disease and myocardial infarction, with more coming soon.
Code terminology resources
We offer a practical collection of resources to support the use of frequently used clinical coding terminologies in research. This includes lookup tables for finding and checking codes, mapping tables for translating codelists between terminologies, and additional tools and packages to support terminology exploration and codelist development. These resources are intended to save time, improve consistency, and make working with clinical codes more straightforward across projects.
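As a rough illustration of how a mapping table can be used to translate a codelist between terminologies, the sketch below indexes source-to-target code pairs and reports any source codes with no mapping. The function names and all codes (`SRC001`, `TGT-A`, and so on) are hypothetical placeholders, not entries from our mapping tables.

```python
from collections import defaultdict


def build_mapping(rows):
    """Index (source_code, target_code) pairs as source -> set of targets."""
    mapping = defaultdict(set)
    for source, target in rows:
        mapping[source].add(target)
    return mapping


def translate_codelist(codelist, mapping):
    """Translate a codelist; return (translated codes, unmapped source codes)."""
    translated, unmapped = set(), set()
    for code in codelist:
        targets = mapping.get(code)  # .get() avoids creating empty entries
        if targets:
            translated |= targets
        else:
            unmapped.add(code)
    return translated, unmapped


# Hypothetical mapping rows; a real table would be loaded from a file.
rows = [("SRC001", "TGT-A"), ("SRC001", "TGT-B"), ("SRC002", "TGT-B")]
mapping = build_mapping(rows)
translated, unmapped = translate_codelist({"SRC001", "SRC002", "SRC003"}, mapping)
```

Returning the unmapped codes separately is a deliberate choice: mappings between terminologies are often one-to-many or incomplete, so translated codelists generally need review rather than being used as-is.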
Codelist comparison tool
Our Codelist Comparison Tool supports the comparison and creation of codelists, with features including:
- mapping of medical codes to descriptions
- comparison with the GDPPR (GPES Data for Pandemic Planning and Research) reference set (refset)
- mapping of codes to the number of people in the NHS England Secure Data Environment (SDE) with each code recorded, offering insights into prevalence
- building and export of codelists for use in project pipelines
- support for codelists from external libraries (e.g. HDR UK Phenotype Library and OpenCodelists) and user-generated codelists
For an overview of how to use the Codelist Comparison Tool please watch the video below: [Video placeholder]
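The kind of comparison listed above can be sketched in a few lines. This is not the Codelist Comparison Tool's implementation; it is a minimal example that assumes each codelist is a dictionary of code to description, and all function names, codes, and descriptions are hypothetical.

```python
import csv
import io


def compare_codelists(a, b):
    """Compare two codelists given as {code: description} dictionaries."""
    codes_a, codes_b = set(a), set(b)
    return {
        "shared": sorted(codes_a & codes_b),
        "only_in_a": sorted(codes_a - codes_b),
        "only_in_b": sorted(codes_b - codes_a),
    }


def merge_codelists(a, b):
    """Build a combined codelist; descriptions from `a` take precedence."""
    merged = dict(b)
    merged.update(a)
    return merged


def export_csv(codelist):
    """Serialise a codelist to CSV text with a header row, sorted by code."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["code", "description"])
    for code, description in sorted(codelist.items()):
        writer.writerow([code, description])
    return buf.getvalue()


# Hypothetical codelists, e.g. one from an external library and one user-made.
library_list = {"C001": "Condition X", "C002": "Condition X, unspecified"}
user_list = {"C002": "Condition X NOS", "C003": "History of condition X"}

diff = compare_codelists(library_list, user_list)
combined_csv = export_csv(merge_codelists(library_list, user_list))
```

Listing the codes unique to each codelist makes it easier to review disagreements one by one before deciding which codes to keep in the exported codelist.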
Benchmarking
We are developing approaches to benchmark phenotype definitions across datasets and methods. This will support users in understanding variation in performance, improving transparency, and selecting appropriate definitions for their research questions.
Sharing
We encourage all developers and users of computable phenotypes to share them openly and freely, e.g. via the HDR UK Phenotype Library, with metadata and code to support reuse.
To support FAIR sharing of phenotype definitions, we have developed guidelines and tools.
Citing
Computable phenotypes should be appropriately cited in publications, with sufficient information provided to support reproducibility. The digital object identifier (DOI) or accession identifier (ID) for each phenotype should be included. We recommend listing all computable phenotypes in a table, including the phenotype name, repository, and DOI or accession ID.
Custom services
We offer support for developing, reviewing, and implementing computable phenotypes, including bespoke codelist development, validation, and methodological advice tailored to specific research needs.