Curated assets

Start analysis faster with reusable, analysis-ready data

Common research data, prepared once and reused across projects. Curated assets provide harmonised, high-quality, well-documented, analysis-ready tables built from electronic health record datasets. They help research teams work more efficiently by removing the need to repeat the same data preparation in every project, and they make studies more comparable and robust. Our curated assets are driven by community needs and developed in collaboration with research teams and data custodians.

Curated assets provide:

  • Reusable analysis-ready tables
  • Harmonised variable definitions
  • Built-in quality checks
  • Faster project setup
  • Consistent data across studies

[Testimonial placeholder]

Example: Curated Data Asset for Demographics

The curated data asset for demographics provides a single, consistent table of core demographic variables for use across research projects.

It harmonises and concatenates key patient characteristics from multiple datasets to improve data completeness and quality.

Selection algorithms determine the most appropriate value of each characteristic for every individual, and variable definitions are aligned with the latest research from the Consortium.
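To illustrate what a selection algorithm of this kind can look like, here is a minimal Python sketch. It is not the asset's actual method: the source ranking, the Record fields and the function name select_value are all hypothetical, and the rule shown (prefer the highest-ranked source, then the most recent non-missing record) is only one plausible choice.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional

# Hypothetical ranking of sources: a lower number means a preferred source.
SOURCE_PRIORITY = {"primary_care": 0, "hospital": 1, "deaths": 2}


@dataclass(frozen=True)
class Record:
    patient_id: str
    source: str
    recorded_on: date
    ethnicity: Optional[str]  # None means the value is missing in this record


def select_value(records: Iterable[Record]) -> Dict[str, str]:
    """Pick one ethnicity value per patient.

    Assumed rule: ignore missing values, prefer the highest-ranked source,
    and within a source prefer the most recently recorded value.
    """
    best = {}
    for rec in records:
        if rec.ethnicity is None:
            continue
        # Smaller key wins: better source first, then later date.
        key = (SOURCE_PRIORITY[rec.source], -rec.recorded_on.toordinal())
        current = best.get(rec.patient_id)
        if current is None or key < current[0]:
            best[rec.patient_id] = (key, rec.ethnicity)
    return {pid: value for pid, (_, value) in best.items()}
```

Patients with no usable value in any source are simply absent from the result, which keeps missingness explicit instead of filling it with a default.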

The methodology and code are shared openly through our GitHub documentation pages and repository.

Instead of each project assembling these variables separately, the curated asset combines multiple underlying datasets to produce a standardised set of characteristics for every patient, including:

  • Age (month and year of birth)
  • Sex
  • Ethnicity
  • Deprivation
  • Geographic region
  • Date of death

This asset:

  • Combines multiple source datasets: primary care, hospital, and death data are brought together
  • Applies consistent definitions: age, ethnicity, and deprivation derived using uniform rules
  • Improves completeness: multiple sources are used to maximise data coverage
  • Provides a reliable starting point: consistent demographic variables improve comparability across projects

Most research projects need to spend time locating, joining and cleaning demographic data before analysis can begin. The Curated Data Asset for Demographics removes this step, allowing projects to start analysis immediately with consistent variables.

Also available as code

Curated assets are developed within the secure research environments used by the programmes we support. Where possible, shared curated assets are made available directly within the environment.

However, some programmes (for example BCDC-SAIL) cannot distribute shared datasets. In these cases, we provide efficient, reusable code that generates the curated tables within each project workspace using the datasets available to that project.
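As a rough illustration of how reusable code can build a curated table from whatever a project has access to, the Python sketch below fills each field from the first available source that holds a value. The source names, field names and the function build_demographics are assumptions made for this example and do not describe the code we distribute.

```python
from typing import Dict, List, Optional

# Hypothetical source order and output fields, for illustration only.
SOURCE_ORDER = ["primary_care", "hospital", "deaths"]
FIELDS = ["sex", "date_of_death"]


def build_demographics(
    available: Dict[str, List[dict]]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Build one row per patient from the datasets a project can access.

    `available` maps a source name to its rows; each row is a dict with a
    'patient_id' key and any of the FIELDS. Sources missing from the
    workspace are skipped, so the same code runs in every project.
    """
    table: Dict[str, Dict[str, Optional[str]]] = {}
    for source in SOURCE_ORDER:
        for row in available.get(source, []):
            out = table.setdefault(row["patient_id"], {f: None for f in FIELDS})
            for field in FIELDS:
                # Keep the first non-missing value, in source order.
                if out[field] is None and row.get(field) is not None:
                    out[field] = row[field]
    return table
```

A project without primary care data would pass only its hospital and death datasets and still receive a table with the same columns.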

Some curation work is therefore shared as reusable code rather than a published data product.

Our curated assets are:

  • Developed within secure research environments in response to project requirements
  • Version-controlled
  • Documented with clear READMEs
  • Designed to be reusable across projects

 

Links

Curated data assets developed within the NHS England SDE for the CVD-COVID-UK/COVID-IMPACT programme are available through our public GitHub repository.

Talk to us

Interested in reusing or adapting curated assets for your project? Come and talk to us [link to form]. If you have suggestions for other curated assets that would benefit researchers, please let us know.

Quick links:
NHS England SDE – curated data assets:
COVID-19 positive
Demographics
HES APC Diagnoses
HES APC Procedures
Deaths