For many researchers, the journey from a promising idea to accessing health data can begin with a simple but essential question: does the right data exist to support my research?

Too often, answering this question involves time-consuming emails, fragmented processes, or speculative applications. The Cohort Discovery Service, available through the, is changing that.

The service enables researchers to quickly assess whether relevant patient cohorts exist across multiple datasets held in different Secure Data Environments (also known as Trusted Research Environments) – providing a clearer, faster starting point for planning studies and supporting more efficient and informed data access requests.

Sangya Pundir, Product Owner for the Cohort Discovery Service, says:

“Cohort Discovery gives researchers a simple way to answer one of the most important early questions in their work: Does the right data exist to support their study? By enabling feasibility checks across multiple datasets in one place, we’re helping researchers move forward with greater confidence and submit more informed data access requests.”

A smarter starting point for research

Planning research using health data often begins with uncertainty:

  • Do the right patients exist?
  • Which datasets are relevant?
  • Who should you contact?

The Cohort Discovery Service is designed to answer these questions early, enabling researchers to assess feasibility in one place before progressing further.

The service allows researchers to run a single query across multiple datasets and receive near real-time insights into cohort availability. For example, a researcher could search for “female asthma patients aged under 35”, and Cohort Discovery will query multiple pseudonymised datasets to return an aggregated count of how many individuals match those criteria in each dataset. This helps researchers quickly understand whether a suitable study cohort exists, without needing to contact each data custodian individually.
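The "one query, one count per dataset" idea can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Record` fields, the function names and the sample datasets are assumptions made for illustration, not the service's actual interface, and the counting is simulated in a single process rather than inside separate secure environments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Record:
    """A pseudonymised record (hypothetical shape): no direct identifiers."""
    sex: str
    age: int
    conditions: FrozenSet[str]


Criteria = Callable[[Record], bool]


def count_matches(records: List[Record], criteria: Criteria) -> int:
    """Count matching records; stands in for the step run inside each environment."""
    return sum(1 for record in records if criteria(record))


def run_cohort_query(datasets: Dict[str, List[Record]], criteria: Criteria) -> Dict[str, int]:
    """Apply one query to every dataset and return only a count per dataset."""
    return {name: count_matches(records, criteria) for name, records in datasets.items()}


def female_asthma_under_35(record: Record) -> bool:
    """The example query from the text, expressed as a predicate."""
    return record.sex == "F" and record.age < 35 and "asthma" in record.conditions


# Tiny made-up datasets, for illustration only.
datasets = {
    "dataset_a": [
        Record("F", 29, frozenset({"asthma"})),
        Record("F", 41, frozenset({"asthma"})),
        Record("M", 30, frozenset({"asthma"})),
    ],
    "dataset_b": [
        Record("F", 22, frozenset({"asthma", "eczema"})),
        Record("F", 34, frozenset({"asthma"})),
        Record("F", 19, frozenset({"diabetes"})),
    ],
}

counts = run_cohort_query(datasets, female_asthma_under_35)
print(counts)  # {'dataset_a': 1, 'dataset_b': 2}
```

The researcher only ever receives the dictionary of counts, never the records themselves. The counts in this sketch are raw; the real service applies further privacy safeguards before returning results.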

Figure: An illustration of the Cohort Discovery Service, showing how a researcher's query is run securely across multiple health datasets to identify potential patient cohorts for their study.

Importantly, these early insights are delivered without accessing identifiable data, enabling researchers to identify potential cohorts safely while reducing speculative requests and supporting a more efficient journey from research question to data access.

Peter Harrison, Interim Chief Technology Officer at (51), says:

“Improving how researchers discover and access data is essential to accelerating research that benefits people’s health. Cohort Discovery helps remove uncertainty at the earliest stage, supporting a more efficient and responsible journey from research idea to data access and ultimately enabling valuable research to happen faster.”

Built with privacy at its core

Protecting patient privacy is fundamental to the service. Researchers never see patient-level data. Cohort Discovery queries are run on pseudonymised data, and only aggregated totals of patient numbers available in searchable datasets are returned. Patient counts are rounded, and small numbers are suppressed to reduce any risk of re-identification.
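Rounding and small-number suppression can be sketched in a few lines of Python. The threshold of 10 and the rounding base of 10 below are illustrative assumptions only; the article does not state which values the service uses, and `protect_count` is a hypothetical name.

```python
from typing import Optional


def protect_count(raw_count: int, threshold: int = 10, base: int = 10) -> Optional[int]:
    """Return a disclosure-controlled count.

    Counts below `threshold` are suppressed (returned as None); other counts
    are rounded to the nearest multiple of `base`, with halves rounded up.
    Both parameter values are assumptions chosen for illustration.
    """
    if raw_count < threshold:
        return None  # suppressed: too few patients to report safely
    return ((raw_count + base // 2) // base) * base


print(protect_count(3))    # None
print(protect_count(124))  # 120
print(protect_count(127))  # 130
```

Under these assumed settings, a researcher can tell that roughly 130 matching patients exist in a dataset, but cannot learn an exact figure or confirm the presence of a very small group.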

This allows researchers to safely assess whether relevant cohorts exist before deciding which datasets to pursue and applying to data custodians for access. The approach supports responsible data use and gives data custodians confidence that queries are secure and protect underlying patient data. The federated capability is delivered through the that sits inside each custodian’s SDE/TRE, developed by one of Cohort Discovery’s technology partners, the University of Nottingham.

Improving the research ecosystem

By introducing a structured way to assess the usefulness of potential cohorts early in the research journey, the Cohort Discovery Service benefits both researchers and data custodians.

Researchers can validate their ideas earlier and with greater confidence, helping them focus their efforts where it matters most – on the research. By assessing feasibility across multiple datasets with a single query, they can quickly understand whether the right patient cohorts exist, identify the most relevant data custodians to engage, and reduce uncertainty before investing time in a full application. This leads to more informed data access requests and a faster path to funded research.

At the same time, data custodians benefit from a more streamlined and efficient approach to managing demand. By enabling early feasibility checks, they receive more targeted, higher-quality data access requests, reducing the volume of speculative enquiries and the associated administrative burden. With robust privacy and governance safeguards in place, custodians can support responsible data use with confidence, while increasing the visibility and impact of their datasets within the research community.

The result is a more efficient research process across the UK health data ecosystem, enabling valuable studies to progress faster.

Get started

Researchers can begin (access approval required).

Data custodians can participate by onboarding their datasets and enabling cohort searches across them, helping to improve the quality and efficiency of research engagement.


About the Cohort Discovery Service

The service helps researchers quickly assess whether relevant patient cohorts exist across multiple datasets, without accessing identifiable data. This enables a clearer and faster starting point for planning studies and supporting more efficient and informed data access requests.

Available through the, the service is supported by a federated analytics ecosystem that enables queries to run securely within Secure Data Environments (Trusted Research Environments) using tools like (developed by the University of Nottingham as part of the 51 Federated Analytics programme) and data pre-processing and transformation support from the at the University of Dundee.

The Cohort Discovery Service is developed and maintained by 51 and builds on, which enabled researchers to rapidly discover and access COVID-19 data while ensuring patient information remained private and secure.