Researchers, Tools and Methods for the Phenome

Enormous amounts of data are generated during clinical interactions across multiple healthcare settings in the form of structured and unstructured electronic health records (EHRs). The data contains rich, longitudinal information on diagnoses, symptoms, medications and tests which can be used for research. However, EHR data is not primarily generated for research purposes; it is stored in disparate sources, often using different formats, and requires a significant amount of pre-processing.

Our Phenotype Library

The UK has established a Phenotype Library, which is one of the largest in the world. It is the only national, wholly open-access library of reproducible phenotyping algorithms for defining human disease, lifestyle risk factors and biomarkers using diverse electronic health records. For each phenotype, the library curates its metadata, implementation details, programmatic code and validation information. The Library enables the wider research and clinical community to carry out reproducible and transparent research using such complex data.

Researchers hoping to unlock the valuable data contained within EHRs need to spend considerable time writing the code needed to work with data that often contains inconsistencies and is of varying quality and detail.

The HDR UK Phenotype Library has been created to assist researchers working with EHRs by providing an open-access national library of validated phenotyping algorithms, definitions and methods. Routine use of the library by researchers will cut down on duplication of effort by allowing re-use of algorithms, tools and methods. It will also ensure reproducibility of research by creating a national standard for creating, evaluating and representing phenotypes.

Are you a researcher who has developed a phenotyping algorithm that:

defines a disease, risk factor or biomarker,
derives information from one or more EHR sources,
is associated with one or more peer-reviewed outputs, and
is already validated?

You can contribute to the improvement of health by depositing your algorithms in the Phenotype Library. This enables their dissemination, re-use, evaluation and citation to the benefit of the emerging phenomics research community.

The phenome national priority is developing tools to support the definition and creation of computable phenotypes. These phenotypes can be used to interrogate EHR data and enable health research for patients' benefit.

Phenoflow is a phenotype definition model that can be used to define phenotypes from EHRs and export them. This allows phenotype definitions to be re-used across research institutes, improving reproducibility. Over 300 phenotypes are currently downloadable from Phenoflow and can be used immediately to interrogate local datasets. Phenoflow also allows researchers to author new phenotypes and enables their validation against multiple data sources.
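
The Python sketch below gives a rough illustration of a computable, shareable phenotype definition. It represents a phenotype as plain data (a name plus clinical code prefixes), applies it to toy coded records, and round-trips it through JSON so it could be exported and re-used elsewhere. This is a hypothetical minimal sketch, not Phenoflow's actual model or export format. The `Phenotype` class, the `to_json`/`from_json` helpers and the record layout are all assumptions. So is every code except the asthma Read code "H33..", which is quoted elsewhere on this page.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Phenotype:
    """Hypothetical minimal phenotype: a name plus clinical code prefixes."""
    name: str
    code_prefixes: tuple[str, ...]

    def matches(self, code: str) -> bool:
        # Read v2 codes are five characters padded with dots (e.g. "H33.."),
        # so strip the padding and treat the rest as a hierarchy prefix.
        return any(code.startswith(p.rstrip(".")) for p in self.code_prefixes)

    def cases(self, records: list[tuple[str, str]]) -> set[str]:
        """Return IDs of patients with at least one matching coded record."""
        return {pid for pid, code in records if self.matches(code)}


def to_json(p: Phenotype) -> str:
    """Export a definition so it can be shared and re-used elsewhere."""
    return json.dumps(asdict(p))


def from_json(text: str) -> Phenotype:
    """Rebuild a definition from its exported JSON form."""
    data = json.loads(text)
    return Phenotype(name=data["name"], code_prefixes=tuple(data["code_prefixes"]))


asthma = Phenotype(name="Asthma", code_prefixes=("H33..",))

# Toy (patient_id, Read code) records; values are illustrative only.
records = [
    ("p1", "H33.."),
    ("p2", "H330."),  # assumed to be a child code within the H33 hierarchy
    ("p3", "G30.."),  # assumed to fall outside the asthma hierarchy
]

print(sorted(asthma.cases(records)))         # ['p1', 'p2']
print(from_json(to_json(asthma)) == asthma)  # True
```

Keeping the definition as data rather than as code embedded in one analysis is what makes it portable: the same JSON could, in principle, be applied to another institution's dataset. Real phenotypes typically also encode time windows, value thresholds and logic spanning several data sources, all of which this sketch omits.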

National Medical Text Analytics

The HDR UK Text Analytics Resource is the UK's first repository of tools, methods and datasets for natural language processing (NLP) of the unstructured free text contained within electronic health records. The resource will help the clinical and research community unlock the rich data in these records to deliver improvements in healthcare.

There is much value in the information included in EHRs (e.g. symptoms, tests, investigations, diagnoses and treatments). This information could help researchers and clinicians learn how to tailor treatments more accurately for individual patients and offer better and safer healthcare. However, most of the information held within these records is in written form (sometimes referred to as unstructured text), which is difficult to use and is currently under-used in research.

To access the data held within unstructured text, we need to develop specialised computational tools to process these words. Only then will we have a full picture of all patient symptoms, experiences and diagnoses to use in research for patient benefit. The HDR UK Text Analytics Resource is building an NLP research community that will address the complexity of clinical text through the development of shared tools and standards.

A curated list of applications and datasets for healthcare text analytics can be found in HDR UK Text's GitHub "resources" repository. Some examples are listed below:


CogStack allows the extraction of information from unstructured data (e.g. PDF/MS Word documents, images) contained within electronic health records (EHRs). This data is usually inaccessible, but once extracted and processed via CogStack, it can be analysed in multiple ways.
MedCAT is a natural language processing tool that can be used to link the extracted EHR data to definitions of disease. This makes it possible to answer research questions such as the relationship between diseases and age. Over twelve million free-text documents and over 250 million diagnostic results and reports have been processed within CogStack, which is being implemented across three NHS Foundation Trusts (South London and Maudsley, King's College Hospital, and University College London Hospitals). CogStack was cited in the Secretary of State for Health and Social Care's speech "Better tech: not a 'nice to have' but vital to have for the NHS" (January 2020) and NHSX's report "Artificial Intelligence: How to get it right" (October 2019).
FMA allows the extraction of information, including causes of death and other diagnoses, from free text in EHRs. To identify "medical" words within the text, the algorithm uses Read Clinical Codes, in which clinical terms are designated with codes (e.g. "Asthma" = "H33.."), together with the earlier OXMIS (Oxford Medical Information System) codes. FMA facilitates research using free text in EHRs (e.g. records deposited in the UK General Practice Research Database), reducing the need for manual analysis.
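
Dictionary-based tools such as FMA rest on one core idea: matching words in free text against a coded vocabulary. That idea can be sketched in a few lines of Python. This is a hypothetical illustration, not FMA's or MedCAT's actual algorithm. `LEXICON` and `find_coded_terms` are invented names, and the lexicon holds only the asthma example given above. A real system would use a full Read/OXMIS dictionary and handle synonyms, abbreviations and negation.

```python
import re

# Hypothetical lexicon mapping clinical terms to Read codes. Only the asthma
# entry comes from the text above; everything else about this setup is assumed.
LEXICON = {
    "asthma": "H33..",
}


def find_coded_terms(text: str, lexicon: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (term, code, start_offset) for each lexicon term found in text."""
    hits = []
    for term, code in lexicon.items():
        # Whole-word, case-insensitive match so "asthmatic" is not counted.
        for m in re.finditer(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            hits.append((term, code, m.start()))
    # Report hits in the order they appear in the note.
    return sorted(hits, key=lambda h: h[2])


note = "Pt seen for Asthma review. No asthmatic attacks since March."
print(find_coded_terms(note, LEXICON))  # [('asthma', 'H33..', 12)]
```

Whole-word matching avoids false hits such as "asthmatic" here. The sketch would still code a negated mention such as "no asthma", however, which is one reason production tools add context handling.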

Use our NLP resources, applications and datasets to .

Smartphones and wearable devices

The mHealth toolbox will enable researchers to rapidly spin up population-level remote monitoring studies. These studies can combine active data (e.g. questionnaires and clinical assessments) with passively generated data from smartphones and wearable devices, linked to other data modalities such as EHRs. By using reproducible methods to analyse mHealth-generated data, researchers will be able to better understand the causes and consequences of disease.

Our mHealth community is developing open-access tools and software to support researchers undertaking studies using health data collected via smartphones and wearables, for example:

RADAR-base is a remote data collection platform that enables health data collected from study participants via wearables and mobile technologies to be shared with and used by clinicians and researchers. The platform supports study design and set-up, active (e.g. questionnaires) and passive (e.g. real-time monitoring of movement) remote data collection, and secure data transmission to the research/clinical team.
BiobankAccelerometerAnalysis is a tool to extract health information from large accelerometer datasets, usually captured via a wrist-worn device that measures acceleration (i.e. a person's activity). The software generates time-series and summary metrics useful for answering key questions, such as how much time is spent sleeping, in sedentary behaviour or doing physical activity, and what the health consequences of these behaviours are.
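
The sketch below illustrates the kind of time-series and summary metrics described above. It first computes ENMO (the Euclidean norm of acceleration minus 1 g, truncated at zero), a widely used acceleration summary. It then converts a per-epoch behaviour label series into hours per behaviour. It is a hypothetical sketch, not the BiobankAccelerometerAnalysis API. The function names, the 30-second epoch length and the label names are assumptions, and the behaviour labels are taken as given rather than predicted from raw data.

```python
import math
from collections import Counter

EPOCH_SECONDS = 30  # assumed epoch length; real pipelines may use another value


def enmo(x: float, y: float, z: float) -> float:
    """Euclidean norm minus 1 g (gravity), with negative values truncated to zero."""
    return max(math.sqrt(x * x + y * y + z * z) - 1.0, 0.0)


def time_per_behaviour(labels: list[str], epoch_seconds: int = EPOCH_SECONDS) -> dict[str, float]:
    """Convert a per-epoch behaviour label series into hours per behaviour."""
    counts = Counter(labels)
    return {label: n * epoch_seconds / 3600 for label, n in counts.items()}


# A device at rest reads about 1 g (gravity only), so its ENMO is zero.
print(enmo(0.0, 0.0, 1.0))  # 0.0

# Toy 24-hour day of 30-second epochs (120 epochs per hour). Labels are
# assumed to come from an upstream classifier not shown here;
# "mvpa" stands for moderate-to-vigorous physical activity.
day = (["sleep"] * 960 + ["sedentary"] * 1200
       + ["light"] * 600 + ["mvpa"] * 120)
summary = time_per_behaviour(day)
print(summary)  # {'sleep': 8.0, 'sedentary': 10.0, 'light': 5.0, 'mvpa': 1.0}
```

Summaries like these are what can then be related to health outcomes in a cohort. The hard part, classifying each epoch from raw acceleration, is deliberately left out of this sketch.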

Case Studies

Case study

BREATHE is enabling the use of Electronic Health Records in respiratory research

19 April 2021

Demonstrating HDR UK's vision to unite the UK's health data to enable discoveries that improve people's lives, BREATHE – The Health Data Research Hub for...

Case study

Analysis of text written by doctors in medical notes of patients with COVID-19 (National Text Analytics project – ACE inhibitors)

14 April 2020

Analysis of text written by doctors is being used to find and extract patterns and hidden nuances within medical notes of those who have tested positive for COVID-19.

Case study

Association between physical activity and cardiovascular disease is much stronger than previously thought

19 April 2021

The physical activity of 90,000+ participants in the UK Biobank was measured by a wrist-worn accelerometer (motion sensor) over a one-week period between 2013 and 2015. During a five-year...

Get involved

To find out more and to get involved, contact Serina Hayes, Phenomics Programme Director; Spiros Denaxas, Phenomics National Resource Lead; or Richard Dobson or Angus Roberts, Text Analytics National Resource Co-Leads.