So you want to build a cohort?

  • Written by Thomas Nind
  • 17th June 2021
  • Blog
  • cohort

So you’ve decided to do a research study using Electronic Health Records and/or imaging, and you’ve written a document outlining the requirements. How do we turn this list of inclusion/exclusion criteria into runnable code? With RDMP’s Cohort Compiler, of course!

The first task is to split the criteria into bite-sized chunks, each of which runs against a single dataset:

  1. 3+ prescriptions for Drug A
  2. Biochemistry result for TestCode B > 500
  3. Alive at the time of study
  4. Has had a head MR in the past 5 years

How does RDMP compile this into SQL? To answer that question, let’s look at the end goal. Since the datasets share a common identifier, we could JOIN the tables. But that gets complex fast and gives us a single gigantic query that’s likely to bring the server to its knees. Instead, since we are dealing with lists of patients, we can use SET operations (UNION, INTERSECT, EXCEPT). This means we only need to pull a single column (e.g. patientId) from each dataset, and we can then smash all the resulting lists together using the super fast operations that relational database engines excel at. As an added bonus, if the datasets are on separate database servers or engines (MySQL, SQL Server, Oracle), we can run the queries separately, store the results on a temporary common server and apply the SET operations there.

-- Parentheses make the top-to-bottom evaluation order explicit; without them some engines (e.g. SQL Server) evaluate INTERSECT before UNION and EXCEPT

((SELECT patientId FROM Prescribing WHERE Drug = 'Drug A' GROUP BY patientId HAVING COUNT(*) >= 3

UNION

SELECT patientId FROM Biochemistry WHERE TestCode = 'TestCode B' AND Result > 500)

EXCEPT

SELECT patientId FROM Demography WHERE DateOfDeath IS NOT NULL)

INTERSECT

SELECT patientId FROM Imaging WHERE Modality = 'MR' AND StudyDescription LIKE '%head%'
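
To see the set operations in action, here is a minimal sketch that runs the same shape of query against a toy in-memory SQLite database. Everything in it is an assumption made for illustration: the table layouts, the invented rows and the choice of SQLite itself (this is not the SQL that RDMP generates). SQLite groups compound operators from left to right, so the flat form of the query is evaluated top to bottom there.

```python
import sqlite3

# Toy, in-memory stand-ins for the four datasets (all rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Prescribing  (patientId TEXT, Drug TEXT);
CREATE TABLE Biochemistry (patientId TEXT, TestCode TEXT, Result REAL);
CREATE TABLE Demography   (patientId TEXT, DateOfDeath TEXT);
CREATE TABLE Imaging      (patientId TEXT, Modality TEXT, StudyDescription TEXT);

INSERT INTO Prescribing VALUES
  ('P1','Drug A'),('P1','Drug A'),('P1','Drug A'),
  ('P2','Drug A'),
  ('P3','Drug A'),('P3','Drug A'),('P3','Drug A');
INSERT INTO Biochemistry VALUES ('P2','TestCode B',650),('P4','TestCode B',120);
INSERT INTO Demography VALUES
  ('P1',NULL),('P2',NULL),('P3','2019-05-01'),('P4',NULL);
INSERT INTO Imaging VALUES
  ('P1','MR','MR head'),('P2','CT','CT head'),('P3','MR','MR head');
""")

# Each SELECT pulls only the shared identifier column from one dataset.
# SQLite evaluates this as ((A UNION B) EXCEPT C) INTERSECT D.
QUERY = """
SELECT patientId FROM Prescribing WHERE Drug = 'Drug A'
  GROUP BY patientId HAVING COUNT(*) >= 3
UNION
SELECT patientId FROM Biochemistry WHERE TestCode = 'TestCode B' AND Result > 500
EXCEPT
SELECT patientId FROM Demography WHERE DateOfDeath IS NOT NULL
INTERSECT
SELECT patientId FROM Imaging WHERE Modality = 'MR' AND StudyDescription LIKE '%head%'
"""

cohort = sorted(row[0] for row in conn.execute(QUERY))
print(cohort)
```

In this toy data P1 and P3 have three prescriptions and P2 has a high biochemistry result; P3 is removed as deceased and P2 has no head MR, which leaves only P1 in the cohort.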

Since each section is runnable independently, it is easy for RDMP to produce totals for each separate set. The set results can even be cached, so that a small change to one criterion does not mean re-running the entire query.
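
The idea of per-set totals and cached results can be sketched with plain Python sets. This is only an illustration and not how RDMP implements its cache: the `FAKE_SERVER` dictionary, the query labels and the `fetch_ids` helper are all hypothetical stand-ins for running one query per dataset and keeping the returned identifier list.

```python
from typing import Dict, Set

# Invented identifier lists, keyed by a label for each per-dataset query.
FAKE_SERVER: Dict[str, Set[str]] = {
    "prescribing": {"P1", "P3"},
    "biochemistry": {"P2"},
    "deceased": {"P3"},
    "head_mr": {"P1", "P3"},
}

executions: Dict[str, int] = {}  # how many times each query really ran
cache: Dict[str, Set[str]] = {}  # cached identifier lists


def fetch_ids(name: str) -> Set[str]:
    """Return the identifier set for one query, running it only on a cache miss."""
    if name not in cache:
        executions[name] = executions.get(name, 0) + 1
        cache[name] = set(FAKE_SERVER[name])
    return cache[name]


def build_cohort() -> Set[str]:
    """Combine the per-dataset lists with set operations, top to bottom."""
    included = fetch_ids("prescribing") | fetch_ids("biochemistry")  # UNION
    alive = included - fetch_ids("deceased")                          # EXCEPT
    return alive & fetch_ids("head_mr")                               # INTERSECT


first = build_cohort()
totals = {name: len(fetch_ids(name)) for name in FAKE_SERVER}

# Change only the imaging criterion: drop its cached list and keep the rest.
FAKE_SERVER["head_mr"] = {"P1", "P2", "P3"}
cache.pop("head_mr")
second = build_cohort()

print(totals, sorted(first), sorted(second), executions)
```

Only the changed imaging query runs a second time; the other three lists come straight from the cache.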

And that’s about it! RDMP is free, open source and cross-platform, and it even runs in SSH terminals!
