PICTURES stands for InterdisciPlInary Collaboration for efficienT and effective Use of clinical images in big health care RESearch. It is a 5-year, £3.8M programme of work funded by the Medical Research Council (MRC) with additional support from the Engineering and Physical Sciences Research Council (EPSRC) as part of Health Data Research UK (HDR UK). The programme kicked off in August 2019 as an evolutionary step in a long line of projects aimed at transforming health research in the UK by providing safe and secure resources for cutting-edge computational analysis on dynamic, multidimensional health-relevant data. This supports HDR UK’s vision that “every health and care interaction and research endeavour will be enhanced by access to large scale data and advanced analytics”.
A 2013 research project funded by the Farr Institute, originally named the Creation of a National Clinical Imaging Research Dataset for Scotland, was the first concrete step in establishing a Scottish Medical Imaging (SMI) capability. It led to the National Safe Haven obtaining a research copy of all clinical images taken in Scotland between 2010 and 2018 (approx. 3 petabytes). Those images have since been used to develop software that converts unstructured image data into useful, anonymised metadata ‘tags’ and catalogues them for cohort building, that is, creating a subset of relevant data for health research. Data controllers use the tags to find and collate pseudonymised images that meet individual researchers’ needs.
PICTURES is now building on that early work to enhance the SMI Framework. Initially, the metadata catalogue will be populated from the DICOM tags attached to the images. In future, catalogue entries will also be derived from the radiologist’s opinion of the image, using Natural Language Processing, and even from the images themselves, using pixel data algorithms. The SMI Framework is being developed and tested across two Safe Havens – the National Safe Haven (NSH) run by Public Health Scotland and the Regional Safe Haven run by University of Dundee’s Health Informatics Centre (HIC) – to be used as a template by others. For more information and links to project partners go to Our Partners.
The first datasets prepared for cohort building are three years’ worth of CT, MRI and PET scans (2015-2017). These scans were taken on different devices in different hospitals by different people and the DICOM tags attached to them are inconsistent, sometimes wrong and often missing entirely. This makes it very difficult to sort them and label them in a consistent way so cohort builders know what to look for and can be confident in the search results. A lot of manual checking goes into refining the ‘rules’ that ensure proper automated handling of these issues by the software. Extensive testing of results is done by cohort builders in this project so they understand and can control the risk of identifiable data slipping through to the research study space. A major component of the pipeline used by SMI is the open-source program developed by HIC researchers called the Research Data Management Platform (RDMP).
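To make the idea of a tag-handling ‘rule’ concrete, the following is a minimal Python sketch of how inconsistent or missing modality tags might be mapped onto a consistent label for the catalogue. It is an illustration only and not the SMI software: the rule table, the function names `normalise_modality` and `catalogue`, and the `UNKNOWN` and `NEEDS_REVIEW` labels are all assumptions made for this example. The only detail taken from the DICOM standard is that `CT`, `MR` and `PT` are the standard Modality codes for CT, MRI and PET.

```python
from typing import Dict, List, Optional

# Hypothetical rule table (an assumption for this sketch): maps raw Modality
# strings, as they might appear in the wild, to a standard DICOM Modality code.
MODALITY_RULES: Dict[str, str] = {
    "CT": "CT",
    "CAT": "CT",
    "MR": "MR",
    "MRI": "MR",
    "PT": "PT",
    "PET": "PT",
}


def normalise_modality(raw: Optional[str]) -> str:
    """Return a consistent modality label for a raw tag value.

    Missing or empty tags become "UNKNOWN"; values no rule covers become
    "NEEDS_REVIEW" so that a person can check them and extend the rules.
    """
    if raw is None:
        return "UNKNOWN"
    key = raw.strip().upper()
    if not key:
        return "UNKNOWN"
    return MODALITY_RULES.get(key, "NEEDS_REVIEW")


def catalogue(records: List[dict]) -> List[dict]:
    """Build catalogue entries with a normalised modality for each record."""
    return [
        {"id": r["id"], "modality": normalise_modality(r.get("Modality"))}
        for r in records
    ]
```

Routing unrecognised values to a review label, rather than guessing, mirrors the manual checking described above: each reviewed value can become a new rule, so the automated handling improves over time.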
The software developed for SMI and PICTURES is open source wherever possible and can be found on GitHub at https://github.com/SMI/Home. Contributions are welcome and appreciated. Please let us know via Contact Us if you are using any of our software for your own project.
The first study spaces prepared for researchers contain simple tools like R for data science programming and microDICOM for viewing and manipulating images. Over time, as researchers request other tools, these will be assessed, made safe and added to the service. When deciding how to expand the service, researcher requests will be balanced against the need to keep patient data safe and minimise the risk of data leakage. The first code ingress (input) mechanism under development enables a one-way flow of code from a defined external environment, with automated handling such as malware checking before release into a researcher’s study space in the Safe Haven.
The first code egress (output) mechanism under development is disclosure control through tools designed to check for row level data, differencing, and reidentification from AI model weights. The project team is also looking at options like differential privacy (adding ‘noise’ to the data) and redaction (removing information from the data) and the relative impacts vs benefits of each.
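As an illustration of what adding ‘noise’ means in differential privacy, the sketch below applies the standard Laplace mechanism to a simple count, such as the number of scans in a cohort. It is not the project’s disclosure control tooling, and the function name `noisy_count` and its parameters are assumptions for this example. It relies on two well-known facts: a counting query has sensitivity 1 (adding or removing one person changes the count by at most 1), and the difference of two independent exponential variables with the same rate follows a Laplace distribution.

```python
import random


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return true_count plus Laplace noise with scale 1 / epsilon.

    A smaller epsilon means more noise and stronger privacy; a larger
    epsilon means less noise and a more accurate released count.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Sensitivity of a count is 1, so the Laplace scale is 1 / epsilon,
    # which corresponds to an exponential rate of epsilon.
    rate = epsilon
    # The difference of two i.i.d. Exponential(rate) draws is
    # Laplace-distributed with mean 0 and scale 1 / rate.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


# Example use: release a noisy version of a cohort size of 100.
rng = random.Random(2019)  # seeded only to make the example repeatable
released = noisy_count(100, epsilon=0.5, rng=rng)
```

The trade-off mentioned above is visible in the single `epsilon` parameter: lowering it protects individuals more strongly but makes the released figure less useful to the researcher.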
As a Research and Development project, PICTURES is led by the University of Dundee, in partnership with the University of Edinburgh, Abertay University, NHS Scotland, and industry collaborators. You can contact the project team with questions or feedback via Contact Us.