Published on 27/06/2022
The INCISIVE project has reached its first major milestone after 18 months of work by launching the first prototype of an interoperable federated data repository holding thousands of clinical images of breast, lung, prostate, and colorectal cancer. The repository allows hospitals and other potential data providers to share health data securely and in compliance with the GDPR with the scientific community working on Artificial Intelligence (AI)-related training and experimentation.
The first prototype federates 3 of the 9 INCISIVE data providers, together with all their cancer imaging and other clinical data, which represents only a fraction of the total data the project plans to federate. To prepare for the full federation, the consortium has gathered in temporary central storage a total of 2.5 million cancer images from more than 7,000 de-identified patients, contributed by all the clinical partners involved in the project. This data is ready to be integrated into the federated data repository as soon as the remaining 6 data providers complete the set-up of their data nodes.
The project’s coordinator, Gianna Tsakou, Senior Project Manager at MAGGIOLI SpA – Research & Innovation Lab in Athens, highlights the challenges the consortium faced in reaching this milestone: “One of the biggest challenges that we successfully addressed was putting in place all the necessary agreements and technical work for ensuring that the massive retrospective data sharing complies with legal and ethical requirements in all 5 European countries and for all 9 data providers where data nodes are planned”. Other challenging tasks have included data collection, preparation and de-identification, establishing a common understanding of the AI services the project will deliver, designing the platform, and implementing an operational version of the federated approach for both data storage and federated learning.
The first INCISIVE prototype already includes some of the functionalities expected for the AI Toolbox, which aims to provide decision-making support to medical professionals for cancer diagnosis and treatment. INCISIVE partners have already started working on almost all the AI models targeted in the project, and the first prototype incorporates the most advanced of them, namely the models for breast density classification and for lung image segmentation across several imaging modalities.
The first prototype of the AI Toolbox also includes initial approaches to explainable AI, data analysis pipelines that will enable the delivery of the planned AI services, and initial work on the user interface of the AI services, so that medical professionals can view and interpret the AI inference results in a way that is as intuitive and transparent as possible.
The first prototype covers all the main use cases and platform functionalities planned for the potential users of the INCISIVE platform.
Firstly, for the data providers, it supports data preparation, including data de-identification, annotation and quality checking, before they share their data in the INCISIVE repository. Secondly, for AI researchers looking for training or validation data for their models, it supports searching and querying the data held in the federated nodes, creating a workspace, and training their algorithms using federated learning. Finally, for medical professionals, it supports the delivery of AI-enabled inference services following a models-as-a-service approach, in which medical professionals only need to provide an image to the system and then obtain the AI-enabled inference results with a single click.
The INCISIVE consortium has started working on the second prototype, which will integrate the remaining 6 data providers into the federated storage and make their data interoperable and reusable during and after the project. The project also expects to include in the integrated platform a central data storage node for those data providers who cannot or do not wish to set up their own node locally. In addition, it plans to make available a data de-identification tool optimized for the data providers’ needs, as well as a semi-automatic data annotation tool to accelerate the work of data partners.
The second prototype, which will be available early next year, will also optimize the federated learning process in terms of both the computational resources it requires and the quality of the AI models it produces.