Research Database of Sequential Mammography Screening events: a resource for AI in breast cancer
Session type: Proffered paper sessions
Theme: Diagnosis and therapy
With the advent of digital imaging modalities and the rapid growth in both diagnostic and therapeutic imaging, the ability to harness this large influx of data is of paramount importance. The OPTIMAM Medical Image Database (OMI-DB) was created to provide a centralised, fully annotated dataset for research. Collection has been ongoing for over three years, providing the opportunity to collect sequential (temporal) imaging events. These data, coupled with Quantitative Imaging Features (QIFs), provide a powerful resource for future research, in particular for investigations into risk profiling for women attending screening. This paper describes extensions to the OMI-DB collection systems and tools, and discusses the prospective research applications of such a rich dataset.
The database contains unprocessed and processed images, associated data and expert-determined ground truths. The process of collection, annotation and storage is fully automated and adaptable, and has been described elsewhere [1]. However, extensive alterations to the identification, collection, processing and storage arms of the system have been undertaken to support the introduction of sequential events, including interval cancers. Furthermore, an automated feature extraction framework has been developed that processes images from the OMI-DB, extracts QIFs and calculates breast density and CAD features, all of which are subsequently stored in a database.
At present we have collected 9566 patient cases, of which 1777 are normal, 7345 malignant and 435 benign. In total, over two-thirds of the dataset has sequential screening events collected. Furthermore, QIFs, breast density and CAD features are calculated for each image, resulting in over 10 million data points.
At present we have collected 5366 patient cases, of which 680 are normal, 4221 malignant and 435 benign. In total, over two-thirds of the dataset has sequential screening events collected. Furthermore, QIFs, breast density and CAD features are calculated for each image, resulting in over 10 million data points.