CAD: A Breast Cancer Diagnostic Tool Using AWS Architecture

Breast cancer is the most commonly diagnosed cancer worldwide and the leading cause of cancer death among women. Early detection and rapid diagnosis are essential to saving lives. Histopathological analysis of tissue by a pathologist remains the definitive method for confirming the presence or absence of the disease and for assessing its progression. However, this process is often tedious and subjective, which can lead to differences in interpretation, even among experienced pathologists. Claranet, together with Emoj and other partners, took part in ProBreast, a research project funded by the Marche Region through a call for projects.

Problems and challenges

Existing computer-aided diagnostic (CAD) tools generally cover only one of the several stages of the diagnostic process, so producing a complete diagnostic report requires several separate tools, which is inefficient. The main problems encountered include:

  • Data variability
  • The need to obtain accurate labels for histological images, indicating the presence or absence of cancerous tissue

Solution implemented

To address these issues, an integrated solution was developed, based on a robust AWS architecture for data collection, processing, model training, and results distribution.

  • Data collection and preparation: Anonymised biopsy samples were scanned, producing 300 sets of histological images at magnification levels from 1.25x to 40x, which were then manually labelled by physicians. The images are accompanied by detailed clinical information
  • Dataset creation: Stain normalisation and patch extraction techniques were used to create datasets suitable for model training. An automated data pipeline was built, using AWS Glue DataBrew to ensure dataset consistency and Amazon SageMaker for data cleaning and preprocessing
  • Model training: Several machine learning models were trained, including deep convolutional neural networks (DCNNs) and the eXtreme Gradient Boosting (XGBoost) algorithm. The models were tested and validated with different parameter settings to optimise their performance
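The stain normalisation and patch extraction steps described above can be sketched as follows. This is a minimal, hypothetical NumPy example, not the project's pipeline: it uses a simplified per-channel mean/standard-deviation matching in RGB (Reinhard's method performs the same kind of matching in the Lab colour space), and the function names, the 224-pixel patch size and the "near-white" background threshold of 220 are assumptions chosen for illustration.

```python
import numpy as np


def normalise_stain(image, target_mean, target_std):
    """Match each RGB channel's mean and std to reference values.

    Hypothetical, simplified Reinhard-style normalisation: the original
    method works in Lab space, this sketch works directly in RGB.
    """
    img = image.astype(np.float64)
    mean = img.mean(axis=(0, 1))
    std = img.std(axis=(0, 1))
    std = np.where(std == 0, 1.0, std)  # avoid division by zero on flat channels
    out = (img - mean) / std * np.asarray(target_std) + np.asarray(target_mean)
    return np.clip(out, 0, 255).astype(np.uint8)


def extract_patches(image, patch_size=224, stride=224, max_background=0.8):
    """Tile an H x W x 3 image into square patches, skipping mostly-white ones.

    A pixel counts as background if all three channels exceed 220
    (an assumed threshold for unstained glass).
    """
    patches = []
    h, w = image.shape[:2]
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patch = image[y:y + patch_size, x:x + patch_size]
            background = np.all(patch > 220, axis=-1).mean()
            if background <= max_background:
                patches.append(patch)
    return patches


# Usage on a synthetic image standing in for a scanned slide region.
rng = np.random.default_rng(0)
slide = rng.integers(0, 200, size=(500, 700, 3), dtype=np.uint8)
normalised = normalise_stain(slide, target_mean=(180, 130, 170), target_std=(30, 35, 25))
patches = extract_patches(normalised)
print(len(patches), patches[0].shape)  # 6 (224, 224, 3)
```

Non-overlapping tiles (stride equal to patch size) keep the example simple; an overlapping stride would yield more patches per slide at the cost of correlated samples.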

AWS Architecture

  • Amazon S3: Storage for histopathology images and clinical data, organised and protected for fast and efficient access
  • AWS Glue DataBrew: Data preparation tool that cleaned and normalised the data. Transformations were automated to ensure consistency across datasets
  • Amazon SageMaker: Platform used to train, validate, and deploy the deep learning models. SageMaker simplified model training by providing managed infrastructure for the compute-intensive processing required
  • AWS Lambda: Used to trigger data processing jobs in response to events, making it easier to automate workflows
  • Amazon CloudWatch: Monitoring service used to collect and track performance metrics for the models and AWS services, enabling proactive maintenance and real-time adjustments
  • AWS IAM: Identity and Access Management, which ensures that only authorised people and services can access sensitive data and AWS resources
  • Amazon ECS: Elastic Container Service, used to deploy and manage the diagnostic applications in production, ensuring scalability and efficient resource management
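As an illustration of the Lambda-based automation listed above, the sketch below shows a handler that reacts to an S3 upload notification and starts a processing run for each new object. It assumes, hypothetically, that preprocessing is packaged as a SageMaker Pipelines pipeline named `probreast-preprocessing` with an `InputS3Uri` parameter; the source does not say how the processing jobs were defined. The SageMaker client is injectable so the handler can be exercised without AWS credentials.

```python
import urllib.parse

# Hypothetical pipeline name: illustrative only, not taken from the project.
PIPELINE_NAME = "probreast-preprocessing"


def s3_uris_from_event(event):
    """Return s3:// URIs for every object referenced in an S3 event notification."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (e.g. spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


def lambda_handler(event, context, sagemaker_client=None):
    """Start one SageMaker Pipelines execution per uploaded object."""
    if sagemaker_client is None:
        import boto3  # boto3 is included in the AWS Lambda Python runtime

        sagemaker_client = boto3.client("sagemaker")
    execution_arns = []
    for uri in s3_uris_from_event(event):
        response = sagemaker_client.start_pipeline_execution(
            PipelineName=PIPELINE_NAME,
            # Hypothetical pipeline parameter carrying the new object's location.
            PipelineParameters=[{"Name": "InputS3Uri", "Value": uri}],
        )
        execution_arns.append(response["PipelineExecutionArn"])
    return {"started": execution_arns}
```

Keeping event parsing in a separate function makes the URL-decoding logic testable on its own; in production, the IAM role attached to the function would need permission to start the pipeline.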

Results obtained 

The results obtained with the different models are promising:

  • Histological image classification with a VGG16 model pre-trained on ImageNet achieved 87.6% accuracy
  • Cancer grade prediction using the XGBoost algorithm achieved 95% accuracy
  • Prediction of 10-year cancer recurrence using linear regression achieved 71% accuracy
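All three results are reported as accuracy, i.e. the share of cases whose predicted class matches the reference label. The sketch below computes it together with per-class recall, which can expose a weak class that a single accuracy figure hides. The labels are invented for illustration and are not project data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each true class, the fraction of its cases predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / n for cls, n in totals.items()}


# Invented tumour-grade labels (G1-G3), for illustration only.
y_true = ["G1", "G2", "G2", "G3", "G3", "G3", "G2", "G1"]
y_pred = ["G1", "G2", "G3", "G3", "G3", "G2", "G2", "G1"]
print(accuracy(y_true, y_pred))  # 0.75
print(recall_per_class(y_true, y_pred))
```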

Conclusions

The CAD tool combines histological image analysis with the analysis of clinical and histological data to generate a comprehensive diagnostic and prognostic report for breast cancer. It has the potential to reduce clinicians' workload and improve the reproducibility of diagnoses. The AWS architecture made it possible to build a scalable, secure, and high-performance solution and simplified the deployment and operation of the deep learning models. Further studies are needed to improve the robustness and accuracy of the models, in particular by acquiring more labelled data and experimenting with different deep learning approaches.
