Deep learning empowered breast cancer diagnosis: Advancements in detection and classification

Plain-English Explanations
Pages 1-3
Why Mammography-Based Breast Cancer Detection Still Needs Better AI

Breast cancer remains one of the most significant global health challenges, with an estimated 300,590 new cases and 43,170 deaths projected in the United States in 2023 alone. The disease is especially prevalent in Asia due to a combination of lifestyle factors, genetics, environmental exposures, and disparities in healthcare accessibility. Early detection through mammography screening is widely regarded as the most effective strategy for reducing mortality, as it can identify tumors before they spread into surrounding healthy tissue or distant organs. However, mammography is not perfect: its accuracy can vary substantially depending on breast composition, tissue density, and tumor characteristics, which can lead to missed cancers or false alarms.

The diagnostic challenge: Radiologists examine mammograms daily to spot problematic lesions and evaluate suspicious breast tissue based on location, traits, and shape. Two standard views are used: MLO (mediolateral oblique) and CC (craniocaudal), which together capture comprehensive breast tissue images. Radiologists look for abnormally bright areas, changes in breast size, and shifts in fatty-tissue density, paying particular attention to dense, white masses, since malignant tumors can change shape over time. However, this manual review process is both costly and error-prone, and the increasing volume of mammograms examined daily only amplifies the need for greater accuracy and reliability.

Types of lesions: Mammography can identify several types of breast abnormalities. Mass lesions, which can be benign or malignant, are the most common finding. Calcifications appear as white patches and dots in mammograms and can sometimes be associated with ductal carcinoma in situ, though they are typically benign. Macro-calcifications appear as clearly visible specks, while micro-calcifications, despite their small size, warrant closer attention due to their clinical significance. Architectural distortion, characterized by deformations in breast tissue without a visible tumor, is a common benign condition but can be a precursor to breast cancer and is particularly challenging to detect in 2D mammography.

The case for Computer-Aided Diagnosis: Computer-Aided Diagnosis (CAD) systems can offer a "second opinion," assisting professionals in determining the possibility of breast cancer. Traditional methods relied on straightforward image processing and hand-crafted features, but cutting-edge deep learning algorithms are emerging as substitutes due to the limited accuracy and high false-positive rates of conventional approaches. These newer algorithms incorporate background tissue information and automate feature extraction for tumor delineation and classification, addressing the fundamental limitations of older methods.

TL;DR: With over 300,000 new U.S. breast cancer cases projected in 2023, mammography remains the frontline screening tool but suffers from accuracy limitations related to breast density and tissue characteristics. This paper proposes a deep learning CAD system to automate detection, segmentation, and classification of breast lesions from mammograms.
Pages 4-7
What Prior Deep Learning Approaches Achieved and Where They Fell Short

CNN-based detection milestones: The paper surveys a range of prior approaches. Tavakoli et al. developed a block-based CNN architecture for identifying cellular alterations in breast tissues, achieving 95% accuracy on the MIAS database. Moon et al. built a CAD system using CNN architectures with multiple image representations and an image fusion technique, reaching ensemble performance metrics of 91.10% accuracy, 85.14% sensitivity, 95.77% specificity, and an AUC of 0.9697. Khan et al. created a method using three CNN architectures (ResNet, VGGNet, and GoogleNet) combined with data augmentation, achieving 97.525% accuracy on histological images.

Detection-focused studies: Peng et al. combined a multiscale-feature pyramid network with Faster R-CNN, demonstrating true positive rates of 0.94 and 0.96 on the CBIS-DDSM and INbreast datasets respectively. Al-masni et al. developed a YOLO-based CAD system that reached 85.52% accuracy on the DDSM dataset. Haq et al. proposed a DnCNN model that obtained 79% accuracy within a 30-minute processing window. Vedalankar et al. used AlexNet with support vector machines across three databases (CBIS-DDSM, DDSM, and mini-MIAS), achieving a peak accuracy of 92%, sensitivity of 81.5%, and specificity of 90.83%.

Transfer learning and hybrid models: Alruwaili and Gouda used transfer learning with ResNet50 and NASNet-Mobile on the MIAS dataset, achieving 89.5% accuracy with ResNet50 and 70% with NASNet-Mobile. Das et al. compared shallow CNNs against deep pre-trained models (VGG19, ResNet50, MobileNet-v2, Inception-v3, Xception, Inception-ResNet-v2), finding that pre-trained CNNs achieved accuracy rates of 87.8% and 95.1% on the DDSM and INbreast datasets, surpassing the shallow CNN's 80.4% and 89.2%. Trang et al. combined clinical data with mammography images using multiple architectures, achieving 84.5% overall accuracy, a notable improvement from 72.5% using mammography alone.

Identified gaps: The literature review reveals several recurring limitations across prior studies. Many approaches struggle with class imbalance in mammography datasets. Some rely on relatively small databases, limiting generalizability. Others depend heavily on manual tumor delineation by operators, introducing variability in results. The authors conclude that existing CNN-based approaches for breast cancer detection may not be sufficiently accurate or efficient, and that these procedures require more time and resources than desirable for clinical deployment. This motivates the development of a faster, more effective integrated system.

TL;DR: Prior deep learning methods for breast cancer detection ranged from 79% to 97.5% accuracy, but most had notable limitations: reliance on small datasets, sensitivity to class imbalance, manual ROI delineation requirements, and difficulty balancing speed with accuracy. This paper aims to close these gaps with an integrated detection-segmentation-classification pipeline.
Pages 7-9
YOLO-V7 and the Fused Model Strategy for Lesion Detection

Why YOLO for mammography: The YOLO (You Only Look Once) network was chosen as the detection backbone because it predicts both bounding box locations and class probabilities for the entire image in a single pass through a fully convolutional neural network (FCNN). Unlike traditional sliding window approaches that scan the image piece by piece, YOLO divides the image into grids and generates bounding boxes, class probabilities, and confidence ratings for each grid cell simultaneously. This design significantly reduces computational overhead, making it well-suited for clinical environments where processing speed matters.

YOLO-V7 architecture details: The system uses YOLO-V7, the seventh iteration of the YOLO family, specifically designed to improve object detection at various scales. YOLO-V7 employs multi-scale feature extraction using skip connections to address gradient vanishing problems in deeper network layers. Three fully connected layers handle features extracted at different scales. The system uses anchor box theory, fine-tuning anchor boxes with a K-means clustering method applied to whole images. Output matrices of multi-scale features are arranged into grid cells and used alongside these anchor boxes to select boxes with scores above a predetermined threshold, while also computing Intersection over Union (IoU) percentages between ground-truth and anchor boxes.
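The IoU percentage mentioned above is a standard ratio of overlap to combined area between two boxes. A minimal sketch, assuming boxes in (x1, y1, x2, y2) pixel format:

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes in (x1, y1, x2, y2) format."""
    # Corners of the intersection rectangle (empty if boxes are disjoint)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A predicted box covering half of a 10x10 ground-truth box:
# overlap = 50, union = 150, so IoU = 1/3
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Anchor boxes whose IoU with a ground-truth box exceeds the predetermined threshold are the ones retained as candidate detections.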

The fused model approach: Rather than relying on a single model, the authors developed a fusion strategy combining predictions from multiple model configurations. Model-1 (M-1) was trained independently for specific class labels (mass, calcification, or architectural deformation), while Model-2 (M-2) was a multi-class YOLO model trained on all three classes simultaneously. The Fused Model combines M-1 and M-2 outputs using two IoU thresholds: threshold1 (0.44) and threshold2 (0.38). For mass detection, the system starts with M-1 predictions above threshold1, then applies M-2 to images separated by threshold2, and combines both prediction groups into final mass predictions. Calcification predictions follow a similar fusion process.
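The summary paraphrases the fusion rule rather than giving it exactly, so the following is only a loose sketch: it treats the two thresholds as cut-offs on each model's prediction scores and merges whatever survives. The `(box, score)` prediction structure is an assumption, not the paper's data format:

```python
THRESHOLD_1 = 0.44  # cut-off applied to Model-1 (per-class) predictions
THRESHOLD_2 = 0.38  # cut-off applied to Model-2 (multi-class) predictions

def fuse_predictions(m1_preds, m2_preds):
    """Merge per-class (M-1) and multi-class (M-2) YOLO outputs.

    Each prediction is a hypothetical (box, score) pair; the paper's
    exact IoU-based grouping is simplified to plain score thresholding.
    """
    kept = [p for p in m1_preds if p[1] >= THRESHOLD_1]
    kept += [p for p in m2_preds if p[1] >= THRESHOLD_2]
    return kept

m1 = [("box_a", 0.50), ("box_b", 0.40)]  # hypothetical M-1 outputs
m2 = [("box_c", 0.39)]                   # hypothetical M-2 outputs
# box_a passes THRESHOLD_1, box_c passes THRESHOLD_2, box_b is dropped
print(len(fuse_predictions(m1, m2)))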

Data augmentation and normal classification: The training data was split 70% for training, 20% for testing, and 10% for validation. Each pair of mammograms was evaluated together with six augmented versions (rotated or transformed variations), and the image with the best IoU rating was selected. A "Normal" class label was added to account for mammograms that returned normal during follow-up screening, with the YOLO model trained on abnormal mammograms confirming the absence of predicted bounding boxes to classify images as normal.

TL;DR: The detection stage uses YOLO-V7 with a novel fusion strategy: one model trained per lesion type (M-1) and another trained for all classes (M-2) are combined using IoU thresholds of 0.44 and 0.38. This fused approach, enhanced with 6x data augmentation, achieved 98.5% detection accuracy for mass lesions on the CBIS-DDSM dataset.
Pages 9-10
Associated-ResUNets: A Dual-UNet Architecture for Mass Segmentation

UNet foundations: UNet is a widely adopted model in medical image segmentation that uses an encoder-decoder structure, omitting fully connected layers. Its symmetrical architecture consists of down-sampling (encoder) and up-sampling (decoder) paths forming a characteristic "U" shape. The critical innovation of UNet is its skip connections, which preserve spatial information that would otherwise be lost during down-sampling. Each encoder block includes two convolution units followed by batch normalization (BN) and ReLU layers, with max pooling applied before passing output to the next encoder block.

The Associated-ResUNets design: Building on the UNet foundation, the authors introduce "Associated-ResUNets," which joins two complete UNet architectures together with additional skip connections to enhance information flow between them. The first UNet processes the input and generates initial feature maps. Customized skip connections between the first decoder and the second encoder recover decoded information, allowing the second UNet to refine the segmentation output. This dual-architecture approach ensures that fine-grained spatial details from the first pass are preserved and enhanced during the second pass.

Atrous Spatial Pyramid Pooling (ASPP): To facilitate smooth transitions between down-sampling and up-sampling pathways, the model employs an ASPP block. This technique uses "Atrous" (dilated) convolution to widen the receptive field while maintaining spatial resolution. The ASPP block integrates batch normalization layers and four 3x3 convolution layers with varying dilation rates, combining them to generate multi-scale features that are fed into a 1x1 convolutional layer. An attention block is also incorporated to fuse attention mechanisms with skip connections in encoder and decoder blocks, producing an attention map that is multiplied with skip connection information to focus the model on the most relevant regions.
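The receptive-field widening that dilation provides follows a simple formula: a k x k kernel with dilation d covers an effective window of k + (k-1)(d-1) pixels per side. A small illustration (the example dilation rates are illustrative; this summary does not list the rates the paper actually uses):

```python
def effective_kernel(k, d):
    """Effective receptive field (per side) of a k x k conv with dilation d."""
    return k + (k - 1) * (d - 1)

# A 3x3 atrous kernel spans a wider window as the dilation rate grows,
# without adding parameters or reducing resolution:
for d in (1, 2, 4, 8):
    print(d, effective_kernel(3, d))  # → 3, 5, 9, 17
```

Running several such dilation rates in parallel and concatenating the results is what gives the ASPP block its multi-scale view of the input.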

Segmentation pipeline: In practice, the segmentation stage receives ROIs (Regions of Interest) containing breast masses identified by the YOLO detection stage. Bounding box coordinates are expanded to cover more surrounding space around smaller tumors, and the resulting ROI images are resized to 227 x 227 pixels, which empirical research identified as the optimal input dimension for the segmentation networks. The final output mask is produced by a 1x1 convolutional layer followed by sigmoid activation. Only mass lesions are segmented in this study, as calcification lesions lack precise reference annotations.
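The bounding-box expansion step can be sketched as growing each detection box by a fixed margin and clamping to the image bounds before cropping and resizing to 227 x 227. The margin value here is hypothetical; the paper only states that coordinates are expanded:

```python
def expand_roi(box, margin, img_w, img_h):
    """Grow a (x1, y1, x2, y2) detection box by `margin` pixels per side,
    clamped to the image bounds, prior to cropping and resizing."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(img_w, x2 + margin), min(img_h, y2 + margin))

# A small tumor box gains 20 px of surrounding context on every side:
print(expand_roi((100, 120, 180, 200), margin=20, img_w=224, img_h=224))
# → (80, 100, 200, 220)
```

Clamping matters for lesions near the breast edge, where the expanded box would otherwise fall outside the image.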

TL;DR: The segmentation stage uses Associated-ResUNets, a dual-UNet architecture with extra skip connections and an ASPP block for multi-scale feature extraction. This design achieved a Dice score of 95.89% and IoU of 92.28% on the CBIS-DDSM dataset, outperforming standard UNet (89.88% Dice), AUNet (90.26%), and ResUNet (93.59%).
Pages 10-14
BreastNet-SVM: A Customized AlexNet with SVM Classification

Architecture overview: The classification stage uses BreastNet-SVM, a customized model inspired by AlexNet. This 13-layer architecture consists of seven convolutional layers, three pooling layers, and three fully connected layers, designed specifically for breast cancer identification from grayscale mammogram patches. The model accepts input images of size 32x32 pixels. The first two convolutional layers apply 32 filters with 3x3 kernels using same padding and ReLU activation, followed by max-pooling with a 2x2 filter and stride of 2. Two additional convolutional layers use 64 filters each, followed by another max-pooling layer. The final three convolutional layers employ 128 filters each, with a third max-pooling layer reducing the feature maps to a 2048-element vector.
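The 2048-element figure follows directly from the stated layout: same-padded 3x3 convolutions preserve spatial size, and each 2x2, stride-2 max pool halves it, so a 32x32 input reaches 4x4 with 128 channels. A quick arithmetic check:

```python
size, channels = 32, 1  # grayscale 32x32 input patch
for n_filters in (32, 64, 128):  # filter counts of the three conv stages
    channels = n_filters         # same-padded 3x3 convs keep spatial size
    size //= 2                   # each 2x2, stride-2 max pool halves it
print(size, channels, size * size * channels)  # → 4 128 2048
```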

SVM instead of SoftMax: A key design decision is replacing the standard SoftMax classification layer with a Support Vector Machine (SVM) classifier. While traditional CNNs use fully connected layers ending in SoftMax for classification, the authors found that using SVM on the CNN-extracted features yielded better accuracy for distinguishing between benign and malignant breast cancer. The fully connected layers serve as a bridge between the feature extraction layers and the SVM classifier, with the activation function introducing non-linearity to enable more nuanced classification boundaries.

Optimization and training: Three different optimizers were tested: Stochastic Gradient Descent (SGD), Adaptive Moment Estimation (Adam), and Root Mean Square Propagation (RMSprop), all using a learning rate of 0.0001, a batch size of 70, and 150 training epochs. Three input image sizes were also evaluated: 16x16, 32x32, and 48x48. The model was trained on a dataset of 6,165 samples (approximately 70% of the augmented CBIS-DDSM data) split into benign and malignant categories, with 30% reserved for validation and testing.
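For reference, plain SGD (without momentum, which the summary does not mention) applies the simplest of the three update rules. A minimal sketch using the paper's stated learning rate:

```python
def sgd_step(weights, grads, lr=0.0001):
    """One vanilla SGD update: w <- w - lr * dL/dw."""
    return [w - lr * g for w, g in zip(weights, grads)]

# Two toy weights nudged opposite their gradients:
print(sgd_step([1.0, -0.5], [10.0, -20.0]))  # → [0.999, -0.498]
```

Adam and RMSprop add per-parameter adaptive scaling on top of this; the results below suggest that for this task the plain update generalized better.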

End-to-end pipeline: The complete framework operates in a sequential two-step manner. First, the fused YOLO model detects and categorizes breast masses, producing bounding boxes around relevant areas. Then, the segmented ROI masses from the Associated-ResUNets stage are used as input for the BreastNet-SVM model. This ensures that the classifier receives clean, well-delineated tumor regions rather than raw mammograms with background noise, enabling more precise benign-versus-malignant classification.

TL;DR: BreastNet-SVM is a 13-layer customized AlexNet (7 conv + 3 pooling + 3 FC layers) that replaces the standard SoftMax classifier with an SVM. It takes segmented 32x32 ROI patches as input and was trained with SGD optimizer at a learning rate of 0.0001 for 150 epochs, feeding into the final benign-versus-malignant classification.
Pages 15-17
98.5% Detection Accuracy and 95.89% Segmentation Dice Score

Fused YOLO detection results: The fused model approach demonstrated substantial improvements over individual models. Model-1 achieved 97.9% accuracy for mass detection and 89.9% for calcification, while Model-2 achieved 96.2% for mass and 88.7% for calcification. By combining these models, the Fused Model reached 98.5% accuracy for mass lesion detection and 93.4% for calcification. This fusion strategy effectively addressed the limitations of each individual model, delivering both speed and precision that surpassed existing state-of-the-art methods.

Detection by lesion type: On the test dataset, the system achieved strong performance across all lesion categories. For Normal cases, precision was 0.94, AUC was 0.96, sensitivity was 0.93, recall was 0.94, and accuracy was 0.98. Architectural Distortion achieved the most balanced performance with 0.95 across precision, AUC, sensitivity, and recall, with 0.98 accuracy. Mass lesions showed precision of 0.94, AUC of 0.95, and accuracy of 0.96. Calcification proved the most challenging category with 0.88 precision, 0.94 AUC, and 0.94 accuracy, reflecting the inherent difficulty of detecting calcifications due to their variety of shapes and locations.

Segmentation benchmarks: The Associated-ResUNets architecture consistently outperformed all baseline models on the CBIS-DDSM dataset. Standard UNet achieved a Dice score of 89.88% and IoU of 86.44%. Standard AUNet improved slightly to 90.26% Dice and 88.03% IoU. Standard ResUNet reached 93.59% Dice and 89.80% IoU. The Associated-UNets variant scored 95.73% Dice and 91.96% IoU, while Associated-AUNets reached 95.83% and 92.18%. The full Associated-ResUNets achieved the highest scores: 95.89% Dice and 92.28% IoU, representing a 6-point improvement in Dice score over the standard UNet baseline.
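Dice and IoU are algebraically linked on a per-sample basis by Dice = 2*IoU / (1 + IoU); dataset-averaged scores only satisfy this approximately, but the top model's reported pair is consistent with it:

```python
def dice_from_iou(iou):
    """Per-sample identity relating the two overlap metrics."""
    return 2 * iou / (1 + iou)

# Associated-ResUNets: reported IoU 92.28% implies Dice ~95.99%,
# closely matching the reported 95.89%:
print(round(dice_from_iou(0.9228), 4))  # → 0.9599
```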

ROC analysis: The ROC curve analysis revealed excellent AUC scores across lesion types: 0.95 for both Architectural Distortion and Mass cases, and 0.96 for Normal cases. These high AUC values indicate strong discriminative ability across different classification thresholds. The relative difficulty with calcification lesions stems from their diverse shapes and locations, as they frequently appear as minor, irregular imperfections that challenge automated detection systems.

TL;DR: The fused YOLO model achieved 98.5% mass detection accuracy (up from 97.9% for M-1 alone), and Associated-ResUNets achieved 95.89% Dice / 92.28% IoU for segmentation, outperforming standard UNet by 6 points. Calcification detection at 93.4% was the most challenging category due to the irregular shapes and small sizes of these lesions.
Pages 17-19
99.16% Classification Accuracy with the SGD Optimizer at 48x48 Input

Optimizer and image size comparison: The study systematically evaluated three optimizers across three input image sizes. During training, SGD consistently outperformed Adam and RMSprop. At 16x16 input size, SGD achieved 98.85% accuracy versus Adam's 95.02% and RMSprop's 93.26%. At 32x32, SGD reached 98.77% compared to Adam's 96.55% and RMSprop's 98.30%. At 48x48, SGD achieved the highest training accuracy of 99.24% with a misclassification rate of just 0.76%, while Adam scored 97.98% and RMSprop scored 97.35%.

Validation performance: During validation on 882 samples, the SGD optimizer with 48x48 input images produced the best results: 99.16% accuracy, 99.30% specificity, 97.13% sensitivity, and a misclassification rate of only 0.84%. This was notably better than the validation performance at other sizes. At 32x32 with SGD, accuracy dropped to 96.03%, and at 16x16 with SGD, accuracy was 93.94%. The Adam optimizer at 48x48 achieved 95.89%, and RMSprop at 48x48 achieved 96.56%, confirming SGD's superiority for this task.

Confusion matrix analysis: During training, the model correctly predicted 2,971 out of 2,990 benign samples (misclassifying only 19) and 3,128 out of 3,175 malignant samples (misclassifying 47). In the validation phase with the best SGD configuration, out of 411 benign samples, 406 were correctly classified with only 5 misclassifications. For 471 malignant samples, 461 were correctly predicted with only 10 misclassifications. This translates to a false negative rate (malignant predicted as benign) of roughly 2.1% during validation, a clinically important metric for cancer screening.
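The screening metrics discussed here follow mechanically from the confusion-matrix counts. A sketch applying the standard formulas to the validation counts above (treating malignant as the positive class) reproduces the roughly 2.1% false-negative rate:

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and false-negative rate from counts."""
    sensitivity = tp / (tp + fn)  # fraction of malignant cases flagged
    specificity = tn / (tn + fp)  # fraction of benign cases cleared
    fnr = fn / (tp + fn)          # fraction of malignant cases missed
    return sensitivity, specificity, fnr

# Validation counts reported above: 406/411 benign and 461/471 malignant correct
sens, spec, fnr = screening_metrics(tp=461, fn=10, tn=406, fp=5)
print(f"{sens:.4f} {spec:.4f} {fnr:.4f}")  # → 0.9788 0.9878 0.0212
```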

Key takeaway on classification: The 99.16% accuracy achieved by BreastNet-SVM with SGD at 48x48 input represents the lowest misclassification rate (0.84%) observed among comparable studies. The model's sensitivity of 97.13% means it correctly identifies the vast majority of malignant cases, while its specificity of 99.30% ensures very few benign cases are incorrectly flagged as cancerous, which would reduce unnecessary biopsies and patient anxiety in clinical practice.

TL;DR: BreastNet-SVM with SGD optimizer and 48x48 input images achieved 99.16% accuracy, 97.13% sensitivity, and 99.30% specificity on the CBIS-DDSM dataset. In validation, only 5 out of 411 benign and 10 out of 471 malignant samples were misclassified, yielding the lowest misclassification rate (0.84%) among comparable studies.
Pages 19-21
How the Proposed Framework Compares to 8 State-of-the-Art Methods

Detection comparison: When benchmarked against 8 prior methods for mass lesion detection on the CBIS-DDSM dataset, the fused YOLO model's 98.5% accuracy outperformed all competitors. The closest rival was Elkorany et al. (2023), who combined CNN architectures (Inception-V3, ResNet50, AlexNet) with a multiclass SVM, reaching 94.78% on the MIAS dataset. Peng et al. (2020) achieved 93.45% with Faster R-CNN on CBIS-DDSM. Vedalankar et al. (2021) reached 92% using AlexNet + SVM on DDSM. Older approaches like Al-masni et al.'s original YOLO (2017) scored 85.25%, and Das et al.'s shallow CNN (2023) achieved only 80.4%.

Classification comparison: Against 8 published classification methods, the BreastNet-SVM's 99.16% accuracy on CBIS-DDSM surpassed all competitors. Saber et al. (2021) achieved 98.96% using VGG-16 on the MIAS dataset. Khan et al. (2019) reached 97.52% using GoogLeNet, VGGNet, and ResNet on hospital data from Pakistan. Wang et al. (2021) obtained 96.5% with a Boosted EfficientNet model. Hekal et al. (2021) achieved 93.2% using AlexNet and ResNet-50 on CBIS-DDSM. Rahman et al. (2023) scored 93% with ResNet50 on INbreast. Moon et al. (2020) achieved 91.1% with an ensemble method on the BUSI dataset.

Dataset considerations: An important caveat is that direct comparisons across studies are complicated by differences in datasets. The CBIS-DDSM, DDSM, MIAS, INbreast, and BUSI datasets vary in image quality, resolution, annotation standards, and patient demographics. The INbreast dataset generally has improved mammography quality, which can inflate performance metrics, while the CBIS-DDSM dataset used in this study is one of the largest and most standardized benchmarks. The consistent use of CBIS-DDSM across the entire proposed pipeline (detection, segmentation, and classification) strengthens the internal validity of the results.

Integrated framework advantage: What distinguishes this work from most competitors is the integrated three-phase approach. Most prior studies addressed only one task (detection, segmentation, or classification) in isolation. By combining fused YOLO detection (98.5%), Associated-ResUNets segmentation (95.89% Dice), and BreastNet-SVM classification (99.16%) into a single end-to-end pipeline, the system provides comprehensive diagnostic support that covers the full clinical workflow from lesion identification through pathological classification.

TL;DR: The proposed system outperformed all 8 compared methods in both detection (98.5% vs. next-best 94.78%) and classification (99.16% vs. next-best 98.96%). Its key advantage is integrating detection, segmentation, and classification into a single pipeline, whereas most prior studies tackled only one task in isolation.
Pages 21-22
Clinical Implications and Paths Forward for 3D and Multi-Abnormality Extensions

Summary of achievements: The integrated deep learning CAD system demonstrated strong performance across all three diagnostic phases. The fused YOLO model achieved 98.5% detection accuracy for mass lesions, with particularly strong results for architectural distortion (95% sensitivity for cancer patients, 93.09% for non-malignant cases). The Associated-ResUNets architecture achieved the best segmentation performance among all tested UNet variants, with 95.89% Dice and 92.28% IoU. The BreastNet-SVM classifier reached 99.16% overall accuracy with the lowest misclassification rate (0.84%) among comparable studies, using the SGD optimizer with 48x48 input patches.

Clinical significance: The system's high sensitivity (97.13%) means that fewer malignant cases would go undetected in a clinical screening setting, potentially saving lives through earlier treatment initiation. The high specificity (99.30%) means fewer benign cases would be incorrectly flagged as suspicious, reducing unnecessary biopsies, patient anxiety, and healthcare costs. The end-to-end pipeline from detection through classification mirrors the actual clinical diagnostic workflow, making it more practically applicable than single-task models that address only one component of the diagnostic process.

Limitations: Several constraints should be noted. The study relies exclusively on the CBIS-DDSM dataset, which, while being a well-established benchmark, represents mammographic data from a specific population and imaging protocol. Calcification detection remained the most challenging task at 93.4% accuracy, reflecting the inherent difficulty of identifying these small, irregularly shaped lesions in automated systems. The segmentation phase focused only on mass lesions because calcification lesions lacked precise reference annotations, leaving a gap in the pipeline. Class imbalance in the original dataset, though addressed through augmentation, may still influence model generalization to real-world clinical data with different prevalence rates.

Future research directions: The authors propose expanding the framework to incorporate more types of breast abnormalities and to process 3D medical images such as CT scans and MRIs. Extending beyond 2D mammography to 3D imaging modalities could address some of the current limitations in detecting architectural distortion and small calcifications that are difficult to characterize in two dimensions. Multi-institution validation with diverse patient populations would strengthen confidence in the system's generalizability before clinical deployment.

TL;DR: The complete pipeline delivers 98.5% detection, 95.89% segmentation Dice, and 99.16% classification accuracy on CBIS-DDSM. Key limitations include reliance on a single dataset and the inability to segment calcifications. Future work targets 3D imaging (CT/MRI) and broader abnormality coverage to move the system closer to clinical deployment.
Citation: Ahmad J, Akram S, Jaffar A, et al. PLOS ONE, 2024 (Open Access). Available on PMC: PMC11239011. DOI: 10.1371/journal.pone.0304757. License: CC BY.