Breast cancer has surpassed lung and thyroid cancers to become the most prevalent malignancy worldwide, according to GLOBOCAN 2022. The statistics are stark: China leads with 15.6% of global incidence and 14.8% of mortality, followed by the USA (12% incidence, 11.3% mortality) and India (8.4% incidence, 6.4% mortality). Projections suggest the global burden will reach 3 million new cases and 1 million deaths annually by 2040. In developing nations, growth rates are alarming: Pakistan faces a projected 99.1% increase, the Philippines 81.4%, Indonesia 52.1%, and India 80.2% by 2045. These numbers reveal a crisis that is disproportionately concentrated in Asia.
The breast density challenge: In Asian populations, breast cancer presents roughly a decade earlier than in Western countries, with a median age of 50 in most Asian nations compared to 60-70 in the West. This earlier onset is compounded by characteristically higher breast density in Asian women. Dense breast tissue not only increases cancer risk but also obscures lesions on mammograms, reducing image clarity. The result is a double burden: higher rates of false negatives (missed cancers) and false positives (unnecessary biopsies). While 60-70% of US cases are diagnosed at stage 1, only 1-8% of Indian cases are caught that early, largely due to screening limitations in dense breast tissue combined with cultural, socioeconomic, and infrastructural barriers.
The promise and limits of AI: Deep learning (DL) and Convolutional Neural Networks (CNNs) have shown considerable promise in enhancing mammographic sensitivity, reducing human error, and improving diagnostic accuracy. However, the vast majority of these AI models have been trained on Caucasian datasets, creating significant limitations when applied to Asian populations. The authors of this systematic literature review (SLR) set out to map the global landscape of DL-based computer-aided detection (CAD) systems for breast cancer, with a particular focus on identifying the gaps and challenges specific to Asian contexts.
Scope of the review: This SLR was registered in PROSPERO (CRD42023478896) and follows PRISMA guidelines. The authors searched Scopus and Web of Science for studies published between January 2018 and November 2023, with a supplementary hand search covering 2024-2025. After screening 1,051 records, 287 articles met all inclusion criteria for qualitative synthesis. The review addresses five research questions spanning current DL methodologies, challenges specific to Asian populations, commonly used mammogram datasets, the impact of preprocessing and augmentation techniques, and key drawbacks with future directions.
Search strategy: The authors conducted a systematic search of two major databases, Scopus and Web of Science, using custom queries centered on "Breast Cancer," "mammography," and "Deep Learning" techniques for detection, segmentation, classification, or identification. Scopus returned 584 papers and Web of Science returned 467, totaling 1,051 records. Importantly, the term "Asian" was not included in search strings because studies typically reference specific dataset names or hospital names rather than demographic terms. This broader approach ensured comprehensive coverage while allowing region-specific analysis during data extraction.
Inclusion and exclusion criteria: Studies had to be original research articles published between January 2018 and November 2023, involving women over 18, using mammography as the imaging modality with DL models for detection, segmentation, feature extraction, or classification. Articles needed to report empirical performance metrics using standard cranio-caudal (CC) and mediolateral oblique (MLO) mammogram views. Exclusions applied to studies using other imaging modalities (MRI, CT, ultrasound, tomosynthesis), conference papers, abstracts, reviews, clinical trials, case studies, grey literature, non-English publications, and studies that failed to report relevant diagnostic measures.
Screening and quality assessment: After removing 363 duplicates from the 1,051 records, 688 articles were screened. Of these, 70 were excluded (67 reviews, 2 withdrawn, 1 retracted), and 16 could not be retrieved. From the remaining 602 reports, 198 were excluded as multimodality studies and 117 failed abstract screening. Ultimately, 287 articles met all criteria. Quality assessment was conducted using JabRef software, with each article evaluated for methodology, relevance to research objectives, and evidence reliability. Studies performing only statistical analysis (FROC) without implementing DL models were removed, as were those using non-standard mammographic views.
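The screening counts above can be checked with a few lines of arithmetic. The sketch below uses only the figures reported in the search and screening paragraphs; the variable names are illustrative and not taken from the review.

```python
# Counts as reported in the search-strategy and screening paragraphs.
scopus, web_of_science = 584, 467
identified = scopus + web_of_science            # records found by the database search
duplicates = 363
excluded_at_screening = {"reviews": 67, "withdrawn": 2, "retracted": 1}
not_retrieved = 16
excluded_at_eligibility = {"multimodality": 198, "failed_abstract_screening": 117}

screened = identified - duplicates
assessed = screened - sum(excluded_at_screening.values()) - not_retrieved
included = assessed - sum(excluded_at_eligibility.values())

print(identified, screened, assessed, included)
```

Each stage reproduces the number stated in the text (1,051 identified, 688 screened, 602 reports assessed, 287 included), so the reported flow is internally consistent.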
Supplementary hand search: To capture developments after the formal search cutoff, supplementary hand searching was conducted from January 2024 to June 2025, monitoring high-impact journals. This targeted search identified 25 additional relevant articles revealing emerging trends in transformer architectures, multi-view analysis, multi-modal adaptations, and new datasets. While these hand-search results were not included in the formal data synthesis to maintain methodological rigor, they confirmed the continued relevance of the review's findings and informed the discussion of future directions.
Exponential growth: The temporal analysis reveals a nearly six-fold increase in annual publications, from 15 articles in 2018 to 86 in 2023 (about 5.7 times as many). This acceleration, particularly pronounced after 2020, coincides with advances in deep learning architectures and increased computational accessibility. Springer and Elsevier emerged as the dominant traditional publishers, while MDPI's rapid growth after 2021 reflects the field's embrace of open-access publishing. This shift toward open access is particularly important for researchers in developing countries who are expanding mammography infrastructure and need unrestricted access to the latest methods.
Collaboration gaps: Co-authorship analysis using VOSviewer reveals fragmented collaboration patterns with distinct clusters, suggesting missed opportunities for knowledge transfer and methodological standardization. Separate visualization of Asian collaboration networks highlights regional tendencies but underscores the need for global collaborative frameworks. Country-wise analysis shows research concentration in Asia, North America, and Europe, but high-publication countries with lower citation impact suggest emerging research capabilities that would benefit from international partnership and support.
The dataset disparity: The most commonly used datasets are InBreast, CBIS-DDSM, and DDSM, all of which are Caucasian-population datasets. The research focus maps created by the authors reveal a staggering imbalance: there is an 8-fold difference in detection studies (72 Caucasian vs. 9 Asian) and a roughly 4-fold difference in segmentation studies (33 Caucasian vs. 8 Asian). For breast density research, Asian populations have only 5 studies compared to 24 for Caucasian populations, a nearly 5-fold gap. African and Oceanian populations have zero dedicated breast density studies. Over 80% of all reviewed studies focus on Caucasian datasets.
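As a quick check on the fold differences quoted above, the ratios can be computed directly from the study counts. This is a minimal sketch; the dictionary layout and the function name `fold_gap` are illustrative choices, not something defined in the review.

```python
# (Caucasian studies, Asian studies) per task, as reported in the paragraph above.
study_counts = {
    "detection": (72, 9),
    "segmentation": (33, 8),
    "breast density": (24, 5),
}

def fold_gap(caucasian, asian):
    """Ratio of Caucasian-dataset studies to Asian-dataset studies."""
    return caucasian / asian

for task, (caucasian, asian) in study_counts.items():
    print(f"{task}: {fold_gap(caucasian, asian):.1f}x")
```

The detection ratio is exactly 8, while the segmentation and density ratios come out slightly above 4 and slightly below 5, which the text rounds to whole numbers.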
BI-RADS classification gaps: Research heavily skews toward binary categorization. About 84% of Caucasian and 55% of Asian lesion classification studies focus on simple benign-versus-malignant distinctions. Critically, very few studies attempt comprehensive BI-RADS categorization covering all classes (only 2 studies for MC8, the full 8-class problem). Higher-complexity classifications (MC5 through MC7, that is, five- to seven-class problems) remain underexplored, with fewer than 6 studies each. This represents a fundamental mismatch with clinical needs, where radiologists must distinguish between all BI-RADS categories to guide patient management decisions.
Detection techniques: Breast lesion detection relies on DL models such as YOLO variants, Faster R-CNN, and newer transformer-based approaches. However, only 20% of detection studies focus on Asian datasets. Methods applied to Asian data include Extreme Learning Machines, Active Learning, Mask R-CNN, Faster R-CNN, patch-wise CNN models, RetinaNet, Graph Neural Networks, Reciprocal Learning, and YOLO variants. The underrepresentation of Asian data stems from limited availability of large-scale annotated datasets, resource constraints, regulatory barriers for cross-institutional data sharing, and varying imaging protocols across different healthcare systems.
Segmentation approaches: Segmentation methods have progressed from thresholding and morphological operations to U-Net-based models, with recent advancements incorporating attention mechanisms and hierarchical techniques. The architectural evolution shows a shift from traditional edge-detection and region-growing methods to encoder-decoder architectures. Despite this progress, only 10% of segmentation studies use Asian datasets, primarily because pixel-level annotation is labor-intensive, requiring expert radiologist involvement. Methods employed on Asian data include Region Growing, ResU-SegNet, Connected SegNets, Otsu thresholding, Attention-based Active Learning, Hierarchical Gaussian Mixture Models, Frangi Filters for vessel segmentation, and Coarse-to-Fine Transformers. Breast density segmentation remains largely unexplored, with zero published work on Asian datasets.
Classification landscape: Lesion classification is the most studied task, categorizing findings into BI-RADS classes using DL-based feature extraction and end-to-end models. Architectural trends reveal a progression from traditional CNN-based feature extractors (VGG, ResNet) to attention mechanisms, ensemble methods, and transformer-based architectures. Yet only 13% of classification studies involve Asian datasets. Techniques applied to Asian data span a wide range: DenseNet, EfficientNet, Weight-sharing MobileNet, ConvNet with SVM, Deep Adversarial Domain Adaptation, Multi-Scale CNN, ResNet variants, squeeze-and-excitation networks, Convolutional Autoencoders, LSTM with Vanilla Siamese Networks, Graph Convolutional Networks, GANs, Ensemble Self-Attention Transformers, and YOLO with Adaptive Multiscale Decision Fusion.
Breast density classification: Mammographic breast density is a vital biomarker for treatment planning and cancer risk prediction. Customized CNNs are the most common approach for density classification, with 22% of density studies using Asian data. Techniques applied include Graph CNNs, MobileNet, Multi-View Attention-guided Residual Learning, and DenseNet. The relatively higher representation of Asian data in density studies (22% vs. 13% for lesion classification) suggests greater accessibility of density annotations, though standardization across different density assessment protocols remains a challenge.
Dominant datasets: The review found that InBreast, CBIS-DDSM, and DDSM are the most frequently used mammogram datasets across all studies. These are all Caucasian-population datasets. Despite China and India having the highest breast cancer incidence and mortality in Asia, only two publicly available Asian datasets exist: CMMD (Chinese Mammography Database) and VinDr-Mammo (Vietnamese dataset). CMMD lacks annotations for mammogram regions of interest and contains no information on breast density or BI-RADS categories, limiting its utility for lesion detection research. VinDr-Mammo offers density and BI-RADS assessment data but lacks molecular, histological, and pathology confirmation, relying solely on radiologist expertise.
Root causes of scarcity: The limited availability of Asian datasets has multiple interconnected causes. Lack of funding and inadequate infrastructure hinder the establishment of screening programs needed to generate data. Healthcare disparities across Asian nations mean that screening participation rates vary widely. Cultural beliefs and financial circumstances also influence adoption of mammographic screening. Regulatory barriers further complicate cross-institutional data sharing within and between countries. The labor-intensive nature of annotation, particularly for detection and segmentation tasks requiring pixel-level delineation, compounds the problem in resource-constrained settings.
Clinical consequences of the gap: Models trained predominantly on Caucasian data may not perform accurately on Asian populations due to ethnic differences in breast tissue density, anatomical structure, and imaging characteristics. Asian women typically have denser breast tissue, which can mask abnormalities on mammograms, delaying diagnosis and treatment. Up to 25% of breast cancer patients in Asia are young (under 50), and younger patients typically have even denser breast tissue and poorer prognosis. Without sufficient data diversity, DL models fail to account for health conditions and visual patterns unique to Asian populations, leading to biased predictions that could directly harm patients.
Newly identified datasets: During their hand search, the authors identified several recently published datasets: KAUH-BCMD, DMID, LAMIS-DMDB, and MEXBreast. Notably, DMID and KAUH-BCMD represent Asian datasets, contributing to regional diversity. However, limitations persist: KAUH-BCMD has only 2 lesion classes, restricting multi-class research, while DMID, despite offering multiple BI-RADS categories for both lesion and density classification, has a relatively small size that may limit model generalization. These new datasets are steps in the right direction but far from sufficient to close the gap.
Preprocessing techniques: Digital mammography images require several preprocessing steps before being fed into DL models. The review categorizes these into noise reduction, image enhancement, artifact removal, and region-of-interest selection. The adaptive median filter is the most commonly used noise reduction technique, while contrast-limited adaptive histogram equalization (CLAHE) is the most frequently applied enhancement method. Thresholding and morphological operations handle artifact removal and ROI selection. Full-field digital mammogram (FFDM) images, due to their high resolution, typically require minimal preprocessing beyond resizing and normalization, though these steps remain important for standardizing inputs across different imaging equipment.
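To make the preprocessing categories concrete, here is a minimal pure-Python sketch of three of the steps: noise reduction, normalization, and threshold-based region selection. It makes several simplifying assumptions. It uses a plain 3x3 median filter rather than the adaptive variant the review names, it omits CLAHE entirely (libraries such as OpenCV provide an implementation), and the 5x5 toy image and the 0.5 threshold are invented for illustration.

```python
from statistics import median

def median_filter_3x3(img):
    """Plain 3x3 median filter; border pixels use whichever neighbours exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            out[r][c] = median(window)
    return out

def min_max_normalize(img):
    """Rescale intensities to [0, 1] so inputs are comparable across scanners."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on a constant image
    return [[(p - lo) / span for p in row] for row in img]

def foreground_mask(img, threshold):
    """Global threshold: 1 marks candidate tissue, 0 marks background."""
    return [[1 if p > threshold else 0 for p in row] for row in img]

# Hypothetical toy image: dark background, brighter tissue, one noisy pixel (255).
toy = [
    [0, 0, 0, 0, 0],
    [0, 90, 100, 110, 0],
    [0, 100, 255, 100, 0],
    [0, 110, 100, 90, 0],
    [0, 0, 0, 0, 0],
]

denoised = median_filter_3x3(toy)
normalized = min_max_normalize(denoised)
mask = foreground_mask(normalized, threshold=0.5)
print(denoised[2][2], mask[2][2], mask[0][0])
```

The isolated bright pixel at the centre is replaced by the median of its neighbourhood, which is why median-type filters are a common choice for impulse noise. A real pipeline would operate on full-resolution arrays and would typically follow thresholding with morphological clean-up.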
Traditional augmentation: Data augmentation in medical imaging employs geometric transformations such as rotation, flipping, and scaling to artificially expand training datasets. Online data augmentation and basic transformations remain the most common approaches. These techniques are especially important for breast cancer datasets, where class imbalance (more benign cases than malignant) can cause models to skew toward predicting the majority (benign) class and converge slowly. Methods to address class imbalance include class weighting, oversampling, undersampling, and synthetic lesion creation to increase malignant sample proportions.
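Two of the imbalance remedies listed above, class weighting and oversampling, can be sketched in a few lines. The example assumes the common inverse-frequency ("balanced") weighting heuristic and simple random oversampling with replacement. The 80/20 benign/malignant split and the function names are hypothetical and do not come from any dataset in the review.

```python
import random
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels

# Hypothetical imbalanced label set: 80 benign, 20 malignant.
labels = ["benign"] * 80 + ["malignant"] * 20
samples = list(range(len(labels)))  # stand-ins for image identifiers

weights = balanced_class_weights(labels)
resampled, resampled_labels = random_oversample(samples, labels)
print(weights, Counter(resampled_labels))
```

With this split the malignant class receives four times the weight of the benign class, so each malignant error costs more during training. Oversampling reaches a similar effect by repetition, at the risk of overfitting to the duplicated images.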
GAN-based augmentation: Generative Adversarial Networks (GANs) are increasingly used in recent studies for synthetic data generation, particularly for dense breast tissue and rare pathological conditions. However, their clinical application faces significant challenges: anatomical implausibility (generating images that look realistic but contain impossible structures), bias amplification (reinforcing existing dataset biases), and the potential generation of non-existent pathologies. Recent advances in diffusion models offer improved stability and mode coverage compared to traditional GANs, while physics-informed architectures incorporate domain knowledge to enhance anatomical realism.
Validation requirements: Rigorous clinical validation of synthetic data requires a multistage assessment involving quantitative metrics (Fréchet Inception Distance and Structural Similarity Index Measure), expert radiologist evaluation, and downstream task performance validation. The authors propose actionable solutions including crowdsourced annotation platforms for underrepresented populations, semi-supervised learning approaches to leverage unlabeled diverse datasets, and federated learning frameworks that enable collaborative model training across geographically distributed medical centers while preserving patient privacy.
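The federated learning idea mentioned above can be illustrated with the aggregation step of federated averaging. This is a sketch under an assumption: the review does not say which federated algorithm it has in mind, and sample-size-weighted averaging is simply one widely used scheme. The hospital parameter vectors and sample counts are made up for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its local sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Hypothetical locally trained parameters from three medical centers.
hospital_updates = [[0.2, 1.0], [0.4, 0.0], [1.0, 2.0]]
hospital_sizes = [100, 300, 600]  # number of local training images (illustrative)

global_weights = federated_average(hospital_updates, hospital_sizes)
print(global_weights)
```

Only model parameters leave each site; the mammograms themselves stay local, which is the privacy property the paragraph refers to. Centers with more data pull the global model harder, so a small Asian cohort could still be outweighed unless the weighting is adjusted.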
Detection breakthroughs: Recent detection advances include YOLO-v8 with its anchor-free mechanism and decoupled head architecture for microcalcification detection, GravityNet using gravity points as pixel-based anchors for small lesion detection, and Pro UNeXt incorporating micro-calcification learning blocks with fused-MBConv modules. Hybrid Vision Transformers (ViT++) combine contextual and visual features from both CNNs and transformers. For mass detection, RCM-YOLO uses Residual Asymmetric Dilated convolution modules, while SelfAdaptNet combines self-supervised learning with adversarial training to address domain-shift challenges across different imaging centers.
Multi-view and density paradigms: Multi-view analysis has emerged as a dominant paradigm. BTMuda (Bi-level Multi-source Unsupervised Domain Adaptation) addresses both intra-domain and inter-domain variations through Three-Branch Mixed extractors combining CNNs and Transformers. For breast density assessment, MV-DEFEAT employs Dempster-Shafer evidential theory to combine multi-view evidence with calibrated uncertainty. LCVT-GR models use parallel global-local analysis through Local Cross-View Transformers. Progressive Transfer Ensemble Learning stacks multiple CNN architectures (VGG16, ResNet, EfficientNet, DenseNet, Xception) across multi-step diagnostic processes, achieving refined classification results.
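To show what combining multi-view evidence under Dempster-Shafer theory means in practice, the sketch below applies the classic Dempster rule of combination to two views. It is not MV-DEFEAT's implementation, whose details are not given here. The two-class frame ("dense" vs. "fatty"), the mass values, and all names are assumptions chosen for illustration.

```python
def dempster_combine(m1, m2):
    """Dempster's rule: multiply masses, pool by set intersection, renormalize by 1 - conflict."""
    combined, conflict = {}, 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            intersection = set_a & set_b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # the two views contradict each other
    if conflict >= 1.0:
        raise ValueError("views are in total conflict; combination is undefined")
    return {subset: mass / (1.0 - conflict) for subset, mass in combined.items()}

DENSE = frozenset({"dense"})
FATTY = frozenset({"fatty"})
EITHER = DENSE | FATTY  # mass placed here expresses "I don't know"

# Hypothetical per-view belief masses (each sums to 1).
cc_view = {DENSE: 0.6, FATTY: 0.1, EITHER: 0.3}
mlo_view = {DENSE: 0.5, FATTY: 0.2, EITHER: 0.3}

fused = dempster_combine(cc_view, mlo_view)
for subset, mass in fused.items():
    print(sorted(subset), round(mass, 3))
```

When both views lean the same way, the fused belief in "dense" rises above either view alone and the mass left on the uncertain set shrinks. That remaining mass is what lets an evidential model report how unsure it is rather than only a class label.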
Segmentation innovations: For mass segmentation, novel architectures include Att-U-Node for automated breast tumor segmentation, SRMADNet employing Swin ResUnet3+ for comprehensive mammogram segmentation, and hybrid YOLOv5-MedSAM frameworks combining object detection with specialized medical segmentation. For microcalcification segmentation, Pro UNeXt enhances UNeXt architecture with multiple-loss-function training combining focal loss, Dice loss, and Hausdorff distance loss. Breast density segmentation has been advanced through U-Net-based architectures enabling multi-class semantic segmentation of anatomical structures including nipple, pectoral muscle, fibroglandular tissue, and fatty tissue.
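The multiple-loss training idea can be sketched by summing a focal term and a Dice term over per-pixel probabilities. This is a simplified illustration, not Pro UNeXt's code: the Hausdorff distance term is omitted for brevity, the loss weights are arbitrary, and `gamma=2.0` and `alpha=0.25` are commonly used focal-loss defaults rather than values reported in the review.

```python
import math

def dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient on flattened probabilities; rewards region overlap."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)

def focal_loss(pred, target, gamma=2.0, alpha=0.25):
    """Binary focal loss; down-weights pixels the model already classifies well."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, 1e-7), 1 - 1e-7)      # clamp to keep log() finite
        p_t = p if t == 1 else 1 - p          # probability assigned to the true class
        alpha_t = alpha if t == 1 else 1 - alpha
        total += -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(pred)

def combined_loss(pred, target, w_focal=1.0, w_dice=1.0):
    """Weighted sum of the two terms (weights are illustrative)."""
    return w_focal * focal_loss(pred, target) + w_dice * dice_loss(pred, target)

# Hypothetical 4-pixel example with a single microcalcification pixel.
target = [0, 0, 1, 0]
good_pred = [0.05, 0.10, 0.90, 0.05]
bad_pred = [0.60, 0.70, 0.20, 0.50]

print(combined_loss(good_pred, target), combined_loss(bad_pred, target))
```

The two terms are complementary. Focal loss concentrates on hard pixels, which matters when lesions occupy a tiny fraction of the image, and Dice loss scores the overlap of the whole predicted region.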
Classification and contrastive learning: The Attention-based Hybrid View Learning (AHVL) framework incorporates Contrastive Switch Attention modules integrating pre-trained CLIP language models for category embeddings as anchor points. Domain-Invariant Features Learning Framework (DIFLF) employs contrastive learning through Style-Augmentation Modules and Content-Style Disentanglement Modules for single-source domain generalization. BRAIxDet addresses incomplete annotations through two-stage semi-supervised learning combining multi-view mammogram classifiers with student-teacher frameworks using pseudo-labeling. These techniques collectively represent a shift toward more robust, generalizable models that can handle real-world variability in imaging conditions and patient populations.
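The student-teacher pseudo-labeling idea can be illustrated with two small building blocks: keeping only confident teacher predictions as pseudo-labels, and updating the teacher as an exponential moving average (EMA) of the student. This sketch does not reproduce BRAIxDet's actual procedure. The EMA teacher is a common mean-teacher-style assumption, and the threshold, decay, and example probabilities are invented.

```python
def select_pseudo_labels(teacher_probs, threshold=0.9):
    """Return (sample_index, class_index) pairs where the teacher is confident enough."""
    selected = []
    for idx, probs in enumerate(teacher_probs):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((idx, probs.index(confidence)))
    return selected

def ema_update(teacher_params, student_params, decay=0.99):
    """Move teacher parameters slowly toward the student's current parameters."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher_params, student_params)]

# Hypothetical teacher outputs on three unannotated mammograms: [benign, malignant].
teacher_probs = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]]
pseudo_labels = select_pseudo_labels(teacher_probs)
print(pseudo_labels)

# Hypothetical two-parameter models.
new_teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
print(new_teacher)
```

The ambiguous second image is left unlabeled rather than given a possibly wrong label, which is the main safeguard when annotations are incomplete. A slowly moving teacher keeps the pseudo-labels from chasing the student's own mistakes.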
Data limitations to address: The most pressing challenge is the scarcity of diverse, comprehensive Asian datasets. Currently only two publicly available datasets from Vietnamese and Chinese female patients exist, and DL models require large training datasets for optimal results. Future research must focus on gathering larger, well-annotated datasets from various Asian populations. Class imbalance remains a systemic problem across datasets, requiring methods such as class weighting, over- and undersampling, and synthetic lesion creation. Cross-population model validation through domain adaptation techniques, progressive fine-tuning strategies, and cross-cultural validation frameworks is essential to ensure models trained on one population generalize to others.
Architecture improvements needed: Many DL models are pretrained on non-medical image datasets (like ImageNet) and lack morphological awareness for medical applications. The review calls for developing foundation models pretrained on large-scale medical imaging datasets and self-supervised learning approaches using mammography-specific pretext tasks. Multi-view integration is another critical gap: mammograms are typically taken in two views (CC and MLO), yet many studies treat them separately. Technical solutions include dual-stream CNNs with view-specific feature extraction, cross-view attention mechanisms, late fusion strategies with view-specific confidence weighting, and Siamese networks for learning view-invariant representations.
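Of the multi-view solutions listed above, late fusion with view-specific confidence weighting is the simplest to sketch. The version below assumes each view's confidence is its highest class probability, which is one possible choice among several (an entropy-based or learned weight would also fit the description). The probabilities are hypothetical outputs of separate CC and MLO classifiers.

```python
def late_fusion(cc_probs, mlo_probs):
    """Fuse per-view class probabilities, weighting each view by its own confidence."""
    w_cc, w_mlo = max(cc_probs), max(mlo_probs)  # assumed confidence measure
    fused = [w_cc * c + w_mlo * m for c, m in zip(cc_probs, mlo_probs)]
    total = sum(fused)
    return [value / total for value in fused]    # renormalize to sum to 1

# Hypothetical outputs in the order [benign, malignant].
cc_probs = [0.9, 0.1]    # CC view is confident the finding is benign
mlo_probs = [0.4, 0.6]   # MLO view leans malignant but is less sure

fused_probs = late_fusion(cc_probs, mlo_probs)
print(fused_probs)
```

Because the CC view is more confident, it dominates the fused decision. Late fusion leaves the two view-specific networks untouched, while cross-view attention and Siamese designs instead share information earlier in the network.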
Clinical translation priorities: Comprehensive BI-RADS categorization beyond simple binary classification is urgently needed to match clinical workflow requirements. Physicians struggle to distinguish subtle differences between neighboring BI-RADS categories (such as BI-RADS 3, BI-RADS 4, and the 4A, 4B, and 4C sub-classes of category 4), and almost no DL research addresses this complexity. Future risk assessment also remains largely unexplored, despite its importance for personalized care. The authors recommend longitudinal deep learning models analyzing temporal changes in breast tissue, multi-instance learning for identifying subtle risk indicators, and survival analysis networks for time-to-event prediction.
The path forward: The ultimate challenge is seamless CAD system integration into clinical workflows, requiring interdisciplinary research combining AI innovation, clinical validation, regulatory approval, and healthcare provider acceptance. The review concludes that most existing DL models are trained predominantly on Caucasian datasets, creating significant limitations for global applicability. To improve breast cancer screening worldwide, researchers must develop systems using diverse datasets representing different populations, validate these models across various ethnic groups, and ensure clinical testing includes women from multiple demographic backgrounds. The field has made remarkable progress in 6 years, but the gap between technical capability and equitable clinical deployment remains wide.