Breast cancer accounts for approximately 30% of all cancers diagnosed in women worldwide and remains one of the leading causes of cancer-related death. Traditional pathological examination, where a trained pathologist visually analyzes tissue slides under a microscope, is still the gold standard for breast cancer diagnosis. However, this manual process is time-consuming, requires significant expertise, and suffers from inter-observer variability, meaning different pathologists may reach different conclusions on the same tissue sample, particularly for challenging cases such as lobular neoplasia.
Whole slide imaging (WSI) technology has enabled the conversion of physical histopathological slides into high-resolution digital images. These digitized slides offer advantages over traditional microscopy: higher resolution, easier storage, and the ability to transmit images across institutions. The emergence of multiple digital pathology datasets has opened the door for artificial intelligence (AI) to assist in interpreting these slides, mining features from conventional hematoxylin and eosin (H&E)-stained images that were previously impossible for human observers to consistently capture.
Machine learning (ML) and its subset deep learning (DL) are the AI techniques most commonly applied in this field. ML algorithms learn patterns from input data and make predictions without explicit human instruction, using either supervised learning (labeled training data) or unsupervised learning (extracting patterns from unlabeled data). Deep learning, built on multi-layer neural networks, has become the mainstream approach for digital pathology, excelling at both low-level tasks like object detection and segmentation and high-level tasks like predicting disease diagnosis and treatment response from pathological images.
This review systematically covers AI applications across multiple domains of breast cancer digital pathology: identifying tumor components, classifying subtypes, grading histology, evaluating classical biomarkers (HER2, ER, PR, Ki-67), recognizing immune cells like tumor-infiltrating lymphocytes (TILs), and predicting disease outcomes. The authors followed the TITAN Guideline 2025 to ensure transparency in their reporting.
Tumor-infiltrating lymphocytes (TILs) are critical prognostic and predictive biomarkers in triple-negative breast cancer (TNBC) and HER2-positive breast cancer. Standardized visual assessment on H&E-stained slides measures the percentage of infiltrating lymphocytes in defined stromal or tumor areas. However, manual TIL scoring is semi-quantitative and suffers from inter-observer variability, limitations in capturing the spatial distribution of immune cells, and difficulty handling intra-tumoral heterogeneity.
Deep neural network approaches: Peng Sun et al. established a computational TIL assessment (CTA) method using three deep neural networks for nuclear segmentation, nuclear classification, and necrosis classification. Their automatic TIL (aTIL) score was associated with disease-free survival (HR = 0.96) in an Asian cohort (n = 184) and a Caucasian cohort (n = 117). Bai et al. used the open-source software QuPath with a CNN classifier across 920 TNBC patients in five independent cohorts, outputting five quantitative TIL variables with prognostic significance. Thagaard et al. reported a fully automated pipeline using the Visiopharm platform that identifies tissue types and automatically recognizes invasive tumors, stroma, and necrotic areas without manual interaction, reporting stromal TIL density as a quantitative variable in 262 patients.
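Once a pipeline has segmented and classified individual nuclei, the scoring step reduces to a ratio over the classified cells. A minimal, hypothetical sketch of that final step (the cell records and class names here are invented for illustration, and this cell-count proxy differs from the standardized visual sTIL method, which is area-based):

```python
# Hypothetical sketch of the scoring step downstream of nuclear
# segmentation/classification. The data model is illustrative, not the
# actual output format of the CTA or QuPath pipelines described above.

def stromal_til_score(cells):
    """Percentage of cells inside stromal regions that are lymphocytes."""
    stromal = [c for c in cells if c["compartment"] == "stroma"]
    if not stromal:
        return 0.0
    lymphocytes = [c for c in stromal if c["cell_type"] == "lymphocyte"]
    return 100.0 * len(lymphocytes) / len(stromal)

cells = [
    {"cell_type": "lymphocyte", "compartment": "stroma"},
    {"cell_type": "fibroblast", "compartment": "stroma"},
    {"cell_type": "lymphocyte", "compartment": "stroma"},
    {"cell_type": "tumor",      "compartment": "tumor"},
]
print(stromal_til_score(cells))  # 2 of 3 stromal cells -> ~66.7
```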
Spatial distribution methods: Beyond simple percentage scoring, researchers have explored spatial TIL analysis. Han Le et al. generated combined maps of cancer regions and TILs in WSIs using a CNN analysis pipeline (VGG16, Inception-V4, and ResNet-34), providing insights into structural patterns and spatial distribution of lymphocyte infiltration across patches from 23 TCGA cancer types. An AI-powered WSI analyzer (Lunit SCOPE IO) performed spatial TIL analysis in 954 patients treated with neoadjuvant chemotherapy (NAC), finding that the iTIL score was an independent predictor of pathological complete response (pCR) with an adjusted odds ratio of 1.211 per 1-point change (P < 0.001).
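The core idea behind spatial TIL analysis is to move from one slide-level percentage to a map of local values. A toy sketch of one way this could be done, by tiling detected cell coordinates into a grid and computing a per-tile lymphocyte fraction (coordinates and tile size are invented; the published pipelines are far more elaborate):

```python
from collections import defaultdict

def til_density_map(cells, tile_size=100):
    """Fraction of lymphocytes among all detected cells in each grid tile."""
    counts = defaultdict(lambda: [0, 0])  # tile -> [lymphocytes, total]
    for x, y, cell_type in cells:
        tile = (int(x // tile_size), int(y // tile_size))
        counts[tile][1] += 1
        if cell_type == "lymphocyte":
            counts[tile][0] += 1
    return {t: lym / tot for t, (lym, tot) in counts.items()}

# Illustrative (x, y, type) detections from a cell-classification model
cells = [(10, 20, "lymphocyte"), (30, 40, "tumor"),
         (150, 20, "lymphocyte"), (160, 30, "lymphocyte")]
print(til_density_map(cells))  # {(0, 0): 0.5, (1, 0): 1.0}
```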
Remaining challenges: The most frequent cause of inconsistent AI-based TIL results compared to manual scoring is the incorrect assessment of areas such as necrotic regions. Other issues include slide-related technical factors, heterogeneity in TIL distribution, and the need for standardized labeling strategies. Public frameworks for collaborative labeling that can identify all histological components, including DCIS, fibrosis, hyalinization, and granulocytes, are still needed.
HER2 evaluation: Up to 20% of breast cancers exhibit HER2 protein overexpression or gene amplification, making precise HER2 assessment crucial for targeted therapies including antibody-drug conjugates (ADCs). AI models have achieved accuracy comparable to human pathologists on IHC-stained slides. Her2Net, a deep learning framework, achieved 96.64% precision, 98.33% accuracy, and 96.71% F-score on 79 monoclonal antibody-stained WSIs. A deep reinforcement learning (DRL) model learned to identify diagnosis-related regions of interest by following a parameterized strategy, outperforming state-of-the-art deep convolutional networks on 172 slides from the HER2 scoring contest dataset. A fully automated AI solution demonstrated 92.1% agreement on slides with high-confidence ground truth across 120 patients from four pathology laboratories in three countries.
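The precision, accuracy, and F-score figures reported for models like Her2Net are standard quantities derived from a binary confusion matrix. For reference, a minimal sketch of their definitions (the counts below are invented for illustration):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 from binary confusion counts."""
    precision = tp / (tp + fp)                 # correct among predicted positives
    recall = tp / (tp + fn)                    # correct among true positives
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, accuracy, f1

# Hypothetical counts from a HER2-positive vs HER2-negative classifier
p, r, a, f = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(f"precision={p:.3f} recall={r:.3f} accuracy={a:.3f} F1={f:.3f}")
```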
Predicting HER2 from H&E slides: Several studies have developed models to predict HER2 status directly from H&E-stained images without requiring IHC. The HEROHE challenge, the first competition to predict HER2 status from H&E-stained WSIs, featured models from 21 teams worldwide using architectures like ResNet34, DenseNet201, and EfficientNetB0 on 359 WSIs from 22 laboratories, with some models achieving AUC scores above 0.8. A CNN with EfficientNet B0 using transfer learning predicted HER2 gene amplification in equivocal (2+) IHC slides with an AUC of 0.79 and 81% overall accuracy (sensitivity = 0.50, specificity = 0.82). The ASCO/CAP guidelines have now recognized AI algorithms as diagnostic tools for evaluating HER2 IHC scores.
ER, PR, and multi-biomarker models: AI can predict ER/PR expression from pathology images using approaches like DeepLabv3+ with ResNet-34 and ResNet-101 for feature extraction, significantly increasing consistency among pathologists. The Morphological-Based Molecular Analysis (MBMP) method found that tissue morphology was significantly associated with molecular expression of all 19 biomarkers tested, including ER, PR, and HER2. A multiple instance deep-learning neural network predicted ER, PR, and HER2 status from H&E images across 2,535 WSIs from the ABCTB dataset and 1,014 WSIs from TCGA, achieving AUCs of 0.92 for ER, 0.81 for PR, and 0.78 for HER2.
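Multiple instance learning treats a slide as a "bag" of patches with only a slide-level label (e.g. ER-positive), so the model must aggregate patch-level evidence into one slide-level score. A minimal sketch of attention-based pooling, one common aggregation scheme (the scores and attention logits are given numbers here; in a trained MIL model both come from learned networks, and the cited studies' architectures differ in detail):

```python
import math

def attention_pool(patch_scores, attention_logits):
    """Slide-level score as an attention-weighted average of patch scores."""
    m = max(attention_logits)
    weights = [math.exp(a - m) for a in attention_logits]  # stable softmax
    total = sum(weights)
    weights = [w / total for w in weights]
    return sum(w * s for w, s in zip(weights, patch_scores))

# Illustrative patch-level positivity scores for one slide
scores = [0.1, 0.9, 0.2]
logits = [0.0, 3.0, 0.0]   # attention concentrates on the second patch
slide_score = attention_pool(scores, logits)
print(round(slide_score, 3))  # ~0.832, dominated by the attended patch
```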
Ki-67 quantification: Ki-67 is a reliable cell proliferation marker where elevated expression correlates with poorer prognosis. piNET, a deep learning-based proliferation index calculator, achieved a PI accuracy rate of 86% across patches, tissue microarrays (TMAs), and WSIs. UV-Net, an optimized U-Net structure focused on preserving high-resolution details, achieved a higher and more consistent average F1-score of 0.83 on multi-center datasets compared to other models.
BRCA mutation detection: Patients carrying pathogenic BRCA1/2 mutations face significantly increased breast cancer risk and may benefit from targeted therapies such as PARP inhibitors. Currently, detecting BRCA mutations requires genetic testing, which is time-consuming and expensive. AI-based approaches are exploring whether mutation status can be predicted directly from H&E-stained WSIs. One study trained a deep CNN based on ResNet on pathologist-annotated WSIs to predict gBRCA mutations, but was limited by a small training set of only 17 mutant and 47 wild-type cases. A Bi-directional Self-Attention Multiple Instance Learning (BiAMIL) algorithm achieved an AUC of 0.819 for predicting BRCA status from H&E images, though no external validation was performed.
Limitations of current BRCA models: Another study applied a weakly supervised deep learning method to predict BRCA mutation status in H&E slides of epithelial ovarian cancer but achieved only a validation set ROC AUC of 0.59, highlighting the difficulty of this task. The authors conclude that AI-based BRCA mutation detection from WSIs remains in its infancy and requires substantially more research, larger datasets, and proper external validation before clinical applicability.
PD-1/PD-L1 for immunotherapy: PD-L1 targeted immunotherapy is among the most promising treatments for TNBC, with the FDA approving Ventana SP142-staining IHC to select immunotherapy candidates. However, manual PD-L1 scoring lacks standardization and suffers from variability. Gil Shamai et al. used deep learning to predict PD-L1 expression from H&E-stained slides with consistent predictive performance across two externally validated cohorts, though the study was limited to tissue microarray images rather than whole slide images. Another study using the open-source platform QuPath for PD-L1 evaluation found results highly consistent with manual pathologist scoring, with the ability to dynamically identify sensitivity thresholds for accurate assessment at low expression levels.
Despite promising results, only a few AI algorithms for PD-L1 assessment have been validated in clinical practice. The complexity of PD-L1 testing, which involves multiple antibody clones and distinct scoring systems, remains a significant barrier to standardized AI deployment.
Why new biomarkers matter: The tumor microenvironment (TME) in breast cancer involves complex interactions among cancer-associated fibroblasts, TILs, stromal matrix, and other cellular components. Many potential histological features in pathology images cannot be consistently captured by human pathologists. AI enables the creation of novel histo-biomarkers that comprehensively assess histological characteristics beyond what traditional methods like the Nottingham histological grade can capture.
Histomic Prognostic Signature (HiPS): Amgad et al. designed HiPS, which integrates H&E-stained slides with ER, PR, and HER2 IHC panels, aligned with American Joint Committee on Cancer staging. HiPS uses deep learning to accurately map cellular and tissue structures, measuring epithelial, stromal, immune, and spatial interaction characteristics. The signature consistently outperformed pathologists in predicting survival outcomes in a large prospective cohort from CPS-II and three independent cohorts, independent of tumor-node-metastasis stage (HR: 0.81, 95% CI: 0.72 to 0.92, P = 0.001). A limitation was that each tissue region was described by a set of morphological and spatial features, which may miss small foci of angioinvasion.
Digital spatial TME scores: Another study developed AI-based features for the digital spatial tumor microenvironment (sTME) in TNBC using two multi-centric cohorts: AUBC (n = 318) and TCGA (n = 111). Standard H&E images were segmented into tumor, stroma, and lymphocyte regions, and quantitative features of their spatial relationships were calculated. The resulting digital stromal TIL score (Digi-sTILs) and digital tumor-associated stroma score (Digi-TAS) achieved C-index values of 0.65 (P = 0.0189) and 0.60 (P = 0.0437) respectively for predicting breast cancer-specific survival. This was the first study to demonstrate the importance of stromal features in TNBC survival outcomes using fully automated deep learning quantification.
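The C-index values reported for Digi-sTILs and Digi-TAS measure how well a risk score ranks patients by survival: 0.5 is chance, 1.0 is perfect concordance. A simplified sketch of Harrell's C-index (this version excludes tied survival times and handles censoring in the basic way; the toy data are invented):

```python
from itertools import combinations

def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs in which the patient predicted
    to be at higher risk actually had the shorter survival time."""
    concordant, ties, comparable = 0, 0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i                    # patient i has the earlier time
        if not events[i] or times[i] == times[j]:
            continue                        # censored earlier time: not comparable
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1
        elif risk_scores[i] == risk_scores[j]:
            ties += 1
    return (concordant + 0.5 * ties) / comparable

times = [5, 10, 12, 3]        # months to event or censoring (illustrative)
events = [1, 1, 0, 1]         # 1 = event observed, 0 = censored
risks = [0.8, 0.4, 0.1, 0.9]  # model-predicted risk
print(concordance_index(times, events, risks))  # 1.0: ranking matches outcomes
```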
Immune cell heterogeneity: Beyond TILs, AI has been applied to identify tumor-associated macrophages (TAMs) using self-supervised contrastive learning and weakly supervised learning, accurately predicting macrophage infiltration levels without manual annotation. Digital image analysis of CD68 and M2-like macrophage markers (CD163, CSF-1R, CD206) across tissue microarrays showed that macrophages were significantly associated with unfavorable clinicopathological characteristics in Luminal B breast cancer. CD8+ T cell spatial heterogeneity was also quantified, revealing that high-density T cell clusters correlated with response to PD-1 blockade therapy, though this finding was limited to a small cohort (n = 29).
The multi-omics advantage: Breast cancer is characterized by extensive genotypic and phenotypic heterogeneity. "Omics" technologies, including genomics, transcriptomics, proteomics, and metabolomics, capture tumor characteristics at multiple molecular levels. AI algorithms can process these high-dimensional, complex datasets through deep learning and machine learning to discover prognostic biomarkers, identify predictive markers for treatment response, and construct individualized survival prediction models integrating information from genomic to proteomic data.
Multi-omic therapy response prediction: Sammut et al. collected clinical, digital pathology, genomic, and transcriptomic data from pre-treatment biopsies and associated these with pathology endpoints (complete response vs. residual disease) at surgery. Their ensemble ML predictive model achieved an AUC of 0.87 for predicting pathological complete response (pCR) in an external validation cohort, demonstrating that models combining clinical, molecular, and digital pathology data significantly outperform those based on clinical variables alone. In another study, multiple ML models were compared for predicting NAC response: the random forest classifier performed best with an AUC of 0.88, substantially outperforming multiple logistic regression (AUC = 0.64).
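One way multimodal models combine evidence is late fusion: each modality produces its own predicted probability, and a combiner merges them. A deliberately simple sketch using a weighted average (the modalities, probabilities, and weights are invented; the published ensembles learn their combiners rather than fixing weights by hand):

```python
def late_fusion(prob_by_modality, weights):
    """Combine per-modality pCR probabilities via a weighted average."""
    total = sum(weights.values())
    return sum(prob_by_modality[m] * w for m, w in weights.items()) / total

# Hypothetical per-modality predictions for one patient
patient = {"clinical": 0.40, "dna": 0.55, "rna": 0.70, "pathology": 0.65}
weights = {"clinical": 1.0, "dna": 1.0, "rna": 2.0, "pathology": 2.0}
print(round(late_fusion(patient, weights), 3))  # 0.608
```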
Deep learning for treatment prediction: A hierarchical self-attention-guided deep learning framework predicted NAC response using digital histopathology images from pre-treatment biopsy samples, achieving an AUC of 0.89 and F1-score of 90% for predicting pCR in 207 patients. For metastasis detection, the Camelyon16 challenge demonstrated that AI models achieved AUCs of 0.99 and 0.996 on two test sets (399 slides from the challenge and 108 slides from a separate dataset). A deep learning-based grading model (DeepGrade) provided independent prognostic information with a hazard ratio of 2.94 across 1,567 patients from four different studies.
Multimodal data fusion: Mondol et al. constructed BioFusionNet, a deep learning framework combining image-derived features with genetic and clinical data for survival risk stratification of ER+ breast cancer patients. This model predicted risk (high vs. low) with prognostic significance for overall survival in both univariate and multivariate analyses. The combination of radiology and digital histopathology tissue slides has also been explored to extract disease-specific information that is difficult for investigators to quantify manually.
Early computer-aided diagnosis (CAD): Over the past two decades, efficient digitization of WSIs has driven AI adoption in digital pathology. Early approaches included CAD schemes that automatically detected and graded lymphocytic infiltration in digitized HER2+ breast cancer histopathology. These CAD algorithms were further developed for disease detection, diagnosis, and prognosis prediction to complement pathologist opinions. Traditional machine learning then expanded into supervised methods for image segmentation and disease classification, and unsupervised methods for tasks like benign/malignant tumor detection and mass detection in breast cancer.
Deep learning architectures: As data volumes and problem complexity grew, deep learning became the mainstream choice. Convolutional neural networks (CNNs) are the most extensively used, transforming input images through convolutional, pooling, and fully connected layers for detection, segmentation, and classification. Fully convolutional networks (FCNs) replace fully connected layers with convolutional layers, enabling pixel-level predictions. U-Net, based on FCN architecture, has become the most well-known medical image segmentation model, with variants like V-Net for volumetric segmentation. Mask R-CNN has achieved strong results in lesion detection, followed by ResNet50 for malignancy probability estimation. Recurrent neural networks (RNNs) process sequential data and can generate patient-level predictions from patch-level features. GANs generate synthetic data for feature segmentation and stain transfer applications.
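The convolution and pooling layers at the heart of a CNN are simple operations repeated at scale. A bare-bones sketch of one convolution plus one max-pooling step on a tiny single-channel "image" (real CNNs stack many learned filters; this hand-set edge filter is purely illustrative):

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation (the 'convolution' used in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: downsample while keeping strong responses."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge = [[-1, 1]]               # hand-set horizontal edge detector
fmap = conv2d(image, edge)     # responds only at the 0 -> 1 boundary
print(max_pool(fmap))          # [[1], [1]]
```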
Transformers and hybrid models: The transformer architecture uses self-attention mechanisms to process input data and has shown unique advantages in gene sequence feature extraction and predicting interactions between histological features. When combined with CNNs, transformers demonstrate better performance in medical image segmentation for both 2D and 3D scenarios. TransUNet and TransFuse represent notable hybrid approaches. Natural language processing (NLP) models based on transformers can integrate real-world data such as patient clinical information, lifestyle habits, and treatment history to significantly improve cancer prognosis prediction accuracy.
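The self-attention mechanism underlying transformers can be stated in a few lines: every token scores its similarity to every other token, and those softmax-normalized scores weight a mixture of value vectors. A minimal NumPy sketch with identity projections (a real transformer learns separate Q, K, V weight matrices and uses multiple heads):

```python
import numpy as np

def self_attention(X):
    """Single-head scaled dot-product self-attention, identity projections."""
    Q, K, V = X, X, X                                   # learned in practice
    d = X.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                       # pairwise similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # mix value vectors

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])     # 3 tokens, dim 2
out = self_attention(X)
print(out.shape)  # (3, 2): one contextualized vector per token
```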
Large language models (LLMs): Models such as GPT-4, LLaMA, DeepSeek, and Grok 3 are being applied in healthcare. The GPT-4V framework uses in-context learning to classify medical images with a small number of samples without complex model fine-tuning, which is particularly valuable in data-scarce situations. LLMs have accelerated the translation from pathology reports to clinical decisions by parsing millions of medical documents and constructing pathology knowledge graphs. These emerging technologies offer new possibilities for the future of digital pathology.
Clinical trial evidence: Several clinical trials have demonstrated AI's potential in clinical settings. The CONFIDENT-B trial, a non-randomized single-center study, evaluated an AI-assisted workflow to detect breast cancer metastases in sentinel lymph nodes across 190 consecutively enrolled specimens. Pathologists using AI assistance achieved a significant reduction in the adjusted relative risk of IHC use, along with meaningful reductions in time and cost. A separate multi-center clinical trial used a stacking model to predict axillary lymph node response to NAC using longitudinal MRI in 1,153 patients with node-positive breast cancer. This model showed lower false negative rates compared to radiologists alone.
Practical benefits of AI: AI can assist doctors in improving diagnostic accuracy and consistency while reducing human bias. Automated AI systems can significantly shorten diagnosis time and help alleviate the global shortage of pathologists, which is especially impactful in resource-limited areas. Predictive models integrating multimodal data (genomic, transcriptomic, pathological images, radiological images) can predict patient response to specific therapies, support risk stratification, and guide personalized treatment. AI's data analysis capabilities can mine information from complex datasets, identify biomarkers that are difficult to find through traditional methods, discover new drug targets, and support high-quality clinical research.
Regulatory progress: The ASCO/CAP guidelines have recognized AI algorithms as diagnostic tools for evaluating HER2 IHC scores, and the College of American Pathologists (CAP) has issued guidelines to promote AI application in clinical practice. These regulatory milestones represent important steps toward broader clinical adoption, though most AI research in digital pathology remains in laboratory settings, and most clinical trials have been conducted in radiology rather than pathology.
Interpretability gap: Many ML algorithms are considered "black box" models that lack the interpretability needed for clinical trust. Explainable AI (XAI) techniques show promise in bridging this gap. SHAP (SHapley Additive exPlanations) quantifies the contribution of each feature to model predictions, avoiding the opacity of traditional models. Grad-CAM (gradient-weighted class activation mapping) visualizes the areas a model focuses on when making predictions through heatmaps, showing clinicians the basis for each decision. These approaches offer possible solutions to the interpretability barrier, though further development is needed for widespread clinical deployment.
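Shapley values, the quantity SHAP approximates, can be computed exactly for a toy model, which makes the idea concrete: each feature is credited with its average marginal contribution over all possible orderings. A sketch on an invented two-feature "risk model" (SHAP itself uses efficient approximations, since the exact sum is exponential in the number of features):

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values: average marginal contribution of each feature."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(subset) | {f}) - value(set(subset)))
        phi[f] = total
    return phi

# Toy "model": risk score from two additive features plus an interaction
def risk(present):
    score = 0.0
    if "grade" in present:
        score += 0.3
    if "ki67" in present:
        score += 0.2
    if {"grade", "ki67"} <= present:
        score += 0.1  # interaction term, split evenly between the two
    return score

print(shapley_values(["grade", "ki67"], risk))  # grade: 0.35, ki67: 0.25
```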
Generalizability concerns: Most AI studies are trained on small datasets, leading to performance degradation when applied to large independent datasets or real clinical practice. Data privacy concerns limit medical data sharing for model training. A significant challenge is the inherent variability of medical image data stemming from differences in staining methods, scanner resolution, contrast, and signal-to-noise ratios across clinical protocols. Standardizing medical images to ensure uniformity remains a major obstacle.
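A common first step against stain and scanner variability is statistical color normalization. A sketch of the idea behind Reinhard-style normalization, matching each channel's mean and standard deviation to a reference image (done here directly in RGB on random arrays for brevity; the original method operates in LAB color space, and stain-deconvolution methods such as Macenko's are more principled for H&E):

```python
import numpy as np

def match_channel_stats(source, target):
    """Shift each channel of `source` to the per-channel mean/std of `target`."""
    src = source.astype(float)
    tgt = target.astype(float)
    out = np.empty_like(src)
    for c in range(src.shape[-1]):
        s_mu, s_sd = src[..., c].mean(), src[..., c].std()
        t_mu, t_sd = tgt[..., c].mean(), tgt[..., c].std()
        out[..., c] = (src[..., c] - s_mu) / (s_sd + 1e-8) * t_sd + t_mu
    return np.clip(out, 0, 255)

rng = np.random.default_rng(0)
slide_a = rng.uniform(100, 200, size=(8, 8, 3))   # darker, low-contrast scanner
slide_b = rng.uniform(50, 250, size=(8, 8, 3))    # higher-contrast reference
norm = match_channel_stats(slide_a, slide_b)
print(norm[..., 0].mean().round(1), slide_b[..., 0].mean().round(1))
```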
Federated learning as a solution: Federated learning allows models to train collaboratively on multi-institutional data without exchanging the data itself among participating institutions. This approach addresses critical data privacy issues while maintaining model performance and enabling distributed training across datasets. Although federated learning has its own challenges, it facilitates learning more generalizable and better-performing algorithms that can meet the needs of diverse clinical applications. This is particularly relevant for breast cancer pathology, where training data must represent the full range of tumor subtypes, staining protocols, and imaging equipment used across different healthcare systems.
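The canonical federated aggregation step, FedAvg, illustrates why no raw data needs to leave any site: each institution trains locally and transmits only model parameters, which a coordinator averages weighted by local sample counts. A minimal sketch with invented parameter vectors and site sizes:

```python
def federated_average(site_weights, site_sizes):
    """One FedAvg round: sample-size-weighted average of per-site parameters.
    Only parameter vectors are exchanged; raw slides stay at each site."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(n_params)]

# Parameter vectors from three hypothetical hospitals after local training
site_weights = [[0.2, 1.0], [0.4, 0.8], [0.3, 0.9]]
site_sizes = [100, 300, 600]   # slides contributed per site
print(federated_average(site_weights, site_sizes))  # ~[0.32, 0.88]
```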
Remaining barriers: The accuracy of evaluating biomarkers such as TILs, HER2, and ER from H&E-stained images still requires further validation. There is a lack of unified reference standards and regulatory frameworks for AI in pathology. Most clinical studies remain limited to retrospective analyses with insufficient prospective clinical validation. The lack of digitalization of pathology workflows in many institutions and the limited interpretability of AI models may reduce patient trust in AI-assisted diagnosis. Despite these challenges, the authors conclude that the rapid evolution of AI will profoundly promote research in digital pathology, bringing a new era of precision medicine for breast cancer.