Endometrial cancer is one of the three most common malignant tumors of the female reproductive system, with a global incidence rate of approximately 8.7 per 100,000 women in 2022 and nearly 420,000 new cases diagnosed annually worldwide. The choice of surgical treatment and patient prognosis depend heavily on accurate preoperative staging, particularly on determining how deeply the tumor has invaded the uterine wall (myometrial invasion) and whether it has spread into the cervical stroma (cervical stromal invasion). Getting these assessments right before surgery is critical because they directly influence decisions about the extent of surgery, whether lymph node dissection is needed, and whether adjuvant chemotherapy or radiation will be recommended.
The role of MRI in staging: Magnetic resonance imaging is the standard tool for preoperative assessment because of its excellent soft-tissue contrast. MRI can distinguish the uterine zonal anatomy — endometrium, junctional zone, outer myometrium, and serosa — without exposing the patient to ionizing radiation. However, MRI interpretation is subjective and depends heavily on the radiologist's experience. When cancer cells infiltrate less than 50% of the myometrium, the lymph node metastasis rate is low and the 5-year survival rate is high. When infiltration reaches 50% or more, lymph node metastasis rates increase significantly and survival drops. This binary threshold makes accurate depth assessment a pivotal clinical decision point.
The AI opportunity: In recent years, deep learning and machine learning models trained on MRI data have shown promising results in medical imaging across multiple cancer types, including gastric cancer endoscopy, colorectal cancer MRI, lung nodule CT, and breast cancer mammography. For endometrial cancer specifically, multiple AI-based MRI studies have been published, but no comprehensive meta-analysis had previously synthesized their results. This study by Zheng, Lin, and Li from Ningbo University fills that gap by systematically collecting all available evidence and calculating pooled diagnostic accuracy measures.
Study registration and methodology: The review was prospectively registered in the PROSPERO database (CRD420251059395) and followed PRISMA 2020 guidelines. The authors searched PubMed, Web of Science, Embase, and the Cochrane Library from inception through March 2025, using both MeSH terms and free-text keywords related to artificial intelligence, magnetic resonance imaging, and endometrial neoplasms.
Literature screening process: The systematic search retrieved 183 relevant publications across four major databases. After removing duplicates using EndNote X9, 122 articles remained. Title and abstract screening eliminated 106 publications, leaving 16 articles for full-text review. Of these 16, another 8 were excluded due to insufficient information or failure to meet inclusion criteria. The final analysis included 8 studies encompassing 13 separate cohorts. Two independent researchers (Zheng JJ and Lin XY) performed the screening, with a third researcher (Li M) serving as an arbitrator when disagreements could not be resolved through discussion.
Inclusion and exclusion criteria: The review included both prospective and retrospective studies investigating AI-enhanced MRI in preoperative staging of endometrial cancer. Key requirements were: patients diagnosed with endometrial cancer, AI-based MRI as the diagnostic test (with no restrictions on the specific model or methodology), histopathology as the gold standard, and reported outcomes including sensitivity, specificity, likelihood ratios, diagnostic odds ratio, and AUC values. Studies were excluded if they were duplicate publications, unrelated to the topic, review articles or conference abstracts, involved animal models, lacked sufficient data, or were not published in English.
Quality assessment with QUADAS-2: The methodological quality of all included studies was evaluated using the QUADAS-2 tool, which assesses risk of bias across four domains (patient selection, index test, reference standard, and flow and timing) and applicability concerns across three domains. In general, the publications analyzed were of high quality, with most exhibiting either a low or unclear risk of bias. Review Manager 5.4.1 was used to present the quality evaluation findings. Two reviewers independently assessed quality, resolving discrepancies through discussion or third-party consultation.
Geographic and design overview: The 8 included studies came from four countries: China (Chen 2020, Tao 2022, Wang 2025, Wang 2024), France (Lecointre 2025), a Canada-France collaboration (Lefebvre 2022, Lefebvre 2023), and Spain (Rodriguez-Ortega 2021). All studies except one (Tao 2022, which was prospective) used a retrospective design. Sample sizes ranged from 53 patients (Lefebvre 2023) to 567 patients (Wang 2024), with mean or median ages typically falling between 52 and 67 years. All studies used histopathological diagnosis as the gold standard reference.
Diverse AI architectures: The studies employed a wide variety of AI approaches, reflecting the field's exploratory stage. Chen (2020) used the YOLOv3 object detection algorithm on 138 patients, applying it to locate and classify lesions on T2-weighted MRI. Lecointre (2025) applied a deep learning model to 178 patients. Lefebvre (2022 and 2023) used radiomics-based approaches, including Pyradiomics 3.0 and spherical harmonics feature extraction. Rodriguez-Ortega (2021) employed four AdaBoost ensemble models on 143 patients. Tao (2022) used a ResNet neural network on 80 patients. Wang (2025) tested four different approaches on the same 182-patient cohort: radiomics, deep learning, stacking (a meta-learning ensemble technique), and a general ensemble model. Wang (2024) performed both internal and external testing on 567 patients.
Why multiple cohorts from one study matter: When a single study tested multiple AI models or used separate internal and external validation cohorts, each was extracted as a separate cohort and labeled with letters (e.g., Wang 2025a, 2025b, 2025c, 2025d). This approach is standard in diagnostic meta-analyses because each model-cohort combination provides an independent estimate of diagnostic accuracy. The 8 studies thus yielded 13 distinct cohorts for analysis, giving the meta-analysis more data points to work with when calculating pooled performance.
What is deep myometrial invasion? In endometrial cancer staging, the depth to which the tumor penetrates the muscular wall of the uterus (the myometrium) is one of the most important prognostic factors. The critical threshold is 50%: tumors that invade less than half the myometrium are classified as FIGO stage IA, while those that invade 50% or more are classified as stage IB. Stage IB carries a significantly higher risk of lymph node metastasis and poorer 5-year survival. Accurately determining this before surgery helps clinicians decide whether to perform lymph node dissection and plan the extent of the operation.
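The staging rule described above reduces to a single threshold on the invasion fraction. A trivial sketch (the function name and interface are illustrative, not from the study):

```python
# Minimal illustration of the FIGO stage I sub-classification rule:
# invasion depth expressed as a fraction of myometrial thickness.
def figo_stage_i(invasion_fraction: float) -> str:
    """Classify stage IA (<50% myometrial invasion) vs. IB (>=50%)."""
    return "IB" if invasion_fraction >= 0.5 else "IA"

print(figo_stage_i(0.3))  # -> IA (superficial invasion)
print(figo_stage_i(0.6))  # -> IB (deep invasion)
```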
Pooled diagnostic performance: Six studies (seven cohorts) reported AI-based MRI performance for detecting deep myometrial invasion. Using a bivariate random-effects model, the meta-analysis calculated the following pooled values: sensitivity of 0.80 (95% CI: 0.75-0.85), specificity of 0.81 (95% CI: 0.64-0.91), positive likelihood ratio of 4.2 (95% CI: 2.0-8.5), negative likelihood ratio of 0.24 (95% CI: 0.17-0.34), and diagnostic odds ratio of 17 (95% CI: 6-47). The summary receiver operating characteristic (SROC) curve yielded an AUC of 0.83 (95% CI: 0.80-0.86).
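For readers unfamiliar with how these five metrics relate to one another, the sketch below derives all of them from a single 2×2 confusion table. The counts are invented for illustration (chosen to roughly reproduce the pooled values), not taken from any cohort in the meta-analysis:

```python
# Derive the standard diagnostic accuracy metrics from a 2x2 table.
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, LR+, LR-, and diagnostic odds ratio."""
    sens = tp / (tp + fn)          # true-positive rate
    spec = tn / (tn + fp)          # true-negative rate
    lr_pos = sens / (1 - spec)     # positive likelihood ratio
    lr_neg = (1 - sens) / spec     # negative likelihood ratio
    dor = lr_pos / lr_neg          # diagnostic odds ratio
    return sens, spec, lr_pos, lr_neg, dor

# Hypothetical cohort: 80 true positives, 19 false positives,
# 20 false negatives, 81 true negatives.
sens, spec, lr_pos, lr_neg, dor = diagnostic_metrics(80, 19, 20, 81)
print(f"sens={sens:.2f} spec={spec:.2f} "
      f"LR+={lr_pos:.1f} LR-={lr_neg:.2f} DOR={dor:.0f}")
# -> sens=0.80 spec=0.81 LR+=4.2 LR-=0.25 DOR=17
```

Note that the diagnostic odds ratio is simply LR+ divided by LR-, which is why a modest sensitivity and specificity pair can still yield a DOR of 17.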
Interpreting the numbers: A sensitivity of 0.80 means that AI correctly identified 80% of cases where deep myometrial invasion was truly present. A specificity of 0.81 means it correctly ruled out deep invasion in 81% of cases where it was truly absent. The positive likelihood ratio of 4.2 indicates that a positive AI prediction multiplies the odds of deep invasion by about 4. The negative likelihood ratio of 0.24 means a negative AI result reduces the odds of deep invasion to about one-quarter of the pre-test odds. An AUC of 0.83 falls in the "good" range for a diagnostic test.
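Likelihood ratios update probability through odds (Bayes' theorem in odds form): multiply the pre-test odds by the likelihood ratio, then convert back to a probability. A minimal sketch, assuming a hypothetical 30% pre-test probability of deep invasion:

```python
# Convert a pre-test probability and a likelihood ratio into a
# post-test probability via the odds form of Bayes' theorem.
def post_test_probability(pre_test_prob, likelihood_ratio):
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Assumed 30% pre-test probability of deep myometrial invasion:
print(post_test_probability(0.30, 4.2))   # positive AI result -> ~0.64
print(post_test_probability(0.30, 0.24))  # negative AI result -> ~0.09
```

With these assumptions, a positive AI result raises the probability of deep invasion from 30% to roughly 64%, while a negative result lowers it to roughly 9%.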
Heterogeneity findings: While the sensitivity estimates were fairly consistent across studies (I-squared = 25.09%), there was considerable heterogeneity in specificity (I-squared = 88.27%, 95% CI: 81.01-95.52%). This wide variation in specificity likely reflects differences in AI model architectures, MRI scanning protocols, patient populations, and the way each study defined and measured the outcomes. This high heterogeneity in specificity means the pooled specificity estimate of 0.81 should be interpreted cautiously.
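The I-squared statistic quantifies the share of between-study variability that exceeds what chance alone would produce, computed from Cochran's Q. A simplified illustration using invented per-cohort estimates and variances (not the study's actual data):

```python
# I^2 = max(0, (Q - df) / Q): the proportion of total variability
# attributable to between-study heterogeneity rather than sampling error.
import numpy as np

def i_squared(effects, variances):
    w = 1.0 / np.asarray(variances)            # inverse-variance weights
    effects = np.asarray(effects)
    pooled = np.sum(w * effects) / np.sum(w)   # fixed-effect pooled estimate
    q = np.sum(w * (effects - pooled) ** 2)    # Cochran's Q
    df = len(effects) - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0

# Hypothetical logit-specificity estimates from 7 cohorts:
effects = [1.8, 0.6, 2.4, 1.0, 2.9, 0.4, 1.5]
variances = [0.05, 0.08, 0.06, 0.07, 0.05, 0.09, 0.06]
print(f"I^2 = {i_squared(effects, variances):.1%}")  # high heterogeneity
```

When per-study estimates scatter far beyond their sampling variances, as in this toy example, I-squared climbs toward 100% — the situation the specificity estimates in this meta-analysis exhibit.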
Why cervical stromal invasion matters: Cervical stromal invasion occurs when endometrial cancer extends beyond the uterine body into the cervical stroma (the dense connective tissue of the cervix). This finding upgrades the cancer from FIGO stage I to stage II and directly influences the choice of surgical approach, specifically whether a radical hysterectomy (rather than a simple hysterectomy) is needed. Traditional MRI interpretation of cervical invasion is particularly challenging because it relies on the radiologist's subjective identification of subtle interruptions in the low-signal cervical fibrostromal band, a task with notable interobserver variability.
Pooled diagnostic performance: Two studies (five cohorts) reported AI-based MRI performance for detecting cervical stromal invasion. The pooled results showed: sensitivity of 0.78 (95% CI: 0.55-0.91), specificity of 0.86 (95% CI: 0.79-0.91), positive likelihood ratio of 5.6 (95% CI: 4.3-7.4), negative likelihood ratio of 0.25 (95% CI: 0.12-0.55), and diagnostic odds ratio of 22 (95% CI: 11-44). The SROC curve produced an AUC of 0.90 (95% CI: 0.87-0.92), which is notably higher than the 0.83 AUC achieved for deep myometrial invasion.
Important caveats: The 95% confidence interval for sensitivity was remarkably wide (0.55-0.91), indicating significant uncertainty. This means the true sensitivity could be as low as 55% or as high as 91%. Additionally, substantial heterogeneity was observed in both sensitivity (I-squared = 92.73%) and specificity (I-squared = 77.89%). These high heterogeneity values reflect the fact that only two studies contributed data, and the AI models used in those studies differed substantially. The wide confidence intervals and high heterogeneity mean these results, while encouraging, require validation in larger studies.
Comparison with human radiologists: The authors highlight the 2025 study by Lecointre et al., which directly compared AI model performance against human radiologists. The AI system achieved comparable sensitivity to experienced radiologists in detecting cervical stromal invasion while exhibiting superior specificity. This suggests that AI may serve as an effective adjunctive tool to reduce overdiagnosis of cervical invasion and help avoid unnecessary extensive surgery. However, the authors caution that most current studies have limited sample sizes and remain in the exploratory stage.
What is publication bias? Publication bias occurs when studies with positive or exciting results are more likely to be published than those with negative or inconclusive findings. In meta-analyses, this can lead to overly optimistic pooled estimates because the evidence base is skewed toward favorable outcomes. Deeks' funnel plot asymmetry test is the standard method for detecting publication bias in diagnostic test accuracy meta-analyses. In this test, study results are plotted against a measure of study size, and asymmetry in the plot suggests that smaller studies with less favorable results may be missing from the literature.
Results for deep myometrial invasion: The Deeks' test produced a P value of 0.74 for the deep myometrial invasion analysis, well above the 0.05 threshold. This indicates no statistically significant evidence of publication bias, meaning the pooled sensitivity of 80%, specificity of 81%, and AUC of 0.83 are likely to be reasonably representative of the true diagnostic performance across published and potentially unpublished studies. This is reassuring for the clinical relevance of AI-based MRI for myometrial invasion assessment.
Results for cervical stromal invasion: In contrast, the Deeks' test for cervical stromal invasion yielded a P value of 0.02, below the 0.05 significance threshold. This indicates statistically significant publication bias, suggesting that the pooled AUC of 0.90 and the other cervical invasion metrics may be inflated because studies with less impressive results were either not conducted or not published. This finding adds another layer of caution to the already-uncertain cervical invasion results and reinforces the need for prospective validation studies.
Data-driven accuracy gains: The authors discuss several mechanisms through which AI enhances MRI interpretation for endometrial cancer. First, AI models trained on large MRI datasets can learn richer and more complex feature representations than what a single radiologist accumulates over a career. For example, Chen et al. developed a deep learning model using 530 MRI images that achieved 84.4% accuracy for predicting myometrial invasion depth, exceeding the 80.0% accuracy of general radiologists on the same task. This data-driven approach improves diagnostic accuracy and enables models to maintain stability across different types of MRI data.
Detecting subtle features: AI can automatically extract features and patterns from MRI images that are difficult or impossible for human observers to detect. Deep learning algorithms identify subtle differences in tissue texture, signal intensity, and spatial relationships that may distinguish between superficial and deep invasion. These algorithms continuously learn and evolve as more training data becomes available, progressively improving their discriminative ability. The capacity to discover patterns hidden in large datasets and apply them to new imaging data represents a fundamentally different diagnostic approach from visual interpretation.
Efficiency and automation: AI processes medical imaging data rapidly, completing complex analyses in seconds rather than the minutes required for careful radiologist review. This speed advantage becomes especially significant in busy clinical settings with high case volumes. Automated MRI analysis also reduces the errors and subjectivity inherent in manual interpretation, improving the consistency and repeatability of diagnoses across different readers, time points, and institutions. This consistency is particularly valuable for endometrial cancer staging, where interobserver variability among radiologists is a well-documented problem.
Clinical decision support potential: The authors emphasize that the ultimate value of AI-MRI lies not just in diagnostic accuracy but in its potential to improve patient management. By more accurately identifying early-stage or low-risk patients, AI could help clinicians avoid unnecessary extended surgery or lymph node dissection, reducing surgical complications and improving quality of life. However, this translational benefit has not yet been demonstrated in prospective studies comparing AI-assisted versus standard clinical pathways.
Retrospective design and geographic bias: The majority of included studies were retrospective, which introduces inherent and uncontrollable biases related to patient selection and data quality. Furthermore, most study populations came from Asia and Europe, raising questions about generalizability to other regions. Different patient demographics, MRI scanner models, scanning protocols, and institutional practices could all affect AI model performance in ways that retrospective studies cannot fully capture. The authors explicitly call for large-scale, prospective, multicenter clinical trials to validate these findings.
Small sample sizes and limited cohorts: With only 8 studies and 13 cohorts total, the meta-analysis has limited statistical power. The cervical stromal invasion analysis relied on just 5 cohorts from 2 studies, making its pooled estimates particularly fragile. The relatively small number of studies also prevented the authors from conducting subgroup analyses based on AI model type, which would have been valuable for identifying which architectures perform best. Future updated analyses will need to include more recent results as the field expands.
Model heterogeneity and standardization: The included AI models exhibited significant differences in algorithmic architecture (YOLOv3, ResNet, AdaBoost, radiomics, deep learning, ensemble methods), feature extraction methods, and training processes. This heterogeneity is a major contributor to the statistical variability observed in the pooled results and highlights an urgent need for standardization in AI model development for this application. Improving model transparency and reproducibility is crucial to ensuring comparability across studies and ultimately translating AI tools into clinical practice.
Translational readiness requirements: The authors stress that beyond pursuing high diagnostic accuracy, future research must evaluate "translational readiness," including model interpretability (using saliency maps to highlight decision-relevant image regions), computational efficiency for real-world deployment, and seamless integration into existing clinical workflows such as PACS and RIS systems. Models must be not only accurate but also transparent, efficient, and user-friendly to gain clinician trust and achieve regulatory approval. The external validation datasets used in some included studies were acquired with different imaging equipment and scanning protocols, adding another source of variability that real-world deployment must address.
Overall conclusion: AI-based MRI can improve the accuracy of preoperative staging of endometrial cancer to a certain extent, with pooled AUCs of 0.83 for deep myometrial invasion and 0.90 for cervical stromal invasion. However, the evidence base remains small, geographically limited, and methodologically heterogeneous. Additional large-scale, prospective, multicenter trials are necessary before AI-MRI can be confidently integrated into routine clinical staging workflows.