Artificial Intelligence-Assisted Magnetic Resonance Imaging Technology in the Differential Diagnosis and Prognosis Prediction of Endometrial Cancer


Plain-English Explanations

1. The Clinical Problem: Diagnosing and Stratifying Endometrial Cancer Risk

Endometrial cancer (EC) is one of the most common malignant tumors of the female reproductive system, originating from the endometrium and predominantly occurring in postmenopausal women. Incidence rises sharply with age and is significantly higher in developed countries, driven by population-level increases in obesity, diabetes, and the use of estrogen replacement therapy. Typical clinical symptoms include abnormal uterine bleeding (especially postmenopausal bleeding), pelvic pain, and increased vaginal discharge. In severe cases, the disease can spread to the fallopian tubes, ovaries, and peritoneum.

Diagnostic challenges: Although histopathological examination remains the gold standard for confirming EC, early symptoms lack specificity and are frequently mistaken for non-neoplastic conditions such as endometrial hyperplasia or endometriosis. Traditional imaging modalities, including ultrasound, CT, and magnetic resonance imaging (MRI), often fail to clearly display the depth of tumor invasion or the extent of spread to surrounding tissues. This leads to missed diagnoses or misdiagnoses, particularly in complex or early-stage cases.

Why MRI and AI together: MRI offers non-invasive, high-resolution soft-tissue imaging and has been widely adopted in EC clinical assessment. However, interpretation of MRI scans depends heavily on clinician experience and professional knowledge, introducing subjectivity and error risk. AI-based deep learning algorithms can extract complex features from large imaging datasets, potentially improving both diagnostic accuracy and efficiency. This study aimed to evaluate whether an improved convolutional neural network (CNN) model, combined with MRI, could enhance risk classification of EC patients and predict postoperative recurrence more reliably than traditional approaches.

Study scope: The authors retrospectively collected MRI image data from 210 EC patients at a single hospital imaging center (January 2021 to May 2024). Patients were divided into a test set of 140 cases and a validation set of 70 cases. Risk stratification followed the ESMO-ESTRO-ESP guidelines, classifying patients as either low-risk or high-risk EC. Postoperative recurrence status served as the endpoint event for prognostic modeling.

TL;DR: EC is a common gynecological cancer whose early symptoms overlap with benign conditions, making accurate diagnosis difficult. MRI is the standard imaging tool but depends on subjective interpretation. This study tested whether an AI-enhanced deep learning model applied to MRI data from 210 EC patients could improve risk classification and postoperative recurrence prediction.

2. Literature Review: AI in Medical Imaging and Prognostic Prediction

The authors surveyed the existing literature on AI applications in medical imaging, drawing from PubMed and Google Scholar databases with search terms including "artificial intelligence," "medical imaging," "deep learning," and "tumor detection," covering publications from 2010 to 2024. The review highlighted several relevant advances in tumor detection, image segmentation, and survival prediction that informed the design of the proposed model.

Image segmentation advances: Rong et al. (2023) developed HD-Yolo for whole-slide histology-based tumor detection, demonstrating superiority in nuclear detection, classification accuracy, and computational time across three tumor tissue types. Shen et al. (2023) proposed a U-Net backbone-based medical image segmentation algorithm with residual and convolutional decoder paths, improving segmentation accuracy for images with complex shapes and lesion-tissue adhesion. Zhang et al. (2023) designed a BN-U-Net algorithm for spinal MRI segmentation across 22 research subjects, achieving faster processing time and higher accuracy, sensitivity, and specificity compared to fully convolutional networks (FCN) and standard U-Net.

Feature learning innovations: Gao et al. (2020) proposed methods for identifying outliers in unbalanced datasets using imaging complexity concepts, enabling deep learning models to more effectively capture image features. Guan et al. (2023) integrated model-based and data-driven learning through three components: a linear vector space framework for global feature dependencies, a deep network for mapping to nonlinear manifolds, and a sparse model for local residual features. Additionally, Raimondo et al. (2024) developed a deep learning model to detect and classify endometrial lesions in hysteroscopic images from 1,500 images across 266 patients, though the model's overall performance still required improvement.

Prognostic prediction models: She et al. (2020) built a deep learning survival neural network for non-small cell lung cancer that outperformed TNM staging for predicting cancer-specific survival. Zhong et al. (2022) found that higher deep learning scores predicted poorer overall survival and relapse-free survival in stage I non-small cell lung cancer. Dong et al. (2020) constructed an imaging nomogram from CT images of 730 locally advanced gastric cancer patients that outperformed clinical N staging for predicting lymph node metastasis counts. Jiang et al. (2024) developed an attention-based unsupervised deep learning system for predicting overall and cancer-specific survival in resected colorectal cancer patients.

Despite these advances across multiple cancer types, the authors noted that AI applications specifically targeting EC diagnosis and prognosis remained underexplored, motivating the current study to fill that gap.

TL;DR: The literature review covered AI advances in tumor image segmentation (HD-Yolo, U-Net variants, BN-U-Net) and prognostic prediction (survival neural networks for lung, gastric, and colorectal cancers). While AI showed consistent benefits across multiple cancer types, its application to endometrial cancer specifically remained limited, providing the rationale for this study.

3. Model Architecture: ResNet-101 with Dual Attention Mechanisms

The proposed model is built on ResNet-101, a deep residual network with 101 layers that uses skip connections to enable training of very deep architectures without the vanishing gradient problem. The key innovation in this study is the addition of two attention mechanisms, channel attention and spatial attention, arranged in a serial (sequential) configuration within each residual block. This dual-attention approach allows the network to simultaneously weigh the importance of different feature channels and focus on specific spatial regions within the image.

Channel attention module: This module is added to each residual block of the ResNet-101 architecture. It first applies global average pooling to the input feature maps, compressing the spatial dimensions. Two fully connected layers then model the inter-channel relationships, producing output weights that match the channel dimension of the input features. A Sigmoid activation function normalizes these weights, and the resulting values represent the attention level assigned to each feature channel. In the full formulation, both global average pooling and global maximum pooling of the input features are processed through a shared multi-layer perceptron with a ReLU activation to produce the attention weight Z. The weighted feature L* is then computed by element-wise multiplication of Z and the original input feature L.
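The weighting just described can be written compactly (a reconstruction from the prose above, using the text's own symbols Z, L, and L*):

```latex
Z = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(L)) + \mathrm{MLP}(\mathrm{MaxPool}(L))\big),
\qquad L^{*} = Z \otimes L
```

where \(\sigma\) is the Sigmoid function, \(\mathrm{MLP}\) denotes the two fully connected layers with the ReLU activation, and \(\otimes\) is element-wise multiplication over the channel dimension.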

Spatial attention module: Also added to each residual block, this module applies max pooling and average pooling across the channel dimension to generate two single-channel spatial descriptions. These descriptions are combined, reducing the two-layer feature map to a single-layer feature map, which then passes through a 5 x 5 convolutional operation and Sigmoid activation to produce spatial weight coefficients. This allows the model to focus on the most informative spatial regions of the MRI image, such as tumor boundaries and areas of myometrial invasion.

Serial configuration: The channel attention and spatial attention modules are arranged in series rather than in parallel. The serial structure allows more nonlinear activation functions to be stacked, strengthening the nonlinear representational capacity of each residual block. The combined effect is that the model can compute information across different channels of the feature map while also attending to local spatial information within each channel. This enhances the model's ability to learn complex image features relevant to EC risk stratification and recurrence prediction.
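The serial channel-then-spatial arrangement can be sketched in a few lines of NumPy. This is an illustration under assumed toy shapes and random weights, not the authors' TensorFlow implementation; names such as `channel_attention` and the weight tensors are hypothetical.

```python
# Minimal NumPy sketch of serial channel -> spatial attention, following
# the modules described above. Random weights stand in for learned ones.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, reduction=16):
    """feat: (H, W, C). Weight each channel via pooled descriptors + MLP."""
    h, w, c = feat.shape
    avg = feat.mean(axis=(0, 1))           # global average pooling -> (C,)
    mx = feat.max(axis=(0, 1))             # global max pooling     -> (C,)
    # Shared two-layer MLP with a ReLU bottleneck (scaling factor 16)
    w1 = rng.standard_normal((c, c // reduction)) * 0.1
    w2 = rng.standard_normal((c // reduction, c)) * 0.1
    mlp = lambda v: np.maximum(v @ w1, 0) @ w2
    z = sigmoid(mlp(avg) + mlp(mx))        # per-channel attention weights Z
    return feat * z                        # L* = Z (x) L, broadcast over H, W

def spatial_attention(feat, k=5):
    """feat: (H, W, C). Pool across channels, 5x5 conv, Sigmoid weight map."""
    avg = feat.mean(axis=2)                # (H, W) channel-wise average
    mx = feat.max(axis=2)                  # (H, W) channel-wise max
    stacked = np.stack([avg, mx], axis=0)  # two single-channel descriptions
    kernel = rng.standard_normal((2, k, k)) * 0.1
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    h, w = avg.shape
    conv = np.zeros((h, w))
    for i in range(h):                     # naive 5x5 convolution
        for j in range(w):
            conv[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(conv)[..., None]

x = rng.standard_normal((8, 8, 64))            # toy feature map
out = spatial_attention(channel_attention(x))  # serial arrangement
print(out.shape)                               # (8, 8, 64)
```

Because both modules only rescale the feature map, the output keeps the input's shape, which is what lets them be dropped into every residual block without altering the rest of the network.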

TL;DR: The model extends ResNet-101 (101 layers deep) with serial channel attention and spatial attention modules at each residual block. Channel attention uses global pooling, fully connected layers, and Sigmoid normalization to weight feature channels. Spatial attention uses max and average pooling with a 5 x 5 convolution to focus on informative image regions. The serial arrangement enables simultaneous cross-channel and local spatial feature learning.

4. Dataset, Experimental Setup, and Patient Characteristics

The study retrospectively collected data from 210 patients with pathologically confirmed endometrial cancer who underwent pelvic MRI examinations between January 2021 and May 2024. The dataset was split into a test set of 140 cases and a validation set of 70 cases. Patients were aged between 30 and 64 years, with a mean age of approximately 54.87 years in the test set and 54.80 years in the validation set. Inclusion criteria required a confirmed EC diagnosis, complete MRI imaging data, and full clinical follow-up records. Patients with other malignant tumors, substandard MRI image quality (motion artifacts or severe noise), or incomplete follow-up data were excluded.

Patient demographics: The test set included 102 cases of endometrial adenocarcinoma, 25 of serous carcinoma, and 13 of other types; the validation set included 59 adenocarcinoma cases, 8 serous carcinoma cases, and 3 of other types. TNM staging in the test set comprised 85 stage I-II, 45 stage III, and 10 stage IV cases; the validation set contained 43 stage I-II, 21 stage III, and 6 stage IV cases. Comorbidities were also recorded: 65 patients in the test set had hypertension (32 in validation), 37 had diabetes (19 in validation), and 23 had coronary heart disease (11 in validation). Risk classification followed the ESMO-ESTRO-ESP guidelines.

Experimental environment: All experiments were conducted using the TensorFlow deep learning framework with GPU acceleration. The code was written in Python 3.6 within PyCharm, running on a system equipped with an NVIDIA GeForce RTX 2080 Ti graphics card, 64 GB of memory, an AMD Ryzen Threadripper 2950X processor, and Windows 10. Model parameters included a 5 x 5 convolution kernel with a stride of 1, a network depth of 101 layers, a scaling factor of 16, a regularization parameter of 2, and an initial learning rate of 0.01.

Statistical methods: SPSS 22.0 was used for statistical analysis. Normally distributed quantitative data were presented as mean plus or minus standard deviation, while non-normally distributed data used the median and interquartile range. Categorical data were expressed as frequency and percentage. Statistical tests included the Mann-Whitney test, one-way ANOVA, and chi-square test as appropriate. ROC curve analysis was used to assess diagnostic performance and compare AUC values across models. A two-tailed P value below 0.05 was considered statistically significant.
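The classification metrics and ROC AUC reported in the results sections can all be computed directly from labels and model scores. A minimal pure-Python sketch on toy values (hypothetical labels and scores, not the study's data):

```python
# Accuracy, precision, recall, F1, and ROC AUC from scratch. The AUC
# here uses the rank (Mann-Whitney) formulation: the probability that a
# randomly chosen positive scores above a randomly chosen negative.

def classification_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

def auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0, 0, 0]           # toy labels
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(classification_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

The rank formulation of AUC is the same quantity the Mann-Whitney test is built on, which is why threshold-free AUC comparison pairs naturally with the nonparametric tests listed above.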

TL;DR: 210 EC patients (140 test, 70 validation) were included, with mean age around 54.8 years. Adenocarcinoma was the dominant pathological type (102/140 test, 59/70 validation). Experiments ran on TensorFlow with an RTX 2080 Ti GPU. The ResNet-101 model used 5 x 5 kernels, depth of 101 layers, learning rate of 0.01, and was evaluated using ROC curves and standard classification metrics (accuracy, precision, recall, F1).

5. MRI Imaging Findings: Case Examples and Feature Patterns

The paper presents two representative MRI cases that illustrate the imaging features the model was trained to detect. In the first case, a 56-year-old woman presented with menostaxis (prolonged menstrual bleeding) lasting 11 days. Her MRI showed low signal intensity on T1-weighted imaging (T1WI), elevated signal intensity on T2-weighted imaging (T2WI), and increased signal on diffusion-weighted imaging (DWI), with abnormal thickening of the endometrium. Pathological results confirmed endometrioid carcinoma, moderately to highly differentiated, with cancer cell infiltration depth less than half the thickness of the myometrium. The cervical canal was also involved.

The second case involved a 50-year-old female patient who presented with prolonged menstrual periods for half a year and irregular vaginal bleeding. Her MRI showed moderate signal on T1WI, high signal on T2WI, and high signal on DWI, again with abnormal endometrial thickening. Pathological examination revealed endometrial adenocarcinoma, moderately or highly differentiated, with cancer cell infiltration depth exceeding half of the myometrial thickness. This distinction in invasion depth is clinically critical because it directly influences surgical planning and the need for lymph node dissection.

Signal patterns and clinical relevance: The combination of T1WI, T2WI, and DWI signal characteristics provides the multi-sequence MRI data that the deep learning model processes. T2WI and DWI sequences are particularly informative for delineating tumor boundaries and assessing myometrial invasion depth. The spatial attention mechanism in the proposed model is designed specifically to focus on these boundary regions, while the channel attention mechanism weighs the relative importance of different MRI sequences for each classification decision.

These cases demonstrate why automated analysis is valuable: subtle differences in signal intensity and invasion depth, which require expert radiological interpretation, can be systematically captured and quantified by the AI model across all patients in the dataset.

TL;DR: Two representative MRI cases showed typical EC features: abnormal endometrial thickening with characteristic T1WI, T2WI, and DWI signal patterns. The critical clinical distinction, whether invasion depth is less than or greater than half the myometrial thickness, determines surgical approach. The AI model leverages multi-sequence MRI data through its dual attention mechanisms to detect these patterns systematically.

6. Risk Classification Results: AUC of 0.918 for High-Risk EC Diagnosis

In the validation set of 70 patients (45 low-risk EC and 25 high-risk EC, classified per ESMO-ESTRO-ESP guidelines), the proposed dual-attention ResNet-101 model achieved an AUC of 0.918 for diagnosing high-risk EC. This was substantially higher than the three comparison models: the traditional ResNet-101 achieved an AUC of only 0.613, the SA-ResNet-101 (spatial attention only) reached 0.760, and the CA-ResNet-101 (channel attention only) reached 0.758. The improvement from 0.613 to 0.918 represents a 49.8% relative increase in AUC over the baseline ResNet-101.

Classification metrics: Beyond AUC, the proposed model also showed significantly higher accuracy, precision, recall, and F1 scores compared to all three competing models (P < 0.05 for all comparisons). The superior sensitivity and specificity indicate that the model not only correctly identifies more high-risk patients but also avoids misclassifying low-risk patients as high-risk, which is important for avoiding unnecessary aggressive treatment.

Comparison with prior work: The results align with earlier findings by Men et al. (2018), who showed that attention-enhanced deep learning models improve diagnostic performance in oncological imaging. However, the study also contrasts with findings by Bus et al. (2021), who analyzed MRI reliability for preoperative staging in patients undergoing radical hysterectomy and found that conventional MRI had low sensitivity but high specificity. The authors attribute this discrepancy to their model's enhanced feature extraction capabilities through deep learning, which can capture risk characteristics that conventional radiological assessment may miss.

The substantial performance gap between the single-attention models (SA-ResNet-101 at 0.760 and CA-ResNet-101 at 0.758) and the combined dual-attention model (0.918) demonstrates that spatial and channel attention are complementary. Neither mechanism alone is sufficient to achieve the same level of diagnostic discrimination. The serial arrangement of both modules creates a synergistic effect, enabling the network to focus on both the most informative MRI sequences (through channel attention) and the most relevant anatomical regions (through spatial attention) simultaneously.

TL;DR: The dual-attention ResNet-101 achieved an AUC of 0.918 for high-risk EC diagnosis, compared to 0.613 (traditional ResNet-101), 0.760 (spatial attention only), and 0.758 (channel attention only). All accuracy, precision, recall, and F1 scores were significantly higher (P < 0.05). The combined attention mechanisms proved complementary, with neither spatial nor channel attention alone matching the performance of the joint model.

7. Recurrence Prediction Results: AUC of 0.926 for Postoperative Recurrence

Among the 70 patients in the validation set, follow-up data revealed that 13 cases (18.6%) experienced postoperative recurrence while 57 cases (81.4%) did not. Of the 13 recurrence cases, 9 occurred at the primary site, 3 in lymph nodes, and 1 in the abdominal cavity. Among the 57 non-recurrence cases, tumor locations at last follow-up were distributed as 34 at the primary site, 11 in lymph nodes, 8 in the cervix, and 4 in the abdominal cavity.

Model performance: The proposed dual-attention model achieved an AUC of 0.926 for predicting postoperative recurrence, outperforming the traditional ResNet-101 (AUC 0.620), SA-ResNet-101 (AUC 0.729), and CA-ResNet-101 (AUC 0.767). The accuracy, precision, recall, and F1 values for recurrence prediction were all significantly higher for the proposed model compared to the three comparison models (P < 0.05). This performance is particularly notable given the clinical importance of recurrence prediction, as it enables earlier intervention and personalized follow-up scheduling for patients identified as high-risk for recurrence.

Clinical significance: The ability to predict postoperative recurrence from preoperative MRI data could transform patient management. Patients identified as high recurrence risk could receive more aggressive adjuvant therapy or closer surveillance protocols, while those predicted to have low recurrence risk could potentially be spared unnecessary treatments and their associated side effects. The authors reference Eriksson et al. (2021), who used the ProMisE molecular classification system combined with ultrasound and demographic characteristics for preoperative recurrence prediction in EC. The current study demonstrates that deep learning-based MRI analysis can provide comparable or superior prognostic information.

F1 score consideration: The authors noted that while the proposed model's F1 score was higher than competitors, it remained relatively lower than the other metrics. They attributed this to a potential imbalance between precision and recall, likely influenced by the small number of recurrence cases (only 13 out of 70 in the validation set). This class imbalance is a common challenge in medical AI and suggests that future work with larger and more balanced datasets could further improve the F1 score.
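The authors' point about class imbalance is easy to see numerically: with only 13 positives out of 70, a handful of errors barely moves accuracy but pulls F1 down sharply. The confusion counts below are hypothetical, chosen only to mirror the 13/57 split, not the study's actual results.

```python
# Why F1 lags the other metrics when positives are rare: a few missed
# recurrences or false alarms cost F1 far more than accuracy.

def summarize(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# 13 true positives in total; suppose 3 are missed and 4 of the 57
# negatives are flagged as recurrences (hypothetical counts).
acc, prec, rec, f1 = summarize(tp=10, fp=4, fn=3, tn=53)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

Here accuracy stays at 0.900 while F1 drops to about 0.741, illustrating why a larger, more balanced cohort would be expected to narrow the gap between F1 and the other metrics.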

TL;DR: The dual-attention model achieved an AUC of 0.926 for predicting postoperative recurrence (13/70 cases recurred). This exceeded traditional ResNet-101 (0.620), SA-ResNet-101 (0.729), and CA-ResNet-101 (0.767). All metrics were significantly better (P < 0.05). The relatively lower F1 score likely reflects the class imbalance of only 13 recurrence events in the validation set.

8. Limitations and Future Directions

Small sample size: The most significant limitation is the relatively small dataset of 210 patients total, with only 70 in the validation set and just 13 recurrence events. This limited sample size raises concerns about the model's generalizability and statistical power. A validation set of 70 cases, while sufficient to demonstrate trends, may not capture the full range of EC presentations across different populations, institutions, or MRI scanner types. Expanding to multi-center datasets with hundreds or thousands of patients would be necessary to confirm the model's clinical reliability.

Single-center, retrospective design: All data came from a single hospital imaging center, which introduces potential selection bias and limits external validity. The retrospective design means that data collection was not standardized prospectively, and MRI acquisition protocols may have varied over the January 2021 to May 2024 study period. Prospective, multi-center validation studies are needed before the model could be considered for clinical deployment.

Limited recurrence detail: The dataset lacked granular information on recurrence locations (local versus distant metastases), which limits the ability to fully assess the model's predictive performance for different recurrence patterns. Understanding whether the model is better at predicting local recurrence versus distant metastasis would be clinically valuable, as these outcomes carry different prognoses and require different management strategies.

Future research directions: The authors propose several avenues for improvement. First, further optimization of the deep learning architecture and parameter settings could enhance sensitivity and specificity for complex medical images. Second, multi-modal data fusion, combining MRI imaging with molecular biology markers and clinical features, could create a more comprehensive and multi-level prediction model. Third, integrating remote monitoring technology and big data analysis could enable real-time tracking and long-term follow-up prediction. Finally, the authors suggest that collecting detailed data on recurrence locations in future studies would support more precise clinical decision-making and personalized treatment planning.

Broader context: Despite these limitations, the study provides a proof of concept that dual-attention mechanisms meaningfully improve deep learning performance on endometrial cancer MRI analysis. The AUC improvements from 0.613 to 0.918 for risk diagnosis and from 0.620 to 0.926 for recurrence prediction represent clinically meaningful gains. The next step would be to validate these findings in independent, larger cohorts and to integrate the model into clinical workflows as a decision-support tool rather than a standalone diagnostic system.

TL;DR: Key limitations include a small sample size (210 patients, only 13 recurrence events), single-center retrospective design, and lack of detailed recurrence location data. Future work should focus on multi-center validation with larger cohorts, multi-modal data fusion (MRI plus molecular markers plus clinical features), and prospective clinical trials to confirm the model's utility in real-world settings.
Citation: Qi X. Open access, 2024. Available at: PMC11541869. DOI: 10.1038/s41598-024-78081-3. License: CC BY-NC-ND.