Endometrial cancer (EC) is the most common cancer of the uterus and a major contributor to cancer-related morbidity and mortality worldwide, with rising incidence especially in countries undergoing rapid socioeconomic transitions. The International Federation of Gynecology and Obstetrics (FIGO) staging system is used for surgico-pathological staging. While most patients (80%) are diagnosed at stage I with a 5-year survival rate exceeding 95%, the outlook is far worse for advanced disease: stage III patients face a 5-year survival of only 47% to 58%, and stage IV patients just 15% to 17%. The lack of clinically validated screening methods, combined with costly and time-consuming diagnostic approaches, contributes significantly to high mortality rates.
The clinical problem: There are currently no validated EC screening tools. Endometrial biopsy with dilation and curettage (D&C) is the standard diagnostic approach, but it is painful, expensive, requires general anesthesia, and has a misdiagnosis rate of up to 31%. Transvaginal ultrasonography (TVUS) is useful for monitoring the endometrium but lacks sufficient specificity to distinguish benign lesions such as polyps from malignant ones. Postmenopausal vaginal bleeding (PMB) is present in 90% of EC patients, yet only 9% of women presenting with PMB are actually diagnosed with EC, making it an unreliable standalone indicator.
The ML opportunity: Machine learning, a subfield of artificial intelligence, uses statistical, probabilistic, and optimization techniques to enable computers to learn from data and detect complicated patterns in large, noisy datasets. ML algorithms have already been successfully applied to breast cancer, prostate cancer, oesophageal cancer, head and neck cancer, and other malignancies. The authors argue that ML can be particularly relevant to EC because patients with diverse outcomes could be subcategorized into specific clinical stages more effectively, improving early detection, treatment selection, and survival prediction.
This review, published in Frontiers in Oncology, was authored by researchers from Tsinghua University (Shenzhen), St. John's Research Institute (Bangalore), and Shenzhen Bay Laboratory. It provides a comprehensive analysis of ML applications across prevention, screening, detection, and prognosis of EC, targeting an audience of oncologists, molecular biologists, biomedical engineers, and bioinformaticians.
EC is traditionally classified into two types with distinct epidemiology, histology, prognosis, and treatment profiles. Type I EC is the most common form, accounting for 80% of diagnosed cases, with an overall 5-year survival rate of 81.3% and a recurrence rate usually below 20%. Type I tumors are predominantly low-grade endometrioid adenocarcinomas (FIGO grades 1 and 2), are estrogen-driven, and are strongly associated with obesity-related complications, hyperestrogenism, hyperlipidemia, diabetes, and anovulatory uterine bleeding. Obesity affects more than 70% of individuals with early-stage EC and is the single most significant risk factor for progression from hyperplasia to malignancy.
Type II EC accounts for only 20% of cases but is disproportionately lethal, contributing to approximately 40% of all EC deaths. Type II tumors are high-grade, non-endometrioid histologies (primarily serous carcinomas and clear cell carcinomas), are frequently diagnosed at a late stage, and carry a decreased 5-year overall survival rate of 55%. Type II EC is more prevalent in elderly, postmenopausal women and is especially common among African-American women. About 20% of endometrioid cancers are subcategorized as high grade (FIGO grade 3) and grouped with type II.
Genetic risk factors: Hereditary conditions including Lynch syndrome, Cowden syndrome, and polymerase proofreading-associated polyposis increase EC risk. Lynch syndrome, caused by germline mutations in the mismatch repair (MMR) genes MLH1, MSH2, MSH6, or PMS2, or by germline deletions in the EpCAM gene, is particularly noteworthy: approximately 3% of ECs arise in the setting of Lynch syndrome. These molecular subtypes are becoming increasingly important for both diagnosis and treatment selection, making them natural targets for ML-based classification models.
Additional independent risk factors include age over 50, hypertension, diabetes mellitus, thyroid disease, family history, early menarche, late menopause, tamoxifen use, nulliparity, infertility, and polycystic ovarian disease. The complexity of these interacting risk factors is precisely the type of multidimensional pattern-recognition problem where ML algorithms excel over traditional statistical methods.
The review covers the three principal ML training paradigms: supervised learning (using labeled data to predict outcomes), unsupervised learning (discovering hidden patterns without labels), and reinforcement learning (learning through trial-and-error feedback). The authors provide a detailed comparison of widely used algorithms including Decision Trees, Naive Bayes, k-Nearest Neighbor (kNN), Neural Networks, Support Vector Machines (SVM), and Genetic Algorithms, evaluating the advantages and limitations of each. For example, Decision Trees are simple and efficient to train but prone to overfitting with inaccurate training data. SVMs are resistant to overfitting and reduce computational complexity to a quadratic optimization problem, but finding optimal settings is difficult when training data are not linearly separable.
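The overfitting contrast the authors draw between Decision Trees and SVMs can be made concrete with a small experiment. This is an illustrative sketch on synthetic data, not from the review; all dataset parameters are assumptions.

```python
# Sketch: a Decision Tree's tendency to overfit noisy labels vs. an SVM's
# regularized margin. Synthetic data; sizes and hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# 10% of labels are flipped (flip_y) to mimic inaccurate training data.
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # unpruned: memorizes noise
svm = SVC(C=1.0, kernel="rbf").fit(X_tr, y_tr)                 # margin acts as a regularizer

# An unpruned tree typically scores ~1.0 on training data but drops on held-out
# data, while the SVM's train/test gap is usually smaller.
print("tree train/test:", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
print("svm  train/test:", svm.score(X_tr, y_tr), svm.score(X_te, y_te))
```

The gap between the tree's training and test accuracy is the overfitting the review warns about; the SVM trades some training-set fit for better generalization.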
The class imbalance challenge: A critical problem in applying ML to EC datasets is class imbalance, where one target class (for example, cancer-positive cases) has far fewer observations than the other. Most conventional ML algorithms struggle with imbalanced datasets because they tend to bias predictions toward the majority class. The authors discuss several strategies to address this: random oversampling (duplicating minority class examples), random undersampling (removing majority class examples), ensemble learning techniques, cost-sensitive learning, one-class learning, and active learning. No single strategy is universally effective; the best approach depends on the specific characteristics of the imbalanced dataset.
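The two resampling strategies mentioned first can be sketched in a few lines. The class ratio below (5% positive) is an illustrative assumption, not a figure from the review.

```python
# Sketch of random over- and undersampling for an imbalanced dataset.
# The 5% positive rate and array sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.array([1] * 50 + [0] * 950)  # minority class: 50 cancer-positive cases

minority, majority = np.where(y == 1)[0], np.where(y == 0)[0]

# Random oversampling: duplicate minority rows (with replacement) up to majority size.
over_idx = np.concatenate([majority, rng.choice(minority, size=len(majority), replace=True)])

# Random undersampling: discard majority rows down to minority size.
under_idx = np.concatenate([minority, rng.choice(majority, size=len(minority), replace=False)])

X_over, y_over = X[over_idx], y[over_idx]
X_under, y_under = X[under_idx], y[under_idx]
print("oversampled class counts:", np.bincount(y_over))    # balanced at 950/950
print("undersampled class counts:", np.bincount(y_under))  # balanced at 50/50
```

Oversampling keeps all information but risks overfitting to duplicated minority examples; undersampling avoids duplication but discards majority-class data, which is one reason the authors note no single strategy is universally effective.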
Deep learning (DL) as a subset: The review positions DL as a subgroup of ML that employs statistical and mathematical models, with convolutional neural networks (CNNs) being particularly relevant for image-based EC applications. The authors note that the emergence of computer-aided systems in medical imaging, bioinformatics, and medical robotics has been driven by advancing computational power, improved pattern recognition algorithms, and enhanced image processing software. A "cognitive" computer exposed to big data can scan billions of bits of unstructured data and detect complicated patterns with growing confidence.
The authors emphasize that ML's key strengths over conventional biostatistical approaches include versatility and scalability, enabling functions such as risk stratification, diagnosis and classification, and survival prediction. A further advantage is the ability to integrate different data types (population data, experimental outcomes, imaging) to identify cross-modal patterns. However, specific challenges in healthcare include data preprocessing requirements, experimental design constraints, and the need for algorithm refinement tailored to each clinical question.
MRI-based tumor segmentation: Hodneland et al. demonstrated a fully automated approach for tumor segmentation in EC using a 3D convolutional neural network called UNet3D, applied to a cohort of 139 EC patients with preoperative pelvic MR images. The model generated tumor volume estimates, tumor borders, and volumetric tumor maps, achieving segmentation accuracy at the level of a human expert. This automated approach enables near-real-time whole-volume radiomic tumor profiling, including texture properties, which could be useful for risk stratification and personalized treatment planning.
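Agreement between an automated segmentation and an expert annotation in studies like this is conventionally quantified with the Dice similarity coefficient. A minimal sketch on synthetic binary masks (the toy masks are assumptions; this is not the Hodneland et al. evaluation code):

```python
# Sketch: the Dice similarity coefficient, the standard overlap metric for
# comparing a predicted tumor mask against an expert's annotation.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient: 2|A intersect B| / (|A| + |B|) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Two toy 3D "tumor" masks of 8 voxels each, sharing 4 voxels.
a = np.zeros((4, 4, 4), dtype=bool); a[1:3, 1:3, 1:3] = True
b = np.zeros((4, 4, 4), dtype=bool); b[2:4, 1:3, 1:3] = True
print(round(dice(a, b), 2))  # → 0.5 (2 * 4 shared voxels / 16 total)
```

A Dice score of 1.0 means perfect voxel-wise agreement; "human expert level" performance is typically claimed when the model-vs-expert Dice matches the expert-vs-expert Dice.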
Predicting deep muscle invasion: Dong et al. created a deep learning model to predict deep myometrial invasion using 4,896 MR images from 72 EC patients and achieved an accuracy rate of 75%, although the difference compared to radiologist readings was not statistically significant. In a similar study, Chen et al. analyzed 530 MR images and achieved 84% accuracy, 66.7% sensitivity, and 87.5% specificity. Xu et al. developed a prediction model for lymph node metastasis (LNM) of normal size using MR images and CA125 values from 200 EC patients, reporting approximately 85% accuracy. These studies collectively demonstrate that ML can approach expert-level diagnostic performance on MRI-based EC assessments.
Endometrial cytology and hysteroscopy: Markis et al. developed an automatic diagnostic system to analyze liquid endometrial cytology images from 416 patients using deep learning, achieving 90% accuracy. For hysteroscopic image classification, Zhang et al. developed a CNN-based computer-aided diagnosis system using the VGGNet-16 model, training on 1,851 hysteroscopic images of uterine patients and achieving 80.8% overall accuracy for classifying endometrial lesions. These approaches offer minimally invasive, less expensive diagnostic alternatives to traditional methods.
Classifying DNA mismatch repair-deficient tumors: Veeraraghavan et al. used contrast-enhanced CT to identify DNA MMR-deficient and/or tumor mutational burden-high (TMB-H) subtypes in ECs. This study built two ML models using regularized generalized linear models (GLMNet) and recursive feature elimination random forest (RF) classifiers on a cohort of 422 patients to differentiate between low copy number and high copy number MMR-deficient tumors. Their findings indicated that radiomic models using ML algorithms can serve as reproducible complementary diagnostics for clinical trial enrollment and standard-of-care treatment.
Clustering for prognostic prediction: Praiss et al. developed an unsupervised ML algorithm called EACCD (Ensemble Algorithm for Clustering Cancer Data) to classify EC patients based on TNM staging, grade, and age. EACCD works by continuously applying criteria-based clustering to derive dissimilarity estimates, then combining these with hierarchical clustering to identify ultimate patient clusters. This approach improved prognostic prediction for EC compared to standard methods. Separately, Chen and colleagues developed ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumors), which uses gene expression data to predict tumor content and the degree of infiltrating stromal and immune cells. ESTIMATE total scores were found to be substantially associated with tumor purity and have been validated across breast cancer, glioblastoma, prostate cancer, colon cancer, and cutaneous melanoma.
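The hierarchical-clustering step at the core of pipelines like EACCD can be sketched briefly. The synthetic two-dimensional "patients" below stand in for EACCD's learned dissimilarity features; nothing here reproduces the actual EACCD algorithm, and the three latent groups are an assumption for illustration.

```python
# Sketch: cutting a hierarchical-clustering dendrogram into prognostic groups,
# the final step of an EACCD-style pipeline. Data are synthetic and illustrative.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Synthetic patients drawn from three well-separated latent prognosis groups.
patients = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=3.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=6.0, scale=0.3, size=(20, 2)),
])

tree = linkage(patients, method="average")          # agglomerative clustering
labels = fcluster(tree, t=3, criterion="maxclust")  # cut dendrogram into 3 clusters
print("cluster sizes:", np.bincount(labels)[1:])
```

In EACCD the pairwise dissimilarities come from repeated criteria-based clustering over TNM stage, grade, and age rather than from raw Euclidean distance, but the dendrogram-cutting logic is the same.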
Predicting recurrence: Although 80% of EC patients present at early stages with a good prognosis, approximately 15% of stage I and II patients develop recurrence. Akazawa et al. applied five ML algorithms to predict recurrence from clinical parameters including age, body mass index, stage, histological type, grade, surgical content, and adjuvant chemotherapy: Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), and Boosted Tree. SVM achieved the highest accuracy, followed by LR, while LR showed the best AUC and was reported as the overall best predictive model. The Boosted Tree algorithm had the lowest accuracy, and RF the lowest AUC.
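The kind of head-to-head comparison Akazawa et al. describe, where accuracy and AUC can rank the same models differently, is easy to reproduce in outline. This sketch uses synthetic data and default hyperparameters, which are assumptions; it is not the study's pipeline.

```python
# Sketch: scoring several classifiers on both accuracy and AUC, which need not
# agree on the "best" model. Synthetic imbalanced data; settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, weights=[0.85], random_state=0)

models = {
    "RF": RandomForestClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "Boosted": GradientBoostingClassifier(random_state=0),
}
results = {}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    results[name] = (acc, auc)
    print(f"{name}: accuracy={acc:.3f}  AUC={auc:.3f}")
```

Because accuracy depends on a single decision threshold while AUC integrates over all thresholds, a model can top one metric and trail on the other, which is exactly the pattern reported for SVM versus LR.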
Lymph node involvement prediction: Gunakan et al. investigated the Naive Bayes (NB) algorithm for lymph node involvement (LNI) prediction using histopathological factors such as final histology, lymphovascular space invasion (LVSI), grade, tumor diameter, depth of myometrial invasion, cervical glandular stromal invasion, and tubal or ovarian involvement; the algorithm predicted LNI with high accuracy. Reijnen et al. developed and externally validated a preoperative Bayesian network called ENDORISK using 763 surgically treated EC patients. ENDORISK incorporated molecular, histological, and clinical biomarkers and achieved high discriminative performance: with a false-negative rate of only 1.6%, it identified more than 55% of patients as being at 5% or lower risk for LNM, demonstrating the utility of multimodal biomarker integration for personalized risk assessment.
Gene expression-based prognostication: Yin et al. developed a prognostic model for endometrioid endometrial adenocarcinoma (EEA) combining gene expression data with traditional clinical features using Random Forest. Three models were tested: 11 genes alone, stage and grade alone, and the combination of 11 genes plus stage and grade. The combined "genes and grade" RF model outperformed both single-modality models, indicating that integrating molecular and clinical features produces stronger predictive ability for EEA prognosis.
Population-level risk stratification: Hart et al. developed seven alternative ML models to estimate the likelihood of an individual woman developing EC within 5 years, using publicly available personal health data. The models tested were Logistic Regression (LR), Neural Network (NN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Linear Discriminant Analysis, and Naive Bayes (NB). The RF model outperformed all six other models in AUC, with the NN coming in second. Both top-performing models were used to divide the population into three risk groups: low, medium, and high risk. Importantly, these predictions were based entirely on personal health information before disease onset, without any invasive or expensive procedures.
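Turning a Random Forest's predicted probabilities into the low/medium/high tiers described by Hart et al. amounts to binning the probability of developing EC. The cutoffs below (2% and 10%) and the synthetic cohort are illustrative assumptions, not the paper's values.

```python
# Sketch: three-tier risk stratification from Random Forest probabilities.
# Synthetic data; the 2% and 10% cutoffs are assumed for illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, weights=[0.9], random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

proba = rf.predict_proba(X)[:, 1]              # estimated P(develops EC within 5 years)
tiers = np.digitize(proba, bins=[0.02, 0.10])  # 0 = low, 1 = medium, 2 = high risk
for name, code in [("low", 0), ("medium", 1), ("high", 2)]:
    print(f"{name} risk: {(tiers == code).sum()} women")
```

In practice the cutoffs would be chosen to balance screening capacity against the cost of missed cases, and probabilities would come from held-out data rather than the training set used here for brevity.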
Clinical relevance of risk tiers: The three-tier risk classification was compared against physicians' clinical judgment and showed encouraging results. This approach offers a non-invasive, cost-effective method of identifying high-risk subpopulations who might benefit from early screening and preventive interventions such as dietary and activity modifications, progestin or antiestrogen medication, insulin-lowering therapy, and scheduled endometrial biopsies. The key advantage is identifying women at high risk before symptoms develop, enabling a shift from reactive to preventive care.
Neural network risk prediction: Hutt et al. used a statistical meta-analysis technique to establish the rank order of risk factors for EC and generate a collective risk percentage for each factor, drawing data from the National Cancer Institute (NCI). They then designed a Neural Network computer model that could predict whether a patient's overall cancer risk would increase or decrease. The model achieved 98.6% accuracy in predicting overall cancer risk and diagnosis for specific patients. The findings suggest this approach could effectively reduce unnecessary invasive testing, serving as a valuable tool for physicians to determine whether individuals require enhanced preventative measures.
Retrospective bias and lack of prospective validation: The vast majority of studies reviewed used retrospective designs, training and testing algorithms on previously labeled data. Prospective studies are essential to determine the true utility of these systems in real-world clinical settings, but remain rare. Additionally, there are very few randomized controlled trials, which are the gold standard for building trust and acceptance of ML among the medical community. High-quality reporting of ML experiments is needed so that the probability of bias and the utility of prediction models can be accurately assessed.
Metrics that do not reflect clinical utility: The AUC of a receiver operating characteristic curve, while widely used in ML research, is often a weak proxy for clinical validity and is difficult for many clinicians to interpret. The authors argue for more frequent use of alternative evaluation methods such as decision curve analysis, which estimates the net clinical benefit of acting on a model's predictions across a range of decision thresholds. Most published articles do not attempt to demonstrate how their proposed algorithms would improve patient care in real-world settings.
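Decision curve analysis reduces to a simple formula: at a threshold probability p_t, net benefit = TP/N − FP/N · p_t/(1 − p_t), which weighs true positives against false positives according to how the threshold trades them off. A minimal sketch on synthetic predictions (the toy cohort is an assumption, not data from the review):

```python
# Sketch: net benefit, the core quantity of decision curve analysis.
# Synthetic cohort of 100 patients; numbers are illustrative only.
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit at threshold probability p_t: TP/N - FP/N * p_t / (1 - p_t)."""
    call = y_prob >= threshold        # patients the model would flag for intervention
    n = len(y_true)
    tp = np.sum(call & (y_true == 1))
    fp = np.sum(call & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)

rng = np.random.default_rng(0)
y = np.array([1] * 20 + [0] * 80)                              # 20% disease prevalence
prob = np.clip(y * 0.6 + rng.uniform(0, 0.4, size=100), 0, 1)  # a reasonably good model

for pt in (0.05, 0.10, 0.20):
    print(f"p_t={pt:.2f}: net benefit={net_benefit(y, prob, pt):.3f}")
```

Unlike AUC, net benefit is stated in clinically meaningful units (true positives gained per patient, net of false-positive harm), which is why the authors favor it for demonstrating real-world utility.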
Difficulty comparing algorithms and dataset shift: Quantitative comparison across studies is extremely difficult because each study uses different tools, methods, samples, and performance metrics. Fair comparisons require testing algorithms on a similar independent test set representative of the target population using comparable effectiveness measures. Furthermore, clinical and operational practices evolve constantly, creating non-stationary data environments. When a novel predictive algorithm is introduced, it may induce operational shifts that produce data distributions different from those used during training, requiring drift detection systems and model updates.
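A basic form of the drift detection mentioned above is to compare a feature's post-deployment distribution against its training-time distribution with a two-sample statistical test. The window sizes, the simulated mean shift, and the alpha level below are illustrative assumptions.

```python
# Sketch: a simple drift monitor using a two-sample Kolmogorov-Smirnov test
# to compare live feature values against the training distribution.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=2000)  # distribution at training time
live_feature = rng.normal(loc=0.5, scale=1.0, size=500)    # simulated post-deployment shift

stat, p_value = ks_2samp(train_feature, live_feature)
drift_detected = p_value < 0.01  # illustrative alpha; production systems tune this
print(f"KS statistic={stat:.3f}, p={p_value:.2e}, drift={drift_detected}")
```

A production system would run such a test per feature on a rolling window and trigger retraining or human review when drift is flagged; multivariate and label-drift detectors build on the same idea.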
Algorithmic bias and generalization failures: Clinical assessments should be performed on representative samples of the planned implementation population, accounting for factors such as age, race, sex, sociodemographic stratum, and geographic location. Algorithm bias can be divided into three inputs: model bias (models optimized for the majority, not underrepresented groups), model variance and ambiguity (from insufficient minority data), and noise in results (unknown parameters affecting predictions). External validation, which involves evaluating AI systems with properly scaled datasets from organizations outside those that supplied training data, is critical but remains uncommon. A systematic assessment found that only 6% of 516 relevant scientific publications on AI for medical imaging completed external validation.
Data fragmentation and human obstacles: Medical data are typically fragmented across imaging archives, diagnostic systems, electronic health records (EHRs), automated monitoring software, and insurance databases, making integration for ML training extremely difficult. Input data must be of good quality with minimal artifacts and noise, since incorrect labels can significantly degrade model performance, and many ML models also require training sets with no missing values. Beyond these technical barriers, there are significant human obstacles to AI adoption in healthcare, including clinician training needs and the importance of maintaining focus on clinical applicability and patient outcomes.
MEDomics and EHR integration: The authors highlight the MEDomics framework as a promising approach for EC management. MEDomics is an AI-driven system that integrates electronic health records (EHRs) with continuous learning infrastructure, using multimodal clinical data from thousands of cancer patients and millions of data points. It automatically extracts and integrates analytical workflows, incorporates natural language processing (NLP) models for medical note extraction, and classifies patients into different risk groups. The system can develop hypotheses from patient data (rather than laboratory data alone) and help guide therapy selection and disease monitoring. MEDomics profiles synthesize a patient's full clinical service chronology and are utilized for various AI application designs.
Panoptes: a multiresolution deep CNN for molecular subtyping: A significant development highlighted in the review is Panoptes, a multiresolution deep convolutional neural network that uses pathological images to predict gene mutations and histological and molecular subtypes in EC. Unlike traditional CNN architectures that process a single image tile, Panoptes takes a group of three tiles from the same region at different magnifications. The model can read one slide in just 4 minutes and predict 18 common gene mutations without sequencing analysis, providing a cost-effective cancer detection method. The authors anticipate that such models trained for one cancer type might be applicable to other relevant cancers through transfer learning.
Molecular diagnostics and early detection: The PapSEEK test, which leverages the standard Pap test, was highlighted as an emerging screening technology that identified the majority of women with EC and one-third of women with ovarian cancer in an NCI-funded study. Sentinel lymph node (SLN) mapping is becoming increasingly common in EC management, with clinical trials showing the procedure is feasible and safe. SLN biopsy helps patients avoid the adverse consequences of total lymphadenectomy while ensuring accurate staging for prognosis and treatment planning.
Data challenges and emerging solutions: The authors acknowledge that most AI models are trained on small datasets, compromising accuracy. They propose that lab-on-a-chip and organ-on-a-chip technologies could be explored to simulate the tumor microenvironment and generate clinically relevant larger datasets. For the pressing challenge of clinical data sharing due to ethical and legal barriers, they suggest federated learning approaches where raw data remains with the source institution and only processed models are shared. This "web of information sharing in a protected way" requires robust AI systems with smart strategies for secure data exchange and continuous learning.
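The federated idea that raw data stays with the source institution while only model parameters are shared can be sketched with a size-weighted parameter average (federated averaging). The linear-regression setup, hospital cohort sizes, and noise level below are illustrative assumptions; real deployments use dedicated frameworks and secure aggregation rather than this toy loop.

```python
# Sketch: federated averaging. Each "hospital" fits a model on private data
# and shares only its parameters; a coordinator averages them by cohort size.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])  # ground-truth coefficients for the simulation

def local_fit(n):
    """One institution: fit least squares locally; only weights leave this function."""
    X = rng.normal(size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w, n

# Three institutions with different cohort sizes; raw X, y never leave local_fit.
local = [local_fit(n) for n in (200, 500, 120)]
weights, sizes = zip(*local)
global_w = np.average(weights, axis=0, weights=sizes)  # size-weighted parameter average
print("federated estimate:", np.round(global_w, 2))
```

The coordinator never sees patient-level records, only three weight vectors, which is the "web of information sharing in a protected way" the authors describe; iterating this share-and-average round is the basis of federated training for deep models as well.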