Artificial Intelligence in Gynecological Oncology from Diagnosis to Surgery

Plain-English Explanations

1. Overview: AI's Growing Role in Gynecological Cancer Diagnosis and Surgery

This 2025 narrative review by Restaino and colleagues examines the current state of artificial intelligence applications across the full spectrum of gynecological oncology, from initial screening and diagnosis through surgical planning and intraoperative decision-making. The authors focus on three major gynecological malignancies: ovarian cancer, endometrial cancer, and cervical cancer. Their central finding is a striking imbalance: diagnostic AI research is far more advanced than surgical AI research, with the latter largely confined to ovarian cancer surgery.

Publication landscape: A PubMed search conducted on 24 August 2024 reveals the scope of this imbalance clearly. Cervical cancer leads with 852 AI-related publications, mostly focused on cytology-based screening. Ovarian cancer follows with 600 publications, and endometrial cancer has 353. When the search is narrowed to AI in gynecological oncological surgery specifically, the number of results drops dramatically, underscoring how little work has been done on AI-assisted surgical applications compared to diagnostics.

AI modalities covered: The review spans multiple AI-driven data modalities, including path-omics (digitalized histopathology analysis), genomics and transcriptomics (genetic diagnosis), proteomics and metabolomics (enzyme and metabolite assessment), and radiomics (imaging-based feature extraction). These approaches share a common strength: the ability to extract vast quantities of computer-derived measurements from digitalized clinical data. Deep learning models in particular are demonstrating substantial capacity for improving clinical practice across many diagnostic settings.

A key challenge, automation bias: The authors highlight automation bias as perhaps the most critical barrier to AI adoption in gynecological oncology. Many AI techniques lack transparency in their decision-making processes, making it difficult for clinicians to understand how conclusions are reached. This opacity conflicts with a basic expectation in medicine: that clinicians can understand, and therefore trust, the reasoning behind a clinical decision. Additionally, the applicability and generalizability of AI models are often limited by the size and diversity of training datasets.

TL;DR: This narrative review surveys AI in gynecological oncology across ovarian, endometrial, and cervical cancers. Diagnostic AI research dominates (852 publications for cervical cancer, 600 for ovarian, 353 for endometrial), while surgical AI applications remain sparse and largely limited to ovarian cancer. Automation bias and lack of model transparency are flagged as the most critical barriers to clinical adoption.

2. Methodology: A Narrative Review of PubMed Literature

The authors conducted a narrative (not systematic) literature review using PubMed as their primary database. The search strategy employed "artificial intelligence" as a MeSH term, combined with disease-specific terms such as "gynecologic cancer," "ovarian cancer," "endometrial cancer," and "cervical cancer." For the surgical component, MeSH terms including "surgery," "AI," and "gynecology oncology" were combined in various ways along with synonyms. The literature search was updated through 24 August 2024.
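To make the search logic concrete, here is a minimal sketch of how such a PubMed query string could be assembled. The exact strategy is documented in the paper's Supplementary Table S1 and is not reproduced here; the function name `build_pubmed_query`, the use of the `[All Fields]` tag for disease terms, and the single "surgery" refinement are assumptions made for illustration.

```python
def build_pubmed_query(disease_terms, extra_terms=()):
    """Combine the AI MeSH term with disease-specific terms (illustrative only)."""
    # "artificial intelligence" is searched as a MeSH term, as described above.
    ai_clause = '"artificial intelligence"[MeSH Terms]'
    # Assumption: disease terms are searched in all fields and OR-ed together.
    disease_clause = " OR ".join(f'"{t}"[All Fields]' for t in disease_terms)
    query = f"{ai_clause} AND ({disease_clause})"
    if extra_terms:
        # Optional refinement, e.g. narrowing to surgical papers.
        extra_clause = " OR ".join(f'"{t}"[All Fields]' for t in extra_terms)
        query += f" AND ({extra_clause})"
    return query


DISEASE_TERMS = [
    "gynecologic cancer",
    "ovarian cancer",
    "endometrial cancer",
    "cervical cancer",
]

diagnostic_query = build_pubmed_query(DISEASE_TERMS)
surgical_query = build_pubmed_query(DISEASE_TERMS, extra_terms=["surgery"])
print(diagnostic_query)
print(surgical_query)
```

The strings can be pasted into the PubMed search box; the hit counts will differ from those in the review unless the date limit and the authors' exact synonyms are applied.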

Selection approach: Unlike a systematic review with predefined inclusion and exclusion criteria, the authors selected what they considered the most interesting and informative articles on the subject. They focused on two main categories: studies demonstrating direct AI applications in supporting and improving diagnostic or surgical performance in gynecological oncology, and pertinent reviews that provided broader context about AI in the medical scientific field, including its pitfalls and strengths in clinical practice. Supplementary Table S1 documents the search strategy used.

Scope and structure: The review is organized into diagnostic perspectives and surgical applications. On the diagnostic side, the authors cover biomarker-based AI models for ovarian cancer risk assessment, metabolomics-based approaches, cfDNA fragmentomics, and age-specific reference interval estimation. On the surgical side, they examine AI models for predicting residual disease, postoperative morbidity, length of stay, and cytoreduction outcomes. This dual structure allows the reader to directly compare the maturity and evidence base of AI in each domain.

It is worth noting that because this is a narrative review rather than a systematic one, there is no formal risk-of-bias assessment (such as QUADAS-2 or PROBAST), no PRISMA flow diagram, and no meta-analytic pooling of results. The findings are presented qualitatively, summarizing individual study results rather than producing pooled effect estimates. This approach is appropriate given the heterogeneity of AI applications across different cancer types and clinical settings, but it means the conclusions should be interpreted as an expert overview rather than as statistically synthesized evidence.

TL;DR: This is a narrative review using PubMed MeSH term searches updated through August 2024. The authors selected key articles across diagnostic and surgical AI applications in gynecological oncology. No formal systematic methodology (PRISMA, QUADAS-2, meta-analysis) was applied, so findings represent a qualitative expert overview of the field rather than pooled statistical evidence.

3. AI in Ovarian Cancer Diagnostics: Biomarkers, Neural Networks, and Multi-Model Fusion

MIA3G, a deep feedforward neural network: Reilly and colleagues (2022) developed MIA3G, a deep feedforward neural network for ovarian cancer risk assessment that uses seven protein biomarkers (CA125, HE4, beta-2 microglobulin, apolipoprotein A-1, transferrin, transthyretin, and follicle-stimulating hormone) along with age and menopausal status as inputs. The model was trained on 1,067 serum specimens and validated on a cohort almost twice that size. In the analytical validity dataset, which simulated a real-world prevalence of 4.9% for ovarian malignancy, MIA3G achieved a sensitivity of 89.8%, specificity of 84.0%, positive predictive value of 22.5%, and negative predictive value of 99.5%. Training used 1,050 specimens with 30% positive cases, while validation on 2,000 specimens reflected a low-prevalence population (approximately 5%).
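The low PPV alongside the very high NPV follows directly from the low prevalence. As a rough check, the sketch below applies Bayes' rule to the reported sensitivity, specificity, and prevalence; the function name `predictive_values` is illustrative, and small differences from the published figures are expected because the inputs are rounded.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    # Expected fractions of the tested population in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures reported for the MIA3G analytical validity dataset.
ppv, npv = predictive_values(sensitivity=0.898, specificity=0.840, prevalence=0.049)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

The computed values land within about 0.1 percentage point of the reported 22.5% and 99.5%, which illustrates why a test of this kind is better suited to ruling out malignancy than to confirming it in a low-prevalence population.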

Multi-criteria decision-making classification fusion (MCF): A multicenter retrospective study from three large Chinese hospitals screened 52 features from 99 items (98 laboratory tests plus age) across 6,778,762 laboratory examinations. The MCF model integrated the best 20 AI classification models into a single ensemble. It achieved an AUC of 0.949 (95% CI: 0.948-0.950) for differentiating ovarian cancer from non-cancer, outperforming CA125 and HE4 used alone, especially for early-stage ovarian cancer prediction.
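The idea of fusing the best-performing classifiers can be sketched in a few lines. This is not the MCF method itself, whose multi-criteria ranking is not described in this summary; it is a simplified stand-in that ranks models by AUC alone and averages the scores of the top k. All model names and numbers below are hypothetical toy values.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fuse_top_k(model_scores, labels, k):
    """Keep the k models with the highest AUC and average their scores."""
    ranked = sorted(
        model_scores, key=lambda name: auc(labels, model_scores[name]), reverse=True
    )
    chosen = ranked[:k]
    fused = [
        sum(model_scores[name][i] for name in chosen) / k for i in range(len(labels))
    ]
    return chosen, fused


# Hypothetical toy data: 3 cancers (1) and 3 non-cancers (0), scored by 3 models.
labels = [1, 1, 1, 0, 0, 0]
model_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.7, 0.3, 0.8, 0.2, 0.6, 0.1],
    "model_c": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}
chosen, fused = fuse_top_k(model_scores, labels, k=2)
print(chosen, round(auc(labels, fused), 3))
```

In this toy case the fused score separates the classes better than either selected model does alone. In practice, model selection and evaluation should use separate data; the example reuses one set only to stay short.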

Recurrent neural networks for longitudinal biomarkers: Abrego et al. used data from the UK Collaborative Trial of Ovarian Cancer Screening to test different combinations of biomarkers, including CA125, using recurrent neural networks (RNNs) with longitudinal observations. While the design was encouraging, the study was limited by a small sample size and the need for a broader panel of biomarkers. Separately, LeBien and colleagues explored convolutional neural networks (CNNs) to estimate age-specific reference intervals for CA125 in a Puerto Rican cohort, concluding that the widely accepted upper limit of 35 U/mL should not be applied universally and that age-adjusted cut-offs should be developed.

cfDNA fragmentomics and the DELFI approach: AI has been applied to genome-wide methylation analysis of circulating cell-free DNA (cfDNA) and plasma cfDNA fragmentomics. The DNA Evaluation of Fragments for Early Interception (DELFI) approach assesses fragment coverage, size, and other summary statistics within 5 Mb windows. This emerging technique leverages the fact that cancer cells shed DNA fragments with distinctive patterns that can be detected in blood samples, offering the potential for non-invasive cancer screening.
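A minimal sketch of the windowing step may help: fragments are assigned to 5 Mb bins, and each bin is summarized by its coverage and its ratio of short to long fragments. The size ranges used here (100 to 150 bp as "short", 151 to 220 bp as "long"), the input layout, and the function name are assumptions for illustration, not details taken from the review.

```python
from collections import defaultdict

WINDOW_BP = 5_000_000  # 5 Mb windows, as described above


def window_fragment_profile(fragments, short_range=(100, 150), long_range=(151, 220)):
    """Summarize cfDNA fragments per window.

    fragments: iterable of (chromosome, start_position, fragment_length).
    Returns {(chromosome, window_index): {"coverage", "short_long_ratio"}}.
    """
    counts = defaultdict(lambda: {"short": 0, "long": 0})
    for chrom, start, length in fragments:
        key = (chrom, start // WINDOW_BP)
        if short_range[0] <= length <= short_range[1]:
            counts[key]["short"] += 1
        elif long_range[0] <= length <= long_range[1]:
            counts[key]["long"] += 1
        # Fragments outside both size ranges are ignored in this sketch.

    profile = {}
    for key, c in counts.items():
        ratio = c["short"] / c["long"] if c["long"] else None
        profile[key] = {"coverage": c["short"] + c["long"], "short_long_ratio": ratio}
    return profile


# Hypothetical fragments: (chromosome, start, length in bp).
toy_fragments = [
    ("chr1", 1_200_000, 140),
    ("chr1", 3_900_000, 167),
    ("chr1", 4_100_000, 172),
    ("chr1", 6_500_000, 120),
    ("chr1", 7_000_000, 130),
    ("chr1", 9_999_999, 180),
    ("chr1", 8_000_000, 400),  # too long, ignored
]
profile = window_fragment_profile(toy_fragments)
for window, stats in sorted(profile.items()):
    print(window, stats)
```

The resulting per-window vector is the kind of summary that a downstream classifier would take as input.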

TL;DR: For ovarian cancer diagnostics, MIA3G achieved 89.8% sensitivity and 99.5% NPV using seven protein biomarkers in a deep feedforward neural network validated on 2,000 specimens. The MCF ensemble of 20 AI models reached an AUC of 0.949 across about 6.8 million lab examinations. Additional approaches include RNNs for longitudinal biomarker tracking, CNNs for age-specific CA125 thresholds, and DELFI cfDNA fragmentomics for non-invasive screening.

4. Metabolomics and Endometrial Cancer: ML Models with Up to 99.86% Accuracy

Metabolomics for ovarian cancer: Machine learning has been applied to identify patterns embedded within metabolomics data, since perturbations of metabolite levels in blood and other body fluids reflect the collective molecular information encoded at the genome, transcriptome, and proteome levels. Ban and colleagues (2024) developed an ML-based approach for predicting ovarian cancer using metabolomic profiles, specifically 3-Hydroxydodecanedioic acid and ceramide, from serum samples of 431 OC patients and 133 healthy donors, achieving a positive predictive value (PPV) of 93%. A separate 2023 study identified multiple metabolites associated with ovarian cancer across five metabolic pathways: Nicotinate and Nicotinamide Metabolism, Glycolysis/Gluconeogenesis, Aminoacyl-tRNA Biosynthesis, Valine/Leucine/Isoleucine Biosynthesis, and Alanine/Aspartate/Glutamate Metabolism. The most accurate classification model achieved 85.29% accuracy using 10-fold cross-validation.
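For readers unfamiliar with 10-fold cross-validation, the sketch below shows the mechanics: the data are split into ten folds, each fold is held out once, and accuracy is pooled over the held-out predictions. The one-dimensional threshold classifier and the toy values are hypothetical placeholders and do not represent the models or metabolite data used in the cited studies.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Toy 1-D classifier: cut-off halfway between the two class means."""
    mean_cases = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    mean_controls = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    return (mean_cases + mean_controls) / 2


def cross_val_accuracy(xs, ys, k=10):
    """Pooled accuracy over all held-out folds."""
    correct = 0
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        # Fit only on the training folds, then score the held-out fold.
        threshold = fit_threshold([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct += sum((xs[i] > threshold) == (ys[i] == 1) for i in test_idx)
    return correct / len(xs)


# Hypothetical metabolite levels: 20 controls (low values), 20 cases (high values).
xs = [0.1 * i for i in range(20)] + [3.0 + 0.1 * i for i in range(20)]
ys = [0] * 20 + [1] * 20
print(cross_val_accuracy(xs, ys, k=10))
```

Because every sample is scored by a model that never saw it during fitting, the pooled figure is a less optimistic estimate than accuracy measured on the training data.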

cfDNA fragmentomics for endometrial cancer: AI has been applied to cfDNA fragmentomics to differentiate uterine corpus endometrial carcinoma (UCEC) from healthy conditions. Using low-coverage whole-genome sequencing, researchers analyzed 111 UCEC patients and 111 healthy donors with an ML framework incorporating three distinct feature types: copy number variations (CNVs), fragment size distribution (FSD), and nucleosome footprint (NF). The model demonstrated excellent predictive power, with high AUC values in both the training and independent validation cohorts.

Blood metabolome screening for endometrial cancer: Perhaps the most striking result in the review concerns a diagnostic study using gas chromatography-mass spectrometry to analyze metabolites from dried blood samples of postmenopausal women. The study enrolled a multicenter prospective training cohort of 50 endometrial cancer cases (FIGO stage I-III, grade G1-G3) and 70 matched controls, using these to train multiple classification models. The accuracy of each trained model was then used as a statistical weight to produce an ensemble ML algorithm. This ensemble was validated on a subsequent prospective cohort of 1,430 postmenopausal women. The results were remarkable: zero false-negative results and only two false-positive results out of 1,430 samples, yielding an accuracy of 99.86%.
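Two pieces of this paragraph lend themselves to a short sketch: weighting each model's vote by its accuracy, and the arithmetic behind the 99.86% figure. The three accuracies and the 0.5 decision threshold below are hypothetical; the study's actual models and weights are not given in this summary.

```python
def weighted_vote(predictions, accuracies, threshold=0.5):
    """Combine 0/1 calls from several models, weighting each by its accuracy."""
    score = sum(w * p for w, p in zip(accuracies, predictions)) / sum(accuracies)
    return (1 if score >= threshold else 0), score


def accuracy(n_total, false_negatives, false_positives):
    """Fraction of samples classified correctly."""
    return (n_total - false_negatives - false_positives) / n_total


# Hypothetical training accuracies for three classifiers.
model_accuracies = [0.95, 0.80, 0.70]

# The most accurate model says "cancer", the other two disagree.
call_minority, score_minority = weighted_vote([1, 0, 0], model_accuracies)
# Two models, including the most accurate one, say "cancer".
call_majority, score_majority = weighted_vote([1, 1, 0], model_accuracies)
print(call_minority, round(score_minority, 3))
print(call_majority, round(score_majority, 3))

# Validation cohort figures from the paragraph above.
reported = accuracy(n_total=1430, false_negatives=0, false_positives=2)
print(f"{reported:.2%}")  # 99.86%
```

The last line confirms that 1,428 correct results out of 1,430 corresponds to the accuracy quoted in the text.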

These metabolomics-based approaches represent a promising direction for non-invasive cancer screening. The endometrial cancer blood metabolome study is particularly compelling because it used a prospective validation design with a large cohort, and the near-perfect accuracy suggests that metabolic signatures may offer a reliable, cost-effective screening pathway. However, it should be noted that these results require further independent replication across diverse populations and clinical settings before they can be incorporated into routine screening programs.

TL;DR: Metabolomics-based ML models show strong diagnostic potential. For ovarian cancer, serum metabolomic profiling achieved 93% PPV (431 patients) and 85.29% accuracy across five metabolic pathways. For endometrial cancer, a blood metabolome ensemble algorithm validated on 1,430 postmenopausal women achieved 99.86% accuracy with zero false negatives, and cfDNA fragmentomics using CNV, FSD, and NF features showed high AUC for distinguishing UCEC patients from healthy donors.

5. AI in Gynecological Oncology Surgery: Residual Disease, Morbidity, and the PROMEGO Score

Predicting residual disease post-hysterectomy: Machine learning models including extreme gradient boosting (XGBoost), Random Forest, and Logistic Regression have been applied to assess the risk of residual disease after hysterectomy for gynecological oncological conditions. These models used clinical and surgical parameters as inputs and identified the top predictors of residual disease: initial presence of gross abdominal disease on the diaphragm, disease located on the bowel mesentery, disease on the bowel serosa, and disease located within the adjacent pelvis prior to resection. Notably, no significant difference in performance was found between the three models, and all contributed to enhancing clinical decision-making for adjuvant treatment planning.

The PROMEGO surgical risk calculator: The PROMEGO study (Predicting Risk of Post-Operative Morbidity and Mortality following Gynaecological Oncology Surgery) represents one of the most clinically relevant AI tools in this domain. Using the international GO SOAR database, the authors developed a novel predictive surgical risk calculator for postoperative morbidity and mortality. Preliminary data showed accurate prediction of thirty-day postoperative morbidity using variables readily available across all resource settings, making this tool potentially useful for surgeons in diverse clinical environments. It is particularly promising for guiding decisions around the extent of cytoreductive surgery.

The gap between diagnostic and surgical AI: The review highlights a fundamental asymmetry in the maturity of AI applications. While diagnostic AI benefits from large, digitalized datasets (laboratory values, imaging, genomics) that are naturally amenable to machine learning analysis, surgical AI faces unique challenges. Surgical outcomes are influenced by surgeon skill, institutional protocols, patient anatomy, and intraoperative variability, all of which are more difficult to standardize and digitalize. This explains why the surgical AI evidence base remains limited compared to diagnostics, and why most surgical AI studies have focused on ovarian cancer, where the complexity of cytoreductive surgery creates a compelling use case for predictive modeling.

TL;DR: XGBoost, Random Forest, and Logistic Regression models predict residual disease post-hysterectomy, identifying diaphragmatic disease, bowel mesentery involvement, bowel serosa disease, and pelvic disease as top predictors. The PROMEGO calculator uses the international GO SOAR database to predict 30-day postoperative morbidity. Surgical AI remains far less developed than diagnostic AI due to the inherent complexity and variability of surgical outcomes.

6. AI in Ovarian Cancer Surgery: From Resectability Prediction to the ANAFI Score

Predicting resectability in high-grade serous ovarian cancer: High-grade serous ovarian cancer (HGSOC) is the most frequent and one of the most aggressive epithelial histotypes. A major clinical problem is "open and close" surgeries, where patients undergo surgical exploration but cannot receive the planned HIPEC plus cytoreduction with radical intent because of unexpected unresectable disease. Maubert et al. developed prediction algorithms comparing various ML models and identified intestinal and pelvic carcinosis as the main criteria of non-resectability out of nine total criteria, using a combination of clinical and imaging data.

The Leeds L-AI-OS Score and ICU prediction: A predictive ML/deep learning score called the Leeds L-AI-OS Score was developed to predict length of stay after cytoreductive surgery. The model evaluates simple preoperative variables including age, BMI, and ECOG performance status, along with intraoperative time, surgical complexity score (SCS), and estimated blood loss. In a related effort using comparable variables plus intestinal resection with ostomy, a graphical user interface (GUI) calculator was built around the Leeds Natal Score, which predicts the risk of ICU admission and helps optimize ICU placement and surgical scheduling.

The SCS cut-off and the ANAFI Score: XGBoost and Deep Neural Network (DNN) models enabled the establishment of a surgical complexity score cut-off of 5, above which the probability of ineffective cytoreduction increases. Building on this, the ANAFI score was developed using the same XGBoost model to intraoperatively predict the likelihood of achieving complete cytoreduction based on specific anatomical disease fingerprints: small bowel mesentery, large bowel serosa, and diaphragmatic peritoneum involvement. The ANAFI score also proved to be the main prognostic feature for survival outcomes, which is notable because most AI scores in this context focus exclusively on predicting suboptimal surgery rather than linking directly to survival.
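How a score cut-off can be derived from outcome data is easy to illustrate, although the sketch below is not the XGBoost or DNN procedure used in the cited work. It swaps in a much simpler technique, scanning candidate cut-offs and keeping the one that maximizes Youden's J (sensitivity + specificity - 1). The scores and outcomes are hypothetical toy values chosen to keep the example easy to follow; they are not data from any study.

```python
def best_cutoff(scores, incomplete):
    """Find cut-off c maximizing Youden's J for the rule 'score > c => incomplete'."""
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, incomplete) if s > c and y)
        fn = sum(1 for s, y in zip(scores, incomplete) if s <= c and y)
        tn = sum(1 for s, y in zip(scores, incomplete) if s <= c and not y)
        fp = sum(1 for s, y in zip(scores, incomplete) if s > c and not y)
        youden_j = tp / (tp + fn) + tn / (tn + fp) - 1
        # Keep the first cut-off that strictly improves on the best so far.
        if best is None or youden_j > best[1]:
            best = (c, youden_j)
    return best


# Hypothetical surgical complexity scores and whether cytoreduction was incomplete.
scores = [2, 3, 4, 4, 5, 5, 6, 6, 6, 7, 8, 9, 10]
incomplete = [False, False, False, True, False, False, True, True, False,
              True, True, True, True]

cutoff, youden_j = best_cutoff(scores, incomplete)
print(cutoff, round(youden_j, 3))
```

A cut-off found this way describes only the sample it was derived from; the published threshold rests on the models and cohorts of the original study.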

Additional surgical AI contributions: Laios et al., using datasets from ESGO-accredited centers, identified upper abdominal peritonectomy (UAP) and regional lymphadenectomies as the main features predictive of complete cytoreduction. The preoperative level of human epididymis protein 4 (HE4), studied alongside CA125, was found useful in identifying patients at higher risk for suboptimal cytoreductive surgery or those requiring more extensive procedures. In the recurrent setting, Bogani et al. (2018) used artificial neural networks (ANNs) on 194 patients with platinum-sensitive recurrent ovarian cancer and identified three key factors driving complete secondary cytoreductive surgery: disease-free interval (DFI, the most important factor for overall survival), retroperitoneal recurrence, and FIGO stage at diagnosis.

TL;DR: For HGSOC surgery, AI models predict resectability (intestinal and pelvic carcinosis as key factors), length of stay (Leeds L-AI-OS Score), and ICU risk (Leeds Natal Score). The ANAFI score uses XGBoost to predict complete cytoreduction based on small bowel mesentery, large bowel serosa, and diaphragmatic disease, and also predicts survival. An SCS above the cut-off of 5 is associated with a higher probability of ineffective cytoreduction. For recurrent OC, ANN models identified disease-free interval as the top predictor of surgical success in 194 patients.

7. AI in Uterine and Cervical Cancer Surgery: Early-Stage Efforts

Endometrial hyperplasia and concurrent carcinoma: One of the most clinically relevant questions for uterine cancer is whether AI can predict the concurrent presence of endometrial carcinoma in patients with atypical endometrial hyperplasia. A recent study compared different AI models of varying complexity and found that no model achieved a sensitivity greater than 50% for predicting concurrent endometrial cancer in women with a preoperative diagnosis of endometrial intraepithelial neoplasia (EIN). A multicenter retrospective Italian study reached similar conclusions, confirming that this remains an unsolved problem. The authors note that at present it is not possible to reliably predict concurrent EC using AI in this clinical scenario.

Metabolic signatures for endometrial cancer: In contrast to the hyperplasia prediction challenge, defining a unique metabolic signature for endometrial cancers appears more promising. A study investigating metabolomics as a non-invasive screening tool analyzed a large cohort of women undergoing gynecological surgery for EC versus benign conditions. Several metabolites involved in lipid and amino acid metabolism were identified as potential biomarkers that could facilitate earlier tumor diagnosis and inform tailored therapeutic strategies. This approach aligns with the broader trend of metabolomics-based AI diagnostics discussed earlier in the review.

Cervical cancer applications: For cervical cancer, the review identifies two notable AI applications in the surgical context. First, iPMI, an algorithm model proposed in a recent article published in Cancers, applies AI to predict parametrial infiltration in early-stage cervical cancers, particularly relevant in the era of the SHAPE trial. The iPMI model could serve as a cost-effective and rapid approach to guide clinical decision-making about the extent of surgery. Second, a deep learning model of survival is under investigation to predict the prognosis of operable cervical cancer patients, with preliminary data described as encouraging.

The limited evidence for AI in uterine and cervical cancer surgery is particularly notable when compared to the relatively rich body of work on ovarian cancer surgery. The authors attribute this gap in part to the complexity and clinical urgency of cytoreductive surgery in ovarian cancer, which creates a strong incentive for predictive tools. For endometrial and cervical cancers, surgical decisions are typically more standardized, but emerging questions around conservative management, fertility preservation, and treatment de-escalation may drive increased AI research in these areas going forward.

TL;DR: AI in uterine cancer surgery is limited: no model exceeds 50% sensitivity for predicting concurrent EC in patients with atypical hyperplasia. Metabolomics-based screening using lipid and amino acid biomarkers is more promising. For cervical cancer, the iPMI algorithm predicts parametrial infiltration for SHAPE-era surgical planning, and a DL survival model is in early testing. Both areas lag far behind ovarian cancer surgery in AI development.

8. Limitations, Ethical Concerns, and Future Directions

Legal liability and the role of expert oversight: The authors raise significant legal and ethical concerns about AI deployment in gynecological oncology. Deep learning models can assist diagnostic decision-making, but they should not replace expert opinions entirely. Legal liability becomes a critical issue when AI makes an incorrect diagnosis. In most clinical settings, medical decisions still require a physician's oversight, and guidelines must account for the possibility of human error when AI-generated recommendations are followed uncritically. The authors argue that AI must always remain subject to the guidance of human experts experienced in the specific field.

Automation bias and training data issues: Deep learning and machine learning tools can detect errors either autonomously or with human oversight, but automation bias in AI systems can lead to incorrect detections when training data are skewed or incomplete. Bias can arise from historical data patterns, reinforcing existing disparities and affecting clinical decision-making. The authors recommend that doctors verify AI-generated results by cross-checking with clinical guidelines, patient history, and their own expertise, using multiple AI tools for comparison, and staying aware of potential biases in training data. Regular auditing of AI models and collaboration with data scientists are essential for mitigating these risks.

Regulatory approval pathway: To establish a proper approval process for AI-based clinical tools, regulatory agencies such as the FDA must assess them for safety, efficacy, and transparency. This typically requires rigorous clinical trials, validation studies, and adherence to ethical standards. AI methods need to be tested for robustness, reproducibility, and alignment with established medical guidelines. The authors emphasize that the field currently lacks standardized frameworks for evaluating and approving AI tools in gynecological oncology, which represents a barrier to clinical translation.

Future perspectives and ongoing clinical trials: The review highlights several ongoing clinical trials applying AI in gynecological oncology, covering palliative care, genetic syndrome prevention, immunotherapy response prediction, histopathological prediction of tumors with uncertain malignancy significance, and radiomic diagnostic features for treatment optimization. The authors emphasize that emerging fields such as immunotherapy are closely tied to AI integration, and that these advancements are interconnected and mutually dependent. They predict that AI will significantly impact both diagnostic and surgical fields, with precision medicine and personalized tumor modeling as the ultimate goals. However, no studies about AI in vulvar cancer currently exist, representing a clear gap in the literature.

The data infrastructure challenge: AI algorithms depend on large datasets, and the databases that hold them must be developed and overseen by trained personnel. Maintaining data quality and tracking over time demands skilled staff. The authors conclude that while the current landscape strongly indicates AI will have a significant impact on gynecological tumor diagnosis and treatment, how this influence will be integrated into clinical practice remains to be discovered. The intersection of diagnostic and surgical AI, along with precision medicine for individual patients, represents the most promising frontier.

TL;DR: Key barriers to AI adoption include legal liability for incorrect diagnoses, automation bias from skewed training data, and the lack of standardized regulatory approval frameworks. Ongoing clinical trials cover immunotherapy prediction, genetic syndrome prevention, and radiomic diagnostics. No AI studies exist for vulvar cancer. The authors call for physician oversight of all AI tools, regular model auditing, and large-scale prospective trials before clinical integration.
Citation: Restaino S, De Giorgio MR, Pellecchia G, et al. Open Access, 2025. Available at: PMC11987942. DOI: 10.3390/cancers17071060. License: CC BY.