Research progress of artificial intelligence in the early screening, diagnosis, precise treatment and prognosis prediction of three central gynecological malignancies

Frontiers in Oncology, 2025

Plain-English Explanations
Pages 1-3
Scope, Background, and Review Methodology

This 2025 review from Liaoning University of Traditional Chinese Medicine covers the application of artificial intelligence across the three most common gynecological malignancies: cervical cancer (CCA), endometrial cancer (EC), and ovarian cancer (OC). In 2022, these three cancers accounted for approximately 1.473 million new cases and 680,000 deaths worldwide. The authors organized AI advances into four clinical domains: early screening, diagnosis, precise treatment, and prognosis prediction.

AI technology categories: The review divides AI into two major branches. Machine learning (ML) includes traditional algorithms such as support vector machines (SVM), decision trees, random forests (RF), and artificial neural networks (ANN). Deep learning (DL) is a subset of ML built on convolutional neural networks (CNN) that can automatically extract features from medical images and genomic data without manual feature engineering. DL architectures discussed include ResNet, VGG, U-Net, Xception, MobileNetV2, EfficientNetB0, and graph convolutional networks (GCN).
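
As a concrete illustration of the "traditional ML" branch, the sketch below trains an SVM and a random forest with scikit-learn on synthetic tabular data; the features and labels are invented stand-ins for hand-engineered clinical biomarkers, not data from the review.

```python
# Illustrative only: synthetic "patients" with 10 tabular features and a
# simple label rule standing in for curated clinical/biomarker data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))                    # 400 patients x 10 features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # synthetic outcome rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
results = {}
for model in (SVC(), RandomForestClassifier(random_state=0)):
    # fit on the training split, score accuracy on the held-out split
    results[type(model).__name__] = model.fit(X_tr, y_tr).score(X_te, y_te)
print(results)
```

Deep learning models differ mainly in that the feature columns above would be learned automatically from raw images or sequences rather than engineered by hand.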

Clinical context: Traditional diagnostic methods face well-documented limitations. Pap smears and HPV testing for cervical cancer carry a 20% to 30% missed diagnosis rate due to subjective interpretation by pathologists. EC and OC diagnostics suffer from poor predictability using conventional imaging and biomarkers. Treatment planning is complicated by drug resistance mechanisms, variable physician experience across hospitals, and the impact of treatment toxicity on quality of life. AI has the potential to address these gaps by analyzing medical images, genomic data, and clinical information to enable more precise diagnoses and personalized treatment strategies.

Multi-omics integration: The review highlights that AI research in gynecological oncology is progressing from single-modality image analysis toward multimodal data fusion, incorporating radiomics, genomics, proteomics, and clinical indicators. This shift allows more comprehensive models for screening, diagnosis, and prognosis prediction. However, challenges remain in data quality, algorithm interpretability, and the clinical translation of AI tools.

TL;DR: This 2025 review covers AI applications across cervical, endometrial, and ovarian cancers (1.473 million new cases/year globally). It examines ML methods (SVM, RF, ANN) and DL architectures (CNN, ResNet, U-Net) applied to screening, diagnosis, treatment, and prognosis, noting that traditional diagnostics have 20-30% miss rates that AI aims to close.
Pages 4-5
AI-Assisted Cervical Cancer Screening and Cytology

Current cervical cancer screening relies on Pap smears and HPV testing, sometimes supplemented with colposcopy or biopsy. The review highlights multiple AI systems that have dramatically improved screening accuracy and efficiency. Bao et al. (2020) trained a supervised DL model on 188,542 cervical cytology images and tested it in a multi-center clinical study. For detecting CIN2+ lesions, the model achieved an AUC of 0.762, with diagnostic rates of 92.6% for CIN2 and 96.1% for CIN3. AI-assisted reading achieved 1.26 times the specificity of skilled cytologists. In a follow-up population-based study of 703,103 women, the overall consistency between AI and manual cytology reached 94.7% (kappa = 0.92), with CIN2+ detection sensitivity 5.8% higher than manual reading.
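
The agreement (kappa) and sensitivity figures above come from standard confusion-matrix formulas. A minimal sketch with scikit-learn, using invented AI-vs-cytologist calls rather than the study's data:

```python
# Toy example: agreement (Cohen's kappa) and sensitivity between manual
# cytology calls and AI calls on the same slides (1 = CIN2+).
from sklearn.metrics import cohen_kappa_score, confusion_matrix

manual = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]   # cytologist reading (reference)
ai     = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]   # AI reading, one extra positive

kappa = cohen_kappa_score(manual, ai)                 # chance-corrected agreement
tn, fp, fn, tp = confusion_matrix(manual, ai).ravel()
sensitivity = tp / (tp + fn)                          # positives the AI recovers
print(round(kappa, 2), round(sensitivity, 2))         # → 0.8 1.0
```

Here the AI flags every manual positive (sensitivity 1.0) at the cost of one false positive, which lowers agreement to kappa = 0.8.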

AICCS system: Wang et al. (2024) developed the Artificial Intelligence Cervical Cancer Screening System (AICCS), which integrates a dual-model architecture combining cell detection and whole-slide image (WSI) classification. Trained on a multicenter dataset of 16,056 cases, AICCS achieved 89.2% accuracy, 94.6% sensitivity, 89.0% specificity, and an AUC of 0.947 in prospective evaluations. Cytopathologists using AICCS-assisted diagnosis showed significant improvements in AUC, specificity, sensitivity, and accuracy compared to traditional manual interpretation.

Automated dual-stain analysis: Wentzensen et al. (2021) developed a cloud-based whole-slide imaging system with a DL classifier for p16/Ki-67 dual-stained slides. In a study of 4,253 patients, AI-assisted dual staining maintained sensitivity while achieving significantly better specificity than Pap smears and manual dual staining (P < 0.001). The colposcopy referral rate dropped from 60.1% to 41.9% (P < 0.001), meaning fewer women would undergo unnecessary invasive procedures while still catching the same number of true positives.

Whole-slide and cell-level classification: Wang et al. (2021) developed a fully automatic DL system for cervical lesion classification from conventional Pap smear WSIs. In testing on 143 WSIs, the system achieved accuracy of 0.93, recall of 0.90, F-score of 0.88, and Jaccard index of 0.84. It significantly outperformed U-Net and SegNet benchmarks (P < 0.0001) while processing a single WSI in only 210 seconds, which is 19-20 times faster than competing methods. Shi et al. (2021) applied graph convolutional networks (GCN) to 966 Pap smear slides and achieved 98.37% accuracy, 99.80% sensitivity, and 99.69% specificity for cervical cell classification.
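
The slide-level metrics quoted here (recall, F-score, Jaccard index) all derive from the same confusion-matrix counts. A quick sketch with hypothetical tallies, not the study's numbers:

```python
# Hypothetical whole-slide tallies: true/false positives and negatives.
tp, fp, fn, tn = 45, 3, 5, 90

precision = tp / (tp + fp)                              # flagged slides that are correct
recall    = tp / (tp + fn)                              # diseased slides that are caught
f_score   = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
jaccard   = tp / (tp + fp + fn)                         # intersection over union
print(round(precision, 3), round(recall, 3),
      round(f_score, 3), round(jaccard, 3))             # → 0.938 0.9 0.918 0.849
```

The Jaccard index is always the strictest of these, since every false positive and false negative counts against it.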

TL;DR: AI systems for cervical screening achieved AUCs of 0.762 to 0.947, with the AICCS system reaching 94.6% sensitivity on 16,056 cases. Automated dual-stain analysis cut colposcopy referrals from 60.1% to 41.9% while maintaining sensitivity, and graph convolutional networks achieved 98.37% accuracy in cell classification.
Pages 5-6
AI for Cervical Cancer Subtyping, Staging, and Colposcopy

Beyond screening, AI has shown strong results in the more detailed classification tasks needed for cervical cancer management. The International Federation of Gynecology and Obstetrics (FIGO) staging system classifies cervical cancer into four stages based on tumor size, invasion depth, and extent of spread. WHO pathological classifications include squamous cell carcinoma (SCC), adenocarcinoma (AC), and adenosquamous carcinoma. AI models can now distinguish between these subtypes and stages non-invasively.

Radiomics for subtyping: Wang et al. (2022) explored multi-parameter MRI radiomics for differentiating histological subtypes of cervical cancer. They analyzed preoperative pelvic MRI from 96 patients (50 SCC and 46 AC), extracting 105 radiomics features from five MRI sequences. The T2SAG sequence alone showed the best single-sequence performance (accuracy = 0.844, AUC = 0.86). Combining all five sequences produced the strongest results (AUC = 0.89, accuracy = 0.81, specificity = 0.94), significantly outperforming any single sequence. AC tumors showed significantly higher texture heterogeneity than SCC (P < 0.05).

Multimodal AI for colposcopy: Miyagi et al. (2020) built a CNN model that combines colposcopic image features with HPV typing data to classify squamous intraepithelial lesions (SILs). In a study of 253 biopsy-confirmed patients (210 HSIL, 43 LSIL), the AI achieved an overall accuracy of 0.941 versus 0.843 for physicians, with sensitivity of 0.956, specificity of 0.833, and AUC of 0.963 for differentiating HSIL from LSIL. The positive predictive value was 0.977, confirming that multimodal AI enhances the accuracy of cervical lesion classification beyond what clinicians achieve alone.

TL;DR: Multi-parameter MRI radiomics distinguished cervical cancer subtypes (SCC vs. AC) with AUC = 0.89 and specificity = 0.94 using five combined sequences. A CNN integrating colposcopy images with HPV typing data reached 94.1% accuracy and AUC = 0.963 for differentiating high-grade from low-grade lesions, outperforming physicians (84.3%).
Pages 6-7
AI in Cervical Cancer Radiotherapy Planning and Prognosis Prediction

Radiotherapy organ segmentation: For cervical cancer patients who are not surgical candidates, radiotherapy is the primary treatment. Mohammadi et al. (2021) developed a deep CNN based on an improved ResU-Net architecture for automatic delineation of organs at risk (OARs) during high-dose-rate brachytherapy. Analyzing imaging data from 113 patients with locally advanced cervical cancer, the model achieved Dice coefficients of 95.7% for bladder, 96.6% for rectum, and 92.2% for sigmoid colon, with Hausdorff distances of 4.05 mm, 1.96 mm, and 3.15 mm respectively. Zhang et al. (2020) developed a 3D CNN for automated brachytherapy plan processing, achieving precise segmentation of clinical target volumes from CT images of 91 patients.
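
Dice coefficients and Hausdorff distances like those reported above can be computed directly from binary segmentation masks. A toy 2D sketch with NumPy/SciPy (real organ-at-risk contours are 3D and far larger):

```python
# Toy 10x10 masks: a model contour and a physician reference contour.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

pred = np.zeros((10, 10), dtype=bool); pred[2:7, 2:7] = True   # model mask
ref  = np.zeros((10, 10), dtype=bool); ref[3:8, 3:8]  = True   # reference mask

# Dice: twice the overlap divided by the total mask area.
dice = 2 * (pred & ref).sum() / (pred.sum() + ref.sum())

# Hausdorff distance: the worst-case boundary mismatch, taken
# symmetrically over both directed distances between point sets.
a, b = np.argwhere(pred), np.argwhere(ref)
hausdorff = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
print(round(dice, 3), round(hausdorff, 3))   # → 0.64 1.414
```

Dice rewards bulk overlap, while the Hausdorff distance penalizes the single worst boundary error, which is why segmentation studies report both.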

Attention-gated brachytherapy: Wang et al. (2023) proposed a 3D CNN with an attention gating (AG) mechanism to improve digitalization accuracy of interstitial needles in brachytherapy. By analyzing CT data from 56 cases across 17 patients, the CNN+AG model achieved a Dice similarity coefficient of 94% (a 5% improvement over traditional CNN). Tip and shaft positioning errors were 1.1 mm and 1.8 mm respectively, representing over 45% improvement in accuracy, with HR-CTV D90 dose deviation of only 0.4%.

Lymph node metastasis and prognosis: Prognosis in cervical cancer depends critically on metastatic lymph node status. Chen et al. (2013) analyzed 588 cervical cancer patients who underwent radical hysterectomy and pelvic lymphadenectomy, establishing an RPL (ratio of positive lymph nodes) cutoff of 10%, with corresponding 5-year survival rates of 42.9% vs. 11.8%. Xia et al. (2022) developed a radiomics-nomogram model that achieved accuracy of 95.9% and AUC of 0.988 in the training cohort for predicting pelvic lymph node metastasis. Jajodia et al. (2021) combined radiomics with ADC values from diffusion-weighted MRI in 52 patients to predict recurrence (AUC = 0.80) and metastasis (AUC = 0.84), outperforming ADC alone by 40% and 2% in AUC respectively.
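
AUC values like these summarize how well a model's risk scores rank patients with the outcome above patients without it. A minimal sketch with scikit-learn on invented scores and outcomes:

```python
# Toy cohort: binary outcomes and model risk scores (not the study's data).
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]                       # 1 = metastasis
scores = [0.1, 0.2, 0.35, 0.28, 0.3, 0.8, 0.7, 0.25, 0.9, 0.6]

# AUC = probability that a random positive outranks a random negative.
auc = roc_auc_score(y_true, scores)
print(round(auc, 2))   # → 0.92
```

Here 23 of the 25 positive/negative pairs are ranked correctly, giving AUC = 0.92; an AUC of 0.988 like Xia et al.'s means almost every pair is ordered correctly.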

TL;DR: ResU-Net achieved Dice coefficients of 92-97% for organ-at-risk segmentation in brachytherapy. Attention-gated CNN improved needle positioning accuracy by 45% with only 0.4% dose deviation. Radiomics-nomogram models predicted lymph node metastasis with AUC = 0.988, and combined radiomics/ADC models predicted recurrence (AUC = 0.80) and metastasis (AUC = 0.84).
Pages 7-8
AI for Early Screening and Diagnosis of Endometrial Cancer

Endometrial cancer is one of the most common tumors in women, with approximately 420,000 new cases and 97,000 deaths worldwide in 2022. Onset occurs mostly between ages 50 and 60, and about 90% of patients present with postmenopausal bleeding. The Cancer Genome Atlas (TCGA) classifies EC into four molecular subtypes: POLE hypermutated, microsatellite instability high (MSI-H), low copy number (CNV-L), and high copy number (CNV-H), each with distinct prognostic implications.

Ultrasound AI: Capasso et al. (2024) developed a DL model to optimize transvaginal ultrasound (TVS) diagnosis for endometrial lesions in 302 postmenopausal bleeding patients (153 EC, 149 atypical hyperplasia cases). The model achieved good automatic segmentation consistency (Dice coefficient = 0.79), with AUC-ROC of 0.90 on the validation set and 0.88 on the test set, maintaining sensitivity and specificity at 0.86-0.87.

Hysteroscopy AI: Takahashi et al. (2021) integrated three CNNs (Xception, MobileNetV2, EfficientNetB0) to analyze hysteroscopic images from 177 patients across five endometrial conditions. The combined model reached 90.29% accuracy (vs. 80% for traditional methods), with sensitivity of 91.66% and specificity of 89.36%.

Serum metabolomics: Troisi et al. (2020) conducted a multicenter prospective study on 1,430 postmenopausal women, using high-performance liquid chromatography-mass spectrometry to analyze serum metabolites. Using SVM and RF algorithms following STARD reporting guidelines, the ML model achieved 99.86% accuracy in screening EC, with both sensitivity and specificity exceeding 99%. This confirmed that serum metabolomics combined with ML can serve as an effective supplementary screening method, especially for high-risk populations.

Molecular subtype prediction: Hong et al. (2021) used the Panoptes multi-resolution DL architecture to analyze 456 H&E-stained sections. The model achieved an AUC of 0.969 for differentiating endometrioid and serous subtypes, and AUCs of 0.934, 0.889, and 0.827 for predicting CNV-H, CNV-L, and MSI-H molecular subtypes respectively. It could also identify characteristic gene mutations (TP53, PTEN, FAT1, ZFHX3) with AUCs ranging from 0.781 to 0.873, potentially replacing costly genomic sequencing for routine molecular subtyping.

TL;DR: AI for EC screening ranged from ultrasound (AUC = 0.88-0.90) to hysteroscopy (90.3% accuracy with three-CNN ensemble) to serum metabolomics (99.86% accuracy on 1,430 women). The Panoptes DL model predicted TCGA molecular subtypes from H&E slides with AUCs of 0.827-0.969, potentially eliminating the need for expensive genomic tests.
Pages 8-9
Robot-Assisted Surgery and AI-Driven Prognosis for Endometrial Cancer

Robot-assisted surgery (RAS): Multiple studies confirmed the safety and effectiveness of RAS for EC. Cardenas-Goicoechea et al. (2010, 2014) evaluated 275 and then 415 EC patients, finding that RAS produced significantly less blood loss than traditional laparoscopy (P < 0.05) with equivalent 3-year overall survival (93.3% vs. 93.6%) and disease-free survival (83.3% vs. 88.4%), with similar recurrence rates (14.8% vs. 12.1%). Argenta et al. (2022) compared RAS (n=55), laparoscopic (n=40), and open surgery (n=80) for stage IA EC, showing that RAS had the shortest operation time (P < 0.05), the lowest complication rate (Clavien-Dindo grade 1 or higher, P = 0.02), and a learning curve showing significant time reduction after just 10 cases. Lowe et al. (2010) demonstrated that 96% of elderly patients (aged 80-95) successfully completed RAS with only a 7.4% complication rate and 80% discharged within two days.

Myometrial invasion assessment: Chen et al. (2020) developed a DL-based MRI analysis system using T2WI data from 530 patients (99 deep invasion, 431 superficial) to assess myometrial invasion depth. The model achieved 77.14%-86.67% accuracy for lesion area identification, with sensitivity of 66.6%, specificity of 87.5%, and accuracy of 84.8% for invasion depth judgment. The negative predictive value was 94.6%, outperforming routine radiologist diagnosis. Xiong et al. (2023) used a multi-stage DL framework combining SSD detection and Attention U-Net segmentation on MRI from 154 EC patients, achieving 86.9% accuracy (sensitivity = 81.8%, specificity = 91.7%) and 97.83% accuracy in selecting the best MRI slice.

Recurrence prediction with HECTOR: Volinsky-Fremond et al. (2024) developed HECTOR, a multimodal DL model integrating H&E whole-slide images and clinical data from 2,072 stage I-III EC patients. Using a three-arm ensemble architecture with five-fold cross-validation, HECTOR achieved C-index scores of 0.789, 0.828, and 0.815 in test sets, outperforming the current gold standard. The 10-year distant recurrence-free survival prediction for the low-risk group reached 97%. Njoku et al. (2024) used proteomic analysis of blood and cervicovaginal fluid from 118 postmenopausal patients to build a RF prediction model with overall AUC of 0.91-0.98 (sensitivity: 83%-98%, specificity: 78%-95%), identifying stage-specific markers including CNDP1 (early stage, AUC: 0.82-0.95), HPT (stage I, AUC: 0.87-0.97), and APOE (advanced stage, AUC: 0.92-1.00).
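
The C-index reported for HECTOR measures the fraction of comparable patient pairs whose predicted risks are ordered consistently with their outcomes. A toy computation with invented times and risks, ignoring censoring for simplicity:

```python
# Toy survival data: time to distant recurrence (months) and the
# model's predicted risk score per patient.
import itertools

time = [5, 8, 12, 3, 20]
risk = [0.9, 0.6, 0.4, 0.8, 0.1]

concordant = total = 0
for i, j in itertools.combinations(range(len(time)), 2):
    if time[i] == time[j]:
        continue  # tied event times are not comparable
    total += 1
    # a pair is concordant when the higher-risk patient recurs earlier
    if (risk[i] > risk[j]) == (time[i] < time[j]):
        concordant += 1
print(concordant / total)   # → 0.9
```

A C-index of 0.5 is random ranking and 1.0 is perfect; real implementations (e.g., in survival-analysis libraries) additionally handle censored follow-up, which this sketch omits.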

TL;DR: Robot-assisted surgery for EC matched laparoscopy in survival (93.3% vs. 93.6% at 3 years) with fewer complications and shorter recovery. DL models assessed myometrial invasion at 84.8-86.9% accuracy. The HECTOR model predicted recurrence with C-index up to 0.828 from routine pathology, and proteomic biomarkers detected EC with AUC = 0.91-0.98.
Pages 9-11
AI for Ovarian Cancer Imaging Diagnosis and Tumor Classification

Ovarian cancer is the fourth most common gynecological tumor worldwide but ranks third in mortality because over 70% of cases are not detected until advanced stages (III-IV). OC is mainly divided into epithelial ovarian cancer (EOC, over 90% of cases), non-epithelial OC, and borderline ovarian tumors (BOT). EOC itself is further classified into Type I and Type II. AI has shown particular promise in using imaging to distinguish these subtypes and separate benign from malignant tumors.

MRI-based diagnosis: Saida et al. (2022) compared DL performance with radiologists using MRI data from 194 OC/borderline patients and 271 non-malignant cases (1,798 images total). CNN showed the highest diagnostic performance on ADC maps (specificity = 0.85, sensitivity = 0.77, accuracy = 0.81, AUC = 0.89). Gao et al. (2022) developed a deep convolutional neural network (DCNN) for pelvic ultrasound assessment using 3,755 images. The DCNN achieved AUC of 0.911 in internal validation and 0.870 and 0.831 in two external validation sets, with diagnostic accuracy (81.1%-88.8%) surpassing 35 radiologists. With DCNN assistance, radiologist accuracy improved to 0.876 (P < 0.05).

OvcaFinder multimodal model: Xiang et al. (2024) developed OvcaFinder, an interpretable AI model integrating DL predictions from ultrasound images, Ovarian-Adnexal Reporting and Data System (O-RADS) scores, and clinical variables. OvcaFinder achieved AUC of 0.978 internally and 0.947 externally, reducing false positive rates by 13.4% and 8.3%.

Tumor subtyping: Wang et al. (2023) used a deep supervised U-net++ on MRI from 201 patients (102 BOT, 99 EOC) and achieved AUC of 0.87 (accuracy = 83.7%) for differentiating BOT from EOC, significantly outperforming radiologists' AUC of 0.75 (P < 0.001). Xu et al. (2022) developed a radiomics model from DWI/ADC maps achieving AUC = 0.915 for BOT vs. malignant EOT classification and AUC = 0.905 for Type I/II distinction.

CT radiomics and automated segmentation: Li et al. (2021) developed 2D radiomics from CT images of 134 ovarian tumor patients, achieving AUC = 0.88 in training and 0.87 in testing, while a nomogram integrating clinical parameters reached AUC = 0.95-0.96. Wang et al. (2023) compared four DL architectures for automated CT segmentation of OC across 367 patients, finding that 3D U-Net cascade performed best with median Dice score of 0.941, Jaccard index of 0.890, sensitivity of 0.973, and 85% stability of radiomics features.

TL;DR: OvcaFinder achieved AUC = 0.978 for OC diagnosis by combining ultrasound, O-RADS scores, and clinical data. DCNN surpassed 35 radiologists in ultrasound-based OC diagnosis (AUC = 0.911). DL on MRI distinguished borderline from malignant tumors with AUC = 0.87, and 3D U-Net cascade achieved 0.941 Dice score for automated CT segmentation across 367 patients.
Pages 11-12
AI for Ovarian Cancer Treatment Guidance and Recurrence Prediction

Bevacizumab treatment prediction: Wang et al. (2022) developed the AIM2-DL model, a weakly supervised DL method to guide OC treatment. Using immunohistochemical tissue samples (AIM2, C3, C5, NLRP3) from EOC and primary peritoneal cancer patients, the model achieved accuracy of 0.92, recall of 0.97, F-score of 0.93, and AUC of 0.97 in the initial experiment. In five-fold cross-validation, it maintained accuracy of 0.86 and AUC of 0.91. Kaplan-Meier and Cox regression analyses confirmed the model could distinguish patients with low recurrence from those with disease progression, enabling personalized treatment selection.

Platinum-based therapy prediction: Ahn et al. (2024) developed PathoRiCH, a pathological risk classifier trained on an internal cohort (n=394) and validated on two external cohorts (n=284 and n=136) to predict platinum drug response in HGSOC. PathoRiCH significantly distinguished favorable from unfavorable response groups in terms of platinum-free interval. By combining molecular biomarkers, it enhanced risk stratification accuracy and used visualization and transcriptome analysis to explain its decision-making process. Wang et al. (2022) applied weakly supervised DL to H&E whole-slide images for predicting bevacizumab treatment effect, achieving accuracy of 0.882, precision of 0.921, recall of 0.912, and F-score of 0.917 in cross-validation. Patients predicted as non-responsive had a hazard ratio of 13.727 for cancer recurrence.

Recurrence and survival prediction: Wang et al. (2019) developed a DL-CPH model using 8,917 CT images from 245 HGSOC patients to predict individual recurrence risk. The concordance index was 0.713 and 0.694 in validation cohorts, with 3-year recurrence AUCs of 0.772 and 0.825. Laios et al. (2022) built an XGBoost model to predict complete cytoreductive surgery (R0 resection) in 571 advanced EOC patients, achieving AUC = 0.866 and identifying key predictive factors including OC score, peritoneal cancer index, surgical complexity score, patient age, and tumor volume.

Drug sensitivity and molecular subtypes: Zhang et al. (2024) developed an AI-based drug sensitivity prediction system analyzing 21,937 single cells using the Beyondcell algorithm. By combining TCGA multi-omics data, they identified four patient subgroups with distinct treatment responses. They constructed a DL prognostic model with a KAN (Kolmogorov-Arnold Network) architecture and validated it across three external GEO datasets, significantly improving prognosis prediction over traditional models. Their analysis also revealed that endothelial cells resist paclitaxel, doxorubicin, and docetaxel, suggesting potential therapeutic targets.

TL;DR: The AIM2-DL model predicted bevacizumab response with AUC = 0.97, and non-responsive patients had 13.7x higher recurrence risk. PathoRiCH predicted platinum therapy response across 814 HGSOC patients. DL-CPH predicted 3-year recurrence with AUC = 0.772-0.825, and XGBoost predicted R0 resection feasibility at AUC = 0.866. KAN-architecture models improved drug sensitivity prediction using single-cell analysis of 21,937 cells.
Pages 13-14
Challenges, Validity Threats, and the Road Ahead

Selection bias: Many AI models in gynecological oncology are trained on non-representative datasets that may lack diversity in demographic characteristics, tumor subtypes, or imaging modalities. The authors cite the well-known example from dermatology where a deep neural network for skin cancer classification performed significantly worse on darker-skinned populations because the training data underrepresented African and Asian patients. Similar risks apply in gynecological AI, where models trained predominantly on one population may fail to generalize across different ethnic groups, healthcare settings, or imaging equipment.

Annotation inconsistency and concept drift: AI training requires high-quality labeled data, but annotation is performed manually by physicians whose subjective differences can introduce bias. Inconsistency among pathologists in grading endometrial cancer can propagate errors into AI training labels, and mislabeled data (such as tagging benign lesions as malignant) teaches models incorrect patterns. Beyond static label quality, "concept drift" poses an ongoing threat: as medical technology and diagnostic criteria evolve, models become outdated. For example, after 2020, ground-glass density patterns in chest imaging shifted from pneumonia to COVID-19 labels, illustrating how changing clinical context can degrade model performance.

Confounding variables and oversimplified evaluation: Unconsidered factors such as comorbidities, hormone therapy, or imaging artifacts can distort AI predictions. The review notes that conditions like endometriosis can show imaging features that AI might misinterpret as malignant. Additionally, current evaluation methods rely too heavily on overall performance metrics like AUC while neglecting subgroup analyses (such as stratification by disease stage) and clinical utility assessments (such as false positive rates in biopsy recommendations). The authors advocate for task-specific evaluation criteria and more nuanced assessments of AI performance in real clinical settings.
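
The subgroup analysis the authors call for can be as simple as reporting AUC per stratum rather than one pooled number. A sketch with invented data, where a strong overall AUC hides weaker late-stage performance:

```python
# Toy stratified evaluation: outcomes, risk scores, and an illustrative
# "stage" label per patient (all values invented for demonstration).
import numpy as np
from sklearn.metrics import roc_auc_score

y     = np.array([0, 1, 0, 1, 1, 0, 1, 0])
score = np.array([0.2, 0.9, 0.3, 0.6, 0.45, 0.5, 0.8, 0.1])
stage = np.array(["early", "early", "early", "early",
                  "late", "late", "late", "late"])

print("overall", round(roc_auc_score(y, score), 2))   # → overall 0.94
for s in ("early", "late"):
    m = stage == s
    print(s, round(roc_auc_score(y[m], score[m]), 2))
# → early 1.0
# → late 0.75
```

The pooled AUC of 0.94 looks strong, yet the late-stage stratum performs markedly worse, which is exactly the kind of gap a single aggregate metric conceals.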

Ethics, regulation, and future directions: Data privacy, algorithmic bias, and health equity remain urgent problems. Clinical translation also faces regulatory hurdles such as FDA approval and EU CE certification. The review concludes that future development should move toward greater intelligence, personalization, and standardization. Key priorities include multi-center collaboration for standardized datasets, federated learning for privacy-preserving cross-institutional data sharing, unified evaluation frameworks for fairness and safety, and evolving the AI-doctor relationship from "auxiliary tool" to "intelligent partner." The authors emphasize that multimodal data fusion, real-time dynamic monitoring, and deeper interdisciplinary collaboration will drive the next generation of AI tools in gynecological oncology.

TL;DR: Key limitations include selection bias from non-representative training data, annotation inconsistency among pathologists, concept drift as diagnostic criteria evolve, and oversimplified evaluation metrics. The path forward requires multi-center standardized datasets, federated learning for data privacy, unified regulatory frameworks, and a shift from single-modality AI to multimodal data fusion integrating imaging, genomics, and clinical information.
Citation: Wang Q, Zhou X, Wu J, Miao W, Shen B, Shi R. Research progress of artificial intelligence in the early screening, diagnosis, precise treatment and prognosis prediction of three central gynecological malignancies. Frontiers in Oncology, 2025. Open access (CC BY). PMC12454081. DOI: 10.3389/fonc.2025.1648407.