Prostate cancer (PCa) is the second most commonly diagnosed cancer and the fifth leading cause of cancer-specific death in men worldwide. In the United States alone, approximately 191,930 new diagnoses were expected in 2020, with an estimated 33,330 deaths. The sheer volume of cases, combined with the complexity of diagnosis, treatment planning, and prognosis, makes prostate cancer an ideal candidate for artificial intelligence and machine learning approaches.
Scope of the review: This paper surveys five years of published research on AI and ML in prostate cancer, drawing from PubMed, Web of Science, and Science Direct. The authors searched using keywords including prostate cancer, biomarker, genomics, artificial intelligence, and artificial neural networks. The review covers the full clinical pathway, from digital pathology and diagnostic imaging through genomics, radiotherapy planning, and robotic surgery.
Key AI concepts: The authors provide a structured glossary distinguishing AI, machine learning (ML), deep learning (DL), convolutional neural networks (CNNs), and artificial neural networks (ANNs). DL is a subset of ML that learns hierarchical features from data without requiring hand-engineered inputs. Deep convolutional neural networks (DCNNs) have proven especially effective for digitized image analysis. Clinical decision support systems (CDSSs) built on these techniques are under active development, though reviews at the time of publication gave limited evidence for their direct clinical application in PCa oncological care.
The promise: AI tools offer potential across multiple fronts. In pathology, they can analyze larger datasets and deliver faster, more accurate diagnoses of prostate cancer lesions. In imaging, they have shown excellent accuracy in lesion detection and outcome prediction. In genomics, they provide the computational power needed to process the enormous datasets generated by tumor genome sequencing. In radiotherapy, they may predict treatment toxicity and optimize dose planning. In surgery, they could enable more autonomous robotic tasks.
Optical image analysis and tissue differentiation: Digital whole-slide imaging has transformed pathology by enabling slide preservation, telepathology, and ML-assisted analysis with greater precision than traditional microscopy. Kwak et al. used five different methods on tissue microarrays (TMAs) at the National Institutes of Health, finding that multi-view boosting classification achieved an AUC of 0.98 (95% CI 0.97-0.99) for differentiating cancer from benign tissue, significantly outperforming single-view approaches (p < 0.01). Arvaniti et al. trained a patch-based classifier on TMAs from 641 patients, achieving inter-observer agreement (kappa = 0.71 and 0.75) comparable to the agreement between two ground-truth pathologists (kappa = 0.71).
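Cohen's kappa, the agreement statistic quoted for Arvaniti et al., corrects raw inter-observer agreement for the agreement expected by chance. A minimal sketch on invented grade-group annotations for six cores:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two raters assigning grade groups to six biopsy cores (toy data)
a = [1, 1, 2, 2, 3, 3]
b = [1, 1, 2, 3, 3, 3]
print(round(cohens_kappa(a, b), 2))  # -> 0.75
```

A kappa of 0.71-0.75, as reported above, therefore indicates substantial but imperfect agreement even after chance correction.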
Whole slide image analysis at scale: Several landmark studies pushed the boundaries of automated cancer detection. Campanella et al. evaluated a multiple-instance learning framework on 24,859 whole slide images, achieving an AUC of 0.989 at 20x magnification; automated removal of 75% of slides caused no loss in sensitivity. In a separate study using 12,160 slides, the same group trained several CNN architectures, including AlexNet, VGG, and ResNet variants, with the best models (ResNet34, VGG11-BN) achieving AUCs of 0.976 and 0.977, respectively. Litjens et al. trained a deep learning CNN on 225 glass slides, reaching an AUC of 0.98 for slide-level cancer detection.
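Multiple-instance learning, as used by Campanella et al., rests on a simple aggregation idea: only slide-level labels are available, so a slide is scored by its most suspicious patch. A toy sketch of that max-pooling aggregation (the probabilities and threshold here are illustrative, not from the paper):

```python
def slide_score(patch_probs):
    """Multiple-instance aggregation: a slide is as suspicious as its
    most suspicious patch (max-pooling over patch probabilities)."""
    return max(patch_probs)

def classify_slide(patch_probs, threshold=0.5):
    """Slide-level call from patch-level model outputs."""
    return "cancer" if slide_score(patch_probs) >= threshold else "benign"

benign_slide = [0.02, 0.10, 0.07, 0.01]  # all patches look benign
tumor_slide = [0.03, 0.95, 0.08, 0.04]   # one strongly positive patch
print(classify_slide(benign_slide))  # -> benign
print(classify_slide(tumor_slide))   # -> cancer
```

Because one confident patch suffices, this scheme needs no patch-level annotations, which is what made training on tens of thousands of slides feasible.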
Gleason grading performance: Nagpal et al. trained a DL system on 1,557 slides and compared it to 29 pathology experts (mean accuracy 0.61). The model achieved a significantly higher accuracy of 0.70 (p = 0.002). Bulten et al. analyzed 5,759 core biopsies from 1,243 patients, with their deep learning system achieving an AUC of 0.990 for malignancy determination, outperforming 10 of 15 pathologists. For Grade group 2 or higher, the system reached an AUC of 0.978 (0.966-0.988), and for Grade group 3 or higher, an AUC of 0.974 (0.962-0.984).
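The AUC values quoted throughout this review can be read as a rank statistic: the probability that a randomly chosen malignant case receives a higher model score than a randomly chosen benign case (ties counting one half). A minimal pure-Python illustration with invented scores:

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive case
    outscores a randomly chosen negative case (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

pos = [0.90, 0.80, 0.65]  # model outputs for malignant biopsies (toy)
neg = [0.30, 0.70, 0.20]  # model outputs for benign biopsies (toy)
print(round(auc_from_scores(pos, neg), 3))  # -> 0.889
```

An AUC of 0.990, as Bulten et al. report, thus means the model ranks a malignant core above a benign one in 99% of such pairings.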
AI-assisted pathologists: Raciti et al. demonstrated the impact of Paige Prostate Alpha on pathologist performance. Average sensitivity without the AI system was 74% (+/- 11%), rising to 90% (+/- 4%) with AI assistance. Sensitivity improvements were most pronounced for lower Gleason grade groups: a 20% increase for Grade group 1, 13% for Grade group 2, and 11% for Grade group 3. Roffman et al. developed a multiparametric ANN for risk prediction based on clinical and demographic characteristics, yielding high specificity (89.4%) but low sensitivity (23.2%). Lenain et al. used SVM, random forest, and extreme gradient boosting on histopathological data from 4,470 patients, achieving F1-Scores of 0.98 for N-staging and 0.99 for M-staging classification.
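The sensitivity and specificity figures reported for Raciti et al. and Roffman et al. come directly from confusion-matrix counts. A small sketch with made-up counts:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Toy counts: 100 cancer cores and 100 benign cores
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=85, fp=15)
print(f"sensitivity={sens:.0%} specificity={spec:.0%}")  # -> sensitivity=90% specificity=85%
```

The Roffman et al. profile (specificity 89.4%, sensitivity 23.2%) corresponds to a model that rarely flags healthy men but misses most cancers, which limits its use as a screening tool.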
MRI-based approaches: AI-assisted MRI analysis has evolved significantly since early attempts at pathology-imaging fusion in 1991. Gaur et al. demonstrated in a multi-institutional study that AI-based detection improved specificity when combined with PI-RADS v2 categorization, with particular benefit in the transitional zone (TZ), where sensitivity rose from 66.9% with MRI alone to 83.8% with automated detection. Abdollahi et al. found that radiomics-based models on T2-weighted images were more reliable than apparent diffusion coefficient (ADC) models for Gleason score prediction, with T2W models achieving a mean AUC of 0.739 compared to 0.70 for ADC models. For staging, however, ADC models showed the higher predictive performance (mean AUC 0.675).
CNN architectures for MRI: Aldoj et al. developed a CNN using 3D combinations of ADC, DWI, and T2-weighted images, achieving an AUC of 0.91 at 81.2% sensitivity and 90.5% specificity. The PROSTATEx Challenges drove further development: de Vente et al. showed that soft-label ordinal regression with deep learning improved PCa grading from biparametric MRI. Sunoqrot et al. proposed an automatic signal intensity normalization approach that significantly improved classification of peripheral zone tissues (AUC 0.826 vs. 0.769, p < 0.001) and transition zone tissues (AUC 0.743 vs. 0.678).
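Soft-label ordinal regression replaces a hard one-hot grade label with a distribution that places some probability mass on adjacent grade groups, so a near-miss (predicting Grade group 2 for a Grade group 3 lesion) is penalized less than a distant error. The encoding below is an illustrative simplification, not the exact scheme of de Vente et al.; the `spread` parameter is an assumption:

```python
def soft_label(grade, n_grades=5, spread=0.25):
    """Softened one-hot: most mass on the annotated grade group,
    `spread` on each adjacent group, then renormalized to sum to 1.
    (Illustrative encoding, not the scheme of any cited paper.)"""
    raw = [0.0] * n_grades
    raw[grade] = 1.0
    if grade > 0:
        raw[grade - 1] = spread
    if grade < n_grades - 1:
        raw[grade + 1] = spread
    total = sum(raw)
    return [v / total for v in raw]

# Grade group 3 (index 2) on the 5-level ISUP scale
print([round(v, 2) for v in soft_label(2)])  # -> [0.0, 0.17, 0.67, 0.17, 0.0]
```

Training against such targets with a cross-entropy loss encodes the ordinal structure of Gleason grading without changing the network architecture.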
TRUS-based imaging: For transrectal ultrasonography, Feng et al. proposed a method extracting features from spatial and temporal dimensions via 3D convolutions, achieving a sensitivity of 82.98% (+/- 6.23%), specificity of 91.45% (+/- 6.75%), and accuracy of 90.18% (+/- 6.62%) for PCa detection using contrast-enhanced ultrasonography (CEUS). Wildeboer et al. assessed ML combining B-mode, shear-wave elastography, and dynamic contrast-enhanced ultrasound, achieving an AUC of 0.90 for PCa and Gleason greater than 3 + 4.
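The spatiotemporal feature extraction of Feng et al. relies on 3D convolutions, which slide a small kernel over the two spatial axes plus the temporal axis of a CEUS clip. A minimal "valid" 3D convolution over a toy volume (as in most CNN frameworks, this is technically cross-correlation):

```python
def conv3d_valid(vol, kernel):
    """'Valid' 3D convolution: slide the kernel over depth/height/width
    and sum elementwise products at each position."""
    D, H, W = len(vol), len(vol[0]), len(vol[0][0])
    d, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(D - d + 1):
        plane = []
        for y in range(H - h + 1):
            row = []
            for x in range(W - w + 1):
                s = sum(vol[z + i][y + j][x + k] * kernel[i][j][k]
                        for i in range(d) for j in range(h) for k in range(w))
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

# A 3x3x3 "frame stack" of ones and a 2x2x2 kernel of ones
vol = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
print(conv3d_valid(vol, kernel)[0][0][0])  # -> 8.0
```

Treating time as the third axis lets one kernel respond jointly to contrast enhancement patterns and their evolution across frames.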
MRI-TRUS fusion: Combined modality approaches are essential for fusion biopsy guidance. Anas et al. proposed an automatic prostate segmentation technique incorporating temporal TRUS information for real-time deformable registration. Hu et al. developed a CNN for registering T2-weighted MRI and 3D TRUS volumes, achieving a median target registration error of 3.6 mm and a median Dice of 0.87. Liu et al. showed that logistic regression based on dynamic contrast-enhanced MRI phases could predict PCa invasiveness with a noninvasive accuracy of 0.90 in patients with PI-RADS v2 scores of 4 or 5. Ishioka et al. developed a CNN combining U-Net and ResNet50 architectures, achieving AUC values of 0.645 and 0.636 for estimating biopsy-targeted areas.
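The registration metrics quoted for Hu et al. are straightforward to compute: the Dice coefficient measures overlap between the two segmentations, and target registration error is the residual distance between corresponding landmarks after alignment. A toy sketch (the voxels and coordinates are invented):

```python
from math import dist  # Euclidean distance, Python >= 3.8

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two voxel sets."""
    a, b = set(mask_a), set(mask_b)
    return 2 * len(a & b) / (len(a) + len(b))

def target_registration_error(landmarks_fixed, landmarks_moved):
    """Mean Euclidean distance between corresponding landmarks after
    registration, in the units of the coordinates (e.g. mm)."""
    errs = [dist(p, q) for p, q in zip(landmarks_fixed, landmarks_moved)]
    return sum(errs) / len(errs)

seg_mri = [(0, 0), (0, 1), (1, 0), (1, 1)]   # toy MRI segmentation voxels
seg_trus = [(0, 1), (1, 0), (1, 1), (2, 1)]  # toy TRUS segmentation voxels
print(dice(seg_mri, seg_trus))  # -> 0.75
print(target_registration_error([(0, 0, 0)], [(3, 0, 0)]))  # -> 3.0
```

Dice rewards volumetric overlap while TRE penalizes residual landmark displacement, so the two together give a fuller picture of registration quality than either alone.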
miRNA and SNP analysis: Bertoli et al. used a meta-analysis approach with support vector machines to identify a group of 29 miRNAs with diagnostic utility (AUC 0.989 +/- 0.016) and 7 miRNAs with prognostic potential (best AUC 74.7%, 95% CI 73.28-76.11). MacInnis et al. applied a novel method analyzing the dependency of association on the number of top hits, identifying 14 genomic regions associated with PCa. The Decipher test uses a random forest algorithm trained on 22 mRNA expression markers to predict metastatic disease, with an AUC of 0.79 for 5-year metastasis-free survival after surgery.
Gene expression and methylation: Hou et al. used a genetic algorithm-optimized artificial neural network to build a diagnostic model achieving an AUC of 0.953 for diagnosis and 0.808 for 5-year overall survival prognosis. Liu et al. identified 12 CpG site markers and 13 promoter markers from an initial pool of 139,422 CpG sites using deep neural networks combined with three ML strategies (moderated t-statistics, LASSO, and random forest), achieving 100% sensitivity for CpG markers and 92% for promoter markers. These methylation-based approaches hold potential for liquid biopsy applications.
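The moderated t-statistic strategy of Liu et al. ranks CpG sites by how strongly methylation differs between tumor and normal samples. The sketch below uses a plain two-sample t-statistic as a simplified stand-in (moderation shrinks per-site variances, which is omitted here), on invented beta values:

```python
from math import sqrt

def t_statistic(group1, group2):
    """Plain two-sample t-statistic (equal-variance form) for one site."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled = sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / (pooled * sqrt(1 / n1 + 1 / n2))

def top_sites(beta_tumor, beta_normal, k=1):
    """Rank CpG sites by |t| and keep the top k candidates."""
    scores = {site: abs(t_statistic(beta_tumor[site], beta_normal[site]))
              for site in beta_tumor}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Methylation beta values (0..1) for two hypothetical CpG sites
tumor = {"cg001": [0.80, 0.85, 0.90], "cg002": [0.50, 0.52, 0.48]}
normal = {"cg001": [0.20, 0.25, 0.15], "cg002": [0.49, 0.51, 0.50]}
print(top_sites(tumor, normal))  # -> ['cg001']
```

Only the sites surviving such a filter are passed on to the downstream classifiers, which is how an initial pool of 139,422 CpG sites can be reduced to a 12-marker panel.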
Gene activity and tumor localization: Hamzeh et al. combined SVM with radial basis function kernel (SVM-RBF), Naive Bayes, and random forest to analyze gene activity patterns. The SVM-RBF classifier achieved 99% accuracy in separating tumors by location and identified HLA-DMB and EIF4G2 as genes correlated with PCa progression. de la Calle et al. analyzed 648 samples (424 tumors, 224 normal tissue) using tissue microarrays with anti-Ki-67 and anti-ERG antibodies through an AI algorithm, achieving 100% identification of ERG-positive tumors.
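The SVM-RBF classifier of Hamzeh et al. owes its non-linearity to the Gaussian radial basis function kernel, which scores sample similarity by squared distance in gene-expression space. A minimal sketch of the kernel itself (the gamma value and expression vectors are illustrative):

```python
from math import exp

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: similarity decays exponentially with squared
    distance, letting an SVM draw non-linear decision boundaries."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * sq_dist)

# Toy expression vectors over two hypothetical genes per sample
sample_a = [2.0, 5.0]
sample_b = [2.0, 5.0]
sample_c = [8.0, 1.0]
print(rbf_kernel(sample_a, sample_b))         # identical samples -> 1.0
print(rbf_kernel(sample_a, sample_c) < 0.01)  # distant samples -> True
```

An SVM classifies a new sample from a weighted sum of such kernel values against its support vectors, so similarity in expression space directly drives the tumor-location call.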
Radiotherapy-related genomics: Lee et al. used pre-conditioned random forest regression and bioinformatics tools for genome-based prediction of late toxicity after radiotherapy, producing a statistically significant model (AUC 0.70, 95% CI 0.54-0.86, p = 0.01) for weak stream prediction. While the overall evidence base for genomic AI applications in PCa remains sparse, the authors note that great computational power will be required in the future to improve receiver operating characteristic performance through fused multi-modal data streams.
MRI-based treatment planning: Radiomics-based detection of cancerous patches on MRI can be transferred to CT scans for external beam radiation therapy (EBRT) using texture-feature-enabled ML classifiers. Shiradkar et al. demonstrated that radiomics-based focal therapy reduced dosage compared to whole-gland treatment in both EBRT and brachytherapy. Dong et al. compared deep attention U-Net networks with and without deep attention algorithms, finding that synthetic MRI (sMRI) achieved better volume overlapping, surface matching, and center matching than CT for PCa radiotherapy treatment planning.
Organ segmentation and dose calculation: Savenije et al. investigated the CNN architectures DeepMedic and V-Net for auto-segmentation of organs at risk using MRI images. DeepMedic required fewer manual adaptations and less time for the delineation procedure, making it valuable for clinical workflow optimization. Shafai-Erfani et al. used CNN and random forest algorithms to generate synthetic CT images from MRI, finding no significant differences in dose volume histograms and planning target volumes. This approach could potentially eliminate CT acquisition and its associated radiation exposure.
TRUS-based brachytherapy planning: Lei et al. developed a deeply supervised V-Net for TRUS-based prostate segmentation, achieving a Dice similarity coefficient of 0.92 (+/- 0.03) and a Hausdorff distance of 0.94 (+/- 1.55) mm. Nicolae et al. demonstrated that ML algorithms reduced brachytherapy planning time from 17.88 (+/- 8.76) minutes for expert planners to 0.84 (+/- 0.57) minutes (p = 0.020), while producing dosimetrically equivalent plans with an average prostate V150% that was 4% lower (p = 0.002).
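The Hausdorff distance reported for Lei et al. captures the worst-case disagreement between two contours: the largest distance from any point on one contour to the nearest point on the other. A toy computation on invented contour points:

```python
from math import dist  # Euclidean distance, Python >= 3.8

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets: the largest
    nearest-neighbour distance, taken in both directions."""
    def directed(p_set, q_set):
        return max(min(dist(p, q) for q in q_set) for p in p_set)
    return max(directed(a, b), directed(b, a))

auto = [(0, 0), (1, 0), (1, 1)]    # automatic contour points (toy)
manual = [(0, 0), (1, 0), (1, 3)]  # manual contour points (toy)
print(hausdorff(auto, manual))  # -> 2.0
```

Unlike Dice, which averages over the whole volume, Hausdorff distance is dominated by the single worst boundary error, making it the stricter complement for segmentation quality.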
Toxicity prediction: Isaksson et al. reviewed PCa radiotherapy outcomes in terms of genitourinary and gastrointestinal toxicity, finding that only a few of the screened publications showed performance superior to classical statistical models. Features such as statin drug use and PSA levels prior to intensity-modulated radiotherapy were strongly associated with toxicity outcomes. The authors note that DL-based methods will eventually calculate radiotherapeutic doses with the accuracy and efficiency needed for real-time radiotherapy, but the field requires further validation.
Instrument detection: Sarikaya et al. proposed an end-to-end deep learning approach for instrument detection and localization in robotic-assisted surgery images. Using a CNN processing stream combined with a multimodal convolutional network, they achieved an average precision of 91% with a computation time of 0.103 seconds per frame (training time of 7.22 hours). While the model outperformed similar approaches, the authors noted that the process remained too slow for real-time processing of all images.
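Average precision, the detection metric Sarikaya et al. report, averages the precision at each rank where a true positive is retrieved in the score-sorted detection list. A minimal sketch with an invented detection ranking:

```python
def average_precision(is_tp, n_ground_truth):
    """AP from detections sorted by confidence: average the precision
    at each rank where a true positive appears, over all ground truths."""
    tp = 0
    precisions = []
    for rank, hit in enumerate(is_tp, start=1):
        if hit:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / n_ground_truth

# Five detections sorted by score; True = matched an instrument annotation
print(round(average_precision([True, True, False, True, False], n_ground_truth=3), 3))  # -> 0.917
```

Because each missed ground-truth instrument contributes zero, AP jointly penalizes false alarms (low precision at TP ranks) and missed detections.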
Surgical skill assessment: Hung et al. used a da Vinci system recorder to collect automated performance data, which they fed into ML algorithms. They found that bimanual dexterity is an ideal surgical skill metric and that camera manipulation strongly correlates with surgeon expertise and good outcomes. This data-driven approach to skill assessment could standardize how surgical competency is measured and provide objective feedback during training.
Outcome prediction: Biochemical recurrence following robotic-assisted radical prostatectomy was analyzed using three supervised ML algorithms with multiple training variables. All three models predicted recurrence accurately, outperforming traditional statistical regression. Goldenberg et al. developed a system using a computer-controlled TRUS transducer to track surgical instrument tips alongside real-time MRI and TRUS images, enabling visualization of suspected lesions during surgery.
Future of autonomous surgery: The complexity of fully autonomous surgical systems remains very high, requiring coverage of all aspects of a surgical procedure and the ability to transfer surgical skills toward automated execution. ML software is becoming increasingly attractive for constructing 3D models that could be integrated into augmented reality and virtual reality systems. However, the authors emphasize that automatic intraoperative image overlapping remains an area of active development, and the transition from assisted to autonomous robotic surgery will require coordinated effort between regulatory authorities and equipment manufacturers.
Small datasets and generalization: A core limitation across the field is that most AI models are trained on relatively small, single-center datasets. For broader clinical impact, models will require very large and representative datasets with standardized image acquisition. Fine-tuning and standardizing the DL process would reduce errors through generalization, but this demands significant investment in digitalization infrastructure, including hardware, software, and computational resources.
Regulatory hurdles: Every AI tool intended for clinical use must undergo regulatory approval. Both the European Union and the United States require certification beyond self-validation, including studies that prove reproducible results and mitigate non-reproducibility risk. How AI/ML models will integrate into clinical practice remains a fundamental challenge, particularly regarding how to present actionable information to physicians in a usable format.
Domain-specific limitations: In pathology, the costs of digitalizing images, demonstrating safety to pathologists, and establishing performance thresholds (where AI performs at least as well as pathologists) are significant barriers. Current algorithms for radiological imaging focus on registration, segmentation, and radiomics, but few studies have advanced beyond image processing to delivering clinical decision support. In targeted biopsy, prostate image deformation, patient movement, and poor alignment continue to cause errors, and an integrated real-time imaging and biopsy system is still needed to reduce underdiagnosis.
Cross-domain data fusion: The scarcity of studies reporting on fused data streams, where genomic, imaging, and clinical information are combined, reflects the broader challenge of integrating multi-modal data. The authors argue that future progress depends on identifying the best ML methods for handling these combined data sources, and that correlating morphometric features from histopathology with radiological methods and proteomics is a critical frontier that remains largely unexplored.
Commercially available AI tools: The authors anticipate that commercially available tools for predicting and grading PCa with AI assistance will emerge in the near future. Deep learning methods, particularly DCNNs and DNNs, are considered the state-of-the-art approach for classification in medical imaging and are the most appropriate models to be applied in histopathology. However, the authors caution that present studies cannot yet recommend AI for routine clinical pathology use.
Imaging and scoring systems: AI and deep learning techniques have the potential to reduce both inter-observer and intra-observer variability in scoring systems like PI-RADS, especially among less experienced physicians. Further research is needed to improve malignancy prediction in PCa images, to accurately guide biopsy targeting, and to properly diagnose clinically significant PCa. New MRI techniques such as luminal water imaging, restriction spectrum imaging, VERDICT (vascular, extracellular, and restricted diffusion for cytometry in tumors), and MR fingerprinting are being developed alongside AI to improve detection and characterization.
Biomarkers and liquid biopsy: The last decade has produced numerous biomarkers with potential clinical use. ANNs can play an important role in analyzing biomarkers such as Ki-67 and ERG antibodies. AI may provide faster and more reliable identification and validation of biomarkers for prostate cancer. DNA methylation markers identified through deep neural networks hold particular promise for liquid biopsy applications, offering a less invasive diagnostic pathway.
Surgery and augmented reality: ML algorithms for constructing 3D models could be integrated into augmented and virtual reality systems for surgical planning and real-time guidance. Predicting surgical movements alongside real-time organ and tumor localization is a promising research direction. The authors conclude that AI methods will need to perform at least as well as human experts in pathology, imaging, radiotherapy, and surgery, and gain approval from regulatory agencies worldwide, before achieving widespread clinical adoption.