Algorithms on the rise: a machine learning-driven survey of prostate cancer literature


Plain-English Explanations
Pages 1-2
Why a Bibliometric Analysis of ML in Prostate Cancer?

Prostate cancer (PCa) accounts for roughly 14.1% of all male cancers worldwide, and PCa-specific deaths represent about 7% of global cancer mortality. Despite the widespread adoption of PSA screening since the 1990s and improvements in surgery and radiotherapy, current biomarkers still suffer from limited specificity, contributing to overdiagnosis rates of 30% to 40%. High-risk subtypes such as castration-resistant prostate cancer carry five-year survival rates below 30%, underscoring the urgent need for better diagnostic and therapeutic tools.

Machine learning (ML) has emerged as a promising force across multiple facets of PCa management. In imaging, MRI-based radiomics models and deep learning algorithms have facilitated automated Gleason grading and early tumor detection. In genomics, multimodal ML approaches mine complex gene-expression profiles and exosomal signatures for novel biomarkers. At the therapeutic level, deep neural networks predict patient outcomes under varying treatment regimens, guiding personalized medication strategies and radiotherapy planning.

However, concerns persist about reproducibility, external validation, and clinical utility. Among AI systems benchmarked against clinicians, only a minority have been prospectively tested or deployed in real-world settings, and adherence to reporting guidelines such as CONSORT-AI remains inconsistent. This study set out to systematically map the evolution, research hotspots, and collaborative landscape of ML-PCa research using bibliometric methods, while also critically examining whether translational gaps persist between algorithmic development and clinical implementation.

TL;DR: PCa accounts for 14.1% of male cancers globally, with overdiagnosis at 30-40% and castration-resistant subtypes showing less than 30% five-year survival. ML has shown promise in imaging, genomics, and treatment planning, but very few models reach prospective clinical testing. This bibliometric review maps the field's growth and identifies where translation has stalled.
Pages 2-3
Search Strategy and Bibliometric Toolchain

The authors conducted a systematic literature search on July 5, 2025, across the Web of Science (WOS) Core Collection and Scopus databases. The WOS query used TS=("machine learn*") combined with prostate cancer terms (including "prostate carcinoma," "prostatic neoplasm," "castration-resistant prostate cancer," and "metastatic prostate cancer"). Scopus used a TITLE-ABS-KEY equivalent. Both searches were restricted to English-language articles and reviews published between January 2005 and December 2024. Scopus results were further filtered by subject categories including Medicine, Biochemistry/Genetics, Computer Science, and Health Professions.
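As a rough illustration, the two query strings described above might be assembled as below. The term lists are abridged and the exact Boolean structure is an assumption, since the paper is summarized here only in fragments:

```python
# Illustrative reconstruction of the two database queries (abridged).
ML_TERMS = '"machine learn*"'
PCA_TERMS = (
    '"prostate cancer" OR "prostate carcinoma" OR "prostatic neoplasm" '
    'OR "castration-resistant prostate cancer" OR "metastatic prostate cancer"'
)

# Web of Science topic search vs. the Scopus TITLE-ABS-KEY equivalent.
wos_query = f"TS=({ML_TERMS}) AND TS=({PCA_TERMS})"
scopus_query = f"TITLE-ABS-KEY({ML_TERMS}) AND TITLE-ABS-KEY({PCA_TERMS})"

print(wos_query)
```

Both searches would then be restricted to English-language articles and reviews from 2005-2024, as stated above.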

After merging results from both databases using Python (version 3.9.14), duplicates were removed and records with incomplete information were excluded. The final corpus comprised 2,632 publications. The analysis employed a tripartite toolchain: CiteSpace 6.4.R1 for citation burst and temporal trend detection (configured with 1-year time slices, g-index term selection at k=25, and Pathfinder network pruning at g=0.7); VOSviewer 1.6.20 for co-authorship and keyword co-occurrence networks (minimum threshold of 5 documents per node, full counting with association strength normalization); and R-bibliometrix (version 4.1.0) for Latent Dirichlet Allocation topic modeling (10 topics via Gibbs sampling over 2,000 iterations with exponential smoothing at a=0.8).
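The merge-and-deduplicate step is only named in the paper, not shown. A minimal sketch, assuming each record is a dict carrying `doi`, `title`, and `year` fields (the field names and duplicate key are illustrative):

```python
# Merge two result sets, drop incomplete records, then deduplicate,
# preferring the DOI as the duplicate key with a normalized-title fallback.
def merge_records(wos, scopus):
    seen, merged = set(), []
    for rec in wos + scopus:
        # Exclude records with incomplete core metadata, as described.
        if not rec.get("title") or not rec.get("year"):
            continue
        key = rec.get("doi") or rec["title"].lower().strip()
        if key in seen:
            continue
        seen.add(key)
        merged.append(rec)
    return merged

wos = [{"doi": "10.1/x", "title": "A", "year": 2020}]
scopus = [
    {"doi": "10.1/x", "title": "A", "year": 2020},  # duplicate of the WOS hit
    {"doi": None, "title": "B", "year": 2021},
]
print(len(merge_records(wos, scopus)))  # → 2
```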

This combination of tools ensured multidimensional coverage, from network visualization and burst detection to statistical validation and thematic evolution tracking. Importantly, the authors also examined proportional signals related to clinical validation within the corpus, finding that the keyword "validation" appeared in only approximately 2.8% of articles (73 out of 2,632), while terms reflecting prospective evaluation, randomization, or real-world implementation were entirely absent from the most frequent author keywords.
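The 2.8% figure follows directly from the counts given:

```python
# Share of the corpus whose author keywords include "validation".
validation_papers, corpus_size = 73, 2632
share = validation_papers / corpus_size
print(f"{share:.1%}")  # → 2.8%
```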

TL;DR: 2,632 publications were retrieved from WOS and Scopus (2005-2024) and analyzed using CiteSpace, VOSviewer, and R-bibliometrix. The keyword "validation" appeared in just 2.8% of the corpus, and prospective/real-world implementation terms were absent from top keywords, signaling a major translational gap.
Pages 3-5
Explosive Growth in ML-PCa Research Output

Between 2005 and 2024, the field underwent dramatic expansion. Annual publications remained below 20 throughout 2005-2014, but accelerated growth began in 2015, surpassing 100 articles by 2018 and reaching 206 in 2020. The period 2021-2024 saw exponential expansion: 472 publications in 2022, 559 in 2023, and 661 in 2024. The average annual growth rate was 25.4% over two decades, with a remarkable 82% of all publications (2,173 out of 2,632) concentrated in just the final four years. Cumulative citations reached 57,771 across the full period.
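The concentration figure can be checked from the counts given; the quotient comes to 82.6%, consistent with the "82%" reported:

```python
# Publications from 2021-2024 as a share of the full 2005-2024 corpus.
recent, total = 2173, 2632
share = recent / total
print(f"{share:.1%}")  # → 82.6%
```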

Citation dynamics: Annual citations fluctuated between 400 and 800 during 2005-2013, with a transient 2009 spike of 3,478 citations attributable to highly influential early works. From 2014 onward, citations grew robustly, surpassing 4,000 in 2018 and surging to 6,809 in 2019. The peak citation year was 2022 at 7,989 annual citations. A subsequent decline to approximately 3,700 citations annually in 2023-2024 may reflect preliminary saturation in algorithm-focused studies and the natural lag before newer papers accumulate citations.

Country contributions: Researchers from 92 countries contributed to ML-PCa research. China led with 649 publications (24.66%), followed by the United States at 492 (18.69%), India at 162 (6.16%), the United Kingdom at 109 (4.14%), and Canada at 108 (4.10%). While China maintained a relatively low multinational collaboration proportion (MCP ratio of 13.56%), the US exhibited higher collaborative engagement (MCP ratio of 20.93%). Germany demonstrated the most extensive international integration at 39.29% MCP ratio. The US-China partnership was the strongest bilateral collaboration, with 62 joint publications.

TL;DR: ML-PCa publications exploded from fewer than 20 per year (2005-2014) to 661 in 2024, with 82% of all output concentrated in 2021-2024. Cumulative citations reached 57,771. China (649 papers) and the US (492 papers) dominate, while Germany leads in international collaboration at 39.29% MCP ratio.
Pages 5-7
Who Produces the Most Impactful Work?

Institutional output vs. impact: The research community spans 3,638 unique organizations. The Chinese Academy of Sciences leads in volume with 42 publications, followed by the University of British Columbia (32) and Shanghai Jiao Tong University (25). However, citation impact tells a different story. The University of Toronto achieved the highest average citations per paper (CPP) at 70.54, followed by Case Western Reserve University at 69.86 and the University of Pennsylvania at 59.83. The Chinese Academy of Sciences, despite leading in volume, achieved a CPP of only 38.31. Western institutions, particularly the University of British Columbia (highest centrality at 0.06), Stanford, and the University of Pennsylvania, function as primary global connectors with extensive international linkages.

Research specialization: North American and European institutions demonstrate concentrated expertise in digital pathology and AI applications, while Chinese counterparts show a heightened focus on cell-free DNA analysis and medical imaging diagnostics. This geographic divergence in specialization creates opportunities for complementary collaboration, integrating Chinese computational efficiency with Western clinical-validation pipelines.

Journal distribution: The field's 10,437 unique journal outlets are led by Cancers (82 publications, 1,127 total citations), Frontiers in Oncology (70 publications), and Scientific Reports (52 publications). Medical Physics stands out for superior per-article influence with a CPP of 27.79 despite moderate output volume. Thematic clustering shows Cancers and Frontiers in Oncology anchoring oncology-focused research, while Sensors and IEEE Access concentrate on computational modeling. Medical Physics serves as a critical interdisciplinary hub bridging clinical and technological clusters.

TL;DR: The Chinese Academy of Sciences leads in volume (42 papers) but the University of Toronto leads in citation impact (CPP: 70.54). Cancers is the top journal (82 papers, 1,127 citations), while Medical Physics has the highest per-article influence (CPP: 27.79). North American/European institutions specialize in digital pathology and AI, while Chinese institutions focus on imaging diagnostics and cell-free DNA.
Pages 8-10
Key Researchers and the Shift from SVMs to Transformers

Author productivity: The global ML-PCa community comprises 12,345 authors, with 23.21% of publications involving international collaborations. Madabhushi Anant emerged as the foremost contributor with 19 publications, 1,475 total citations, an average of 77.63 citations per paper, and an H-index of 16. He occupies the central hub of the global collaboration network, maintaining robust ties with Abolmaesumi Purang, Comelli Albert, and Cacciamani Giovanni E. Other high-impact authors include Shiradkar Rakesh (average citations of 45.13) and Cuocolo Renato (47.50), who demonstrate significant influence through focused, high-quality output.

Co-citation foundations: Leo Breiman emerged as the most influential cited author (373 co-citations, centrality of 0.09), underscoring the foundational role of his Random Forests algorithm. Siegel Rebecca L. (327 citations) anchors epidemiological foundations, while Litjens Geert's bridging centrality (0.13, despite only 202 citations) confirms his role in cross-domain knowledge integration between computational and clinical research.

Keyword evolution: The field underwent a clear paradigm shift across three phases. Pre-2015 research emphasized conventional techniques: "pattern recognition," "algorithm," "classification," and "support vector machine." During 2015-2020, focus shifted toward integrated approaches: "radiomics," "multiparametric MRI," and "deep learning." Post-2020 innovations feature advanced architectures: "transformer models," "attention mechanisms," and "radiogenomics." Burst detection analysis reinforces this trajectory, with algorithm-centric bursts (2005-2018) giving way to clinical implementation keywords like "radiotherapy dosage" and "radiology" (2020-2024).

Top keywords by frequency: "Machine learning" dominated with 2,388 occurrences, followed by "prostate cancer" (1,726), "prediction" (368), "diagnosis" (363), and "classification" (278). Critically, "validation" appeared only 73 times (centrality of 0.07), while "survival" (centrality of 0.08) and "biopsy" (centrality of 0.06) served as important bridging terms between technical and clinical domains.

TL;DR: Madabhushi Anant leads with 19 papers and 77.63 average citations per paper. The field evolved from SVMs and feature engineering (pre-2015) through radiomics and deep learning (2015-2020) to transformers and radiogenomics (post-2020). "Validation" appeared only 73 times out of 2,632 publications, confirming the gap between algorithmic development and clinical testing.
Pages 11-14
Seminal Works and Thematic Clusters Shaping the Field

Most influential papers: Bera et al. (2019) in Nature Reviews Clinical Oncology leads with 880 local citations, establishing AI's transformative role in digital pathology and precision oncology. Choy et al. (2018) in Radiology (532 citations) covers ML applications in radiology, while Van der Laak et al. (2021) in Nature Medicine (498 citations) charts deep learning's path to clinical histopathology. Key prostate-specific works include Litjens et al. (2014) on computer-aided detection in MRI (362 citations) and Fehr et al. (2015) on automated Gleason scoring from multiparametric MRI (303 citations). Elmarakeby et al. (2021) in Nature (225 citations) demonstrated biologically informed deep neural networks for prostate cancer discovery.

Five thematic clusters: Co-citation analysis revealed five interconnected thematic pillars. The red cluster anchors algorithmic foundations, dominated by Pedregosa et al.'s Scikit-learn framework (126 co-citations) and Chen et al.'s XGBoost model (75 co-citations), establishing Python-based ML workflows as standard. The blue cluster encompasses clinical and imaging standardization, integrating GLOBOCAN epidemiology with PI-RADS validation studies (PI-RADS v2/v2.1 by Weinreb et al. and Turkbey et al.). The purple cluster develops radiomics frameworks, exemplified by Gillies et al. (2016) positioning medical images as mineable data sources (81 co-citations). The green cluster supports multi-omics integration through tools like GSEA and Limma. The yellow cluster traces clinical translation pathways connecting algorithm development to diagnostic applications.
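The co-citation counts underlying these clusters pair up references that appear together in one paper's bibliography: the more often two works are cited side by side, the closer they sit in the network. A minimal sketch with invented reference lists:

```python
# Count co-citations: two references are co-cited when they appear together
# in a single paper's reference list. Reference labels are invented.
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    pairs = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs

papers = [
    ["Pedregosa2011", "Chen2016", "Gillies2016"],
    ["Pedregosa2011", "Chen2016"],
    ["Gillies2016", "Pedregosa2011"],
]
counts = cocitation_counts(papers)
print(counts[("Chen2016", "Pedregosa2011")])  # → 2
```

Clustering algorithms (as in VOSviewer) then group references whose pairwise counts are high into the colored clusters described above.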

Citation bursts and clinical momentum: Early bursts (pre-2018) featured radiomics methods and PI-RADS standardization. The 2018-2020 inflection point saw deep learning methodologies (LeCun et al., 2015; XGBoost) gain prominence. Later surges prioritize clinical-AI integration: Sung et al.'s (2021) epidemiology burst reached an intensity of 17.36 during 2022-2024, while Kasivisvanathan et al.'s (2018) validation of MRI-targeted biopsies in the PRECISION trial reflects growing emphasis on translational evidence.

TL;DR: Bera et al. (2019) leads with 880 local citations. Five thematic clusters span algorithmic foundations (Scikit-learn, XGBoost), clinical imaging standards (PI-RADS), radiomics frameworks, multi-omics integration, and clinical translation pathways. Citation bursts shifted from radiomics methods (pre-2018) to clinical-AI integration (post-2020), with Sung et al.'s epidemiology burst reaching intensity 17.36 in 2022-2024.
Pages 17-18
Multimodal MRI, CNN Feature Engineering, and Public Datasets

Multimodal MRI deep learning: Current research increasingly leverages multiparametric MRI (mpMRI)-based ML for detection, grading, and characterization of clinically significant prostate cancer (csPCa). Deep learning models now integrate T2-weighted, diffusion-weighted imaging (DWI), and dynamic contrast-enhanced (DCE) sequences. The Deep Radiomics model, trained on 615 patients across four cohorts (PROSTATEx, Prostate158, PCaMAP, and NTNU/St. Olavs Hospital), achieved a patient-level AUROC of 0.91 in independent testing, comparable to PI-RADS assessment (AUROC: 0.94) without significant difference. An MRI-TRUS fusion 3D-UNet model tested on 3,110 patients showed superior sensitivity (80% vs. 73%) and lesion Dice coefficient (42% vs. 30%) over MRI-alone approaches, with higher specificity (88% vs. 78%) in 110 controls.
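The lesion Dice coefficient quoted above (42% vs. 30%) measures voxel overlap between a predicted and a reference segmentation, Dice = 2|A ∩ B| / (|A| + |B|). A minimal sketch on flattened binary masks:

```python
# Dice coefficient for two binary masks of equal length.
def dice(pred, truth):
    inter = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0

pred  = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
print(round(dice(pred, truth), 3))  # → 0.667
```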

CNN imaging innovations: CNNs have outperformed traditional handcrafted feature methods for prostate imaging. Lightweight 3D-CNN variants like XmasNet (ResNet-based blocks with transfer learning) achieved an AUC of 0.84 using 199 training and 200 test cases from PROSTATEx. Automated segmentation via nnU-Net followed by voxel-wise radiomics feature extraction and XGBoost classification balances interpretability and efficacy. Channel and spatial attention mechanisms that weight multiscale features have improved tumor boundary delineation and heterogeneity detection, increasing sensitivity by more than 5%.
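Channel attention of the kind referenced above reweights feature maps with a learned per-channel gate (global pooling, a small bottleneck, a sigmoid). A NumPy sketch of this squeeze-and-excitation-style idea, with invented weights and sizes, not the specific architecture of any cited model:

```python
import numpy as np

def channel_attention(feats, w1, w2):
    # feats: (C, H, W) feature maps; w1: (C//r, C); w2: (C, C//r).
    squeeze = feats.mean(axis=(1, 2))              # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)         # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate in (0, 1)
    return feats * scale[:, None, None]            # reweight each channel

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4, 4))
w1 = rng.normal(size=(2, 8))
w2 = rng.normal(size=(8, 2))
out = channel_attention(feats, w1, w2)
print(out.shape)  # → (8, 4, 4)
```

Because the gate lies in (0, 1), each output channel is a damped copy of its input; spatial attention applies the same idea over H x W positions instead of channels.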

Public datasets and challenges: Multicenter public datasets address single-institution limitations. The SPIE-AAPM-NCI PROSTATEx challenge provided 330 training and 208 testing lesions with standardized mpMRI quality control, while PROSTATEx-2 focused on Gleason grade prediction. The TCIA Prostate-MRI-US-Biopsy dataset (1,151 patients) has been validated in over 17 core publications. The MVP-CHAMPION project integrates clinical, genomic, and imaging data within the Million Veteran Program for closed-loop ML model refinement. Open-science platforms like Grand-Challenge.org share preprocessing scripts, model code, and visualization tools to establish reproducible community standards.

TL;DR: Deep Radiomics achieved AUROC 0.91 across four cohorts (615 patients), comparable to PI-RADS (0.94). MRI-TRUS fusion 3D-UNet on 3,110 patients showed 80% sensitivity vs. 73% for MRI alone. XmasNet reached AUC 0.84 on PROSTATEx. Key public datasets (PROSTATEx, TCIA with 1,151 patients, MVP-CHAMPION) and open-science platforms are accelerating validation and reproducibility.
Pages 14-15, 18-19
The Translational Gap and What Must Change

Persistent translational disconnect: Despite the explosive growth in publications and methodological sophistication, clinical validation remains the exception. Within the corpus, the keyword "validation" appeared in only 2.8% of articles, and terms like "prospective," "randomized," "trial," and "implementation" were entirely absent from the top 25 keywords. Fewer than 3% of highly cited works address ethical governance, health economics, or regulatory science. External evidence corroborates this pattern: among 81 non-randomized deep learning imaging studies comparing AI with clinicians, only 9 were prospective and just 6 were tested in clinical settings. Among FDA- or CE-marked algorithms, only 28% satisfied even half of the Dutch AIPA guideline's evidence criteria.

Geographic and data bias: An estimated 78% of training data comes from Western cohorts, creating geographic bias that hampers real-world deployment across diverse populations. China's low multinational collaboration proportion (13.56% MCP) compared to Germany's 39.29% signals risks of knowledge siloing and Western-centric clinical standards versus Eastern imaging focus. Resource asymmetry between institutions and regions further limits equitable implementation.

Study-specific limitations: This bibliometric review itself relies on WOS and Scopus data, which may miss literature indexed only in other databases. The analysis evaluates research trends through citation metrics and keyword frequencies rather than directly assessing methodological quality or clinical applicability of individual studies. Language bias (English-only inclusion) and self-citation patterns may also skew the findings.

Roadmap for closing the gap: The authors identify several critical priorities. First, multi-institutional validation studies using standardized imaging biomarkers are essential. Second, Federated Learning solutions are needed for data-scarce populations, enabling collaborative model training without centralizing sensitive patient data. Third, SNOMED-CT integration should bridge electronic health record silos. Emerging initiatives like the PI-CAI challenge and consortium have begun to address cross-population validation and methodological standardization. Additionally, the development of explainable AI systems remains critical for clinical trust, and future success depends on nurturing researchers fluent in both biomedicine and algorithmic design.
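Federated averaging is the canonical way to realize the federated-learning recommendation above: each site trains on its own data and only model parameters are shared, aggregated in proportion to cohort size. A minimal sketch with invented weight vectors:

```python
# FedAvg: weighted average of client parameter vectors, no raw data shared.
def fed_avg(client_weights, client_sizes):
    total = sum(client_sizes)
    dim = len(client_weights[0])
    avg = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            avg[i] += (n / total) * w[i]
    return avg

# Two hospitals with cohorts of 100 and 300 patients; weights illustrative.
global_w = fed_avg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[100, 300])
print(global_w)  # → [2.5, 3.5]
```

In practice each round alternates local training with such an aggregation step, so sensitive patient data never leaves the contributing institution.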

TL;DR: Only 2.8% of ML-PCa papers mention "validation," and prospective trial terms are absent from top keywords. 78% of training data comes from Western cohorts, creating geographic bias. Only 28% of FDA/CE-marked AI tools meet half of AIPA evidence criteria. Closing the gap requires multicenter validation, Federated Learning for underserved populations, SNOMED-CT integration, and explainable AI development.
Citation: Gu S, Chen J, Fan C, Huang X, Li L, Zhang H. Open Access, 2025. Available at: PMC12527887. DOI: 10.3389/fonc.2025.1675459. License: CC BY.