ML Pathomics for Bladder Cancer Diagnosis and Survival

International Journal of Cancer, 2021

Plain-English Explanations
Pages 1-2
Why Pathomics-Based Machine Learning for Bladder Cancer?

Clinical problem: Bladder cancer (BCa) is the most common malignant tumor of the urinary system and ranks fourth in incidence among male malignancies. Accurate clinical diagnosis currently depends on histopathology, where pathologists examine H&E-stained tissue sections under a microscope. However, certain histopathological patterns, such as microcystic urothelial carcinoma and papillary urothelial neoplasm of low malignant potential, can have deceptive appearances that challenge even experienced pathologists. Traditional immunohistochemistry is sometimes insufficient for difficult differential diagnoses, particularly when distinguishing BCa from benign conditions such as glandular cystitis.

Pathomics approach: This study by Chen et al. from Shanghai General Hospital and Shanghai Jiao Tong University introduces a "pathomics" approach, which digitalizes cancer tissues directly through slide scanning to extract high-dimensional quantitative features from histopathological images. Unlike radiomics, which works with indirect radiation-based imaging (CT, MRI), pathomics captures information directly from tumor cells and the extracellular matrix, potentially capturing histological detail more fully. The authors collected a total of 643 H&E-stained images from two independent sources. The BCa images came from 108 patients at Shanghai General Hospital (General cohort, January 2009 to December 2016) and 406 patients from The Cancer Genome Atlas (TCGA cohort); the remaining images were normal bladder tissue and glandular cystitis controls.

Dual objectives: The study pursued two distinct machine learning objectives. The first was an automated diagnostic model to distinguish BCa from normal bladder tissue and from glandular cystitis. The second was a prognostic model to stratify patients by survival risk based on image features alone, ultimately integrating the resulting risk score with conventional clinicopathologic factors into a nomogram. Both models relied on the LASSO (least absolute shrinkage and selection operator) algorithm for feature selection and coefficient estimation, applied through the glmnet package in R.
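
To illustrate why LASSO can act as both a feature selector and a coefficient estimator, here is a minimal sketch of the L1 penalty at work. It uses coordinate descent on a plain least-squares loss, which is a simplification: the study's models would use logistic and Cox likelihoods through glmnet, and this is not the authors' code. The function names, toy data, and penalty value are all assumptions made for illustration.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta


# Toy data (hypothetical): the outcome depends strongly on feature 0
# and only weakly on feature 1.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]
beta = lasso_fit(X, y, lam=0.5)
print(beta)
```

In this toy case the weak feature's coefficient is set exactly to zero while the strong one is shrunk but retained, which is the mechanism that lets a LASSO fit reduce a large feature set to a short list of weighted features.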

TL;DR: This study constructed machine learning models from 643 H&E-stained images across two cohorts (BCa images from 108 Shanghai General Hospital patients and 406 TCGA patients, plus normal bladder and glandular cystitis controls) to automate BCa diagnosis and predict survival. The pathomics approach extracts quantitative features directly from tissue slides, offering potentially richer data than radiomics-based imaging.
Pages 2-3
Image Acquisition, Segmentation, and Feature Extraction Pipeline

Sample preparation: For the General cohort, all 108 formalin-fixed paraffin-embedded (FFPE) BCa samples were sectioned at a thickness of 5 micrometers and stained with hematoxylin and eosin. An experienced genitourinary pathologist reviewed each slide and selected the most representative image per sample based on nuclear pleomorphism, mitosis, carcinoma infiltration, cancer invasion, tumor cell differentiation, and pathological grading. Images of 1000 x 1000 pixels were acquired at 400x magnification. An additional 53 normal bladder and 39 glandular cystitis FFPE samples from Shanghai General Hospital were processed identically. For the TCGA cohort, 406 BCa and 37 normal bladder tissue H&E images were processed with Leica Aperio ImageScope at 400x magnification. All 643 images underwent final review by an independent pathologist.

CellProfiler pipeline: The authors built an automated image processing pipeline using CellProfiler (v.3.1.9), a well-established open-source tool for biological image analysis. H&E-stained images were first unmixed via the "UnmixColors" module to separate the hematoxylin and eosin stain channels. Unmixed images were then automatically segmented through the "IdentifyPrimaryObjects" module (for cell nuclei identification) and the "IdentifySecondaryObjects" module (for cell cytoplasm delineation). This automated segmentation produced visible differences between BCa tissue and normal bladder tissue in the processed images.
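
The stain-unmixing step can be sketched for a single pixel as follows. This is a generic color-deconvolution illustration (convert RGB to optical density, then solve a 3x3 linear system), not CellProfiler's own implementation. The stain vectors are commonly quoted approximate values for hematoxylin, eosin, and a residual channel, and are an assumption here, as are the function names.

```python
import math

# Approximate optical-density directions (R, G, B) per stain. Illustrative
# values only; not necessarily what CellProfiler's UnmixColors uses.
STAINS = [
    (0.65, 0.70, 0.29),  # hematoxylin
    (0.07, 0.99, 0.11),  # eosin
    (0.27, 0.57, 0.78),  # residual third channel
]


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def rgb_to_od(rgb):
    """Convert 0-255 RGB to optical density, clamping to avoid log(0)."""
    return [-math.log10(max(v, 1) / 255.0) for v in rgb]


def unmix(rgb):
    """Return [hematoxylin, eosin, residual] amounts for one pixel."""
    od = rgb_to_od(rgb)
    # Row = colour channel, column = stain, so that system @ amounts = od.
    system = [[STAINS[j][i] for j in range(3)] for i in range(3)]
    d = det3(system)
    amounts = []
    for j in range(3):
        # Cramer's rule: replace column j with the observed optical density.
        replaced = [row[:] for row in system]
        for i in range(3):
            replaced[i][j] = od[i]
        amounts.append(det3(replaced) / d)
    return amounts


# Synthetic pixel built from known stain amounts (0.8 hematoxylin, 0.3 eosin).
od = [0.8 * STAINS[0][c] + 0.3 * STAINS[1][c] for c in range(3)]
pixel = [255.0 * 10 ** (-v) for v in od]
print(unmix(pixel))
```

Applying the same computation to every pixel would yield separate hematoxylin and eosin images, which is the kind of input a nucleus or cytoplasm segmentation step operates on.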

Feature extraction: From the segmented images, quantitative features capturing object shape, size, texture, and pixel intensity distribution were extracted using multiple CellProfiler measurement modules: "Object Intensity Distribution," "Object Intensity," "Texture," and "Object Size Shape." After eliminating redundant and uninformative features, 345 available quantitative image features were retained for downstream machine learning analysis. These features collectively formed the "pathomics signature" that served as input for both the diagnostic and prognostic models.
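
As a rough illustration of what per-object measurements look like, the sketch below computes three simple features from a labeled segmentation mask: area, mean intensity, and extent (area divided by bounding-box area). It is a toy stand-in for CellProfiler's measurement modules, which compute far more features. The function name and the tiny example arrays are hypothetical.

```python
def object_features(labels, intensity):
    """Return {label: {"area", "mean_intensity", "extent"}} for each nonzero label."""
    stats = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab == 0:
                continue  # label 0 is background
            s = stats.setdefault(
                lab, {"n": 0, "sum": 0.0, "rmin": r, "rmax": r, "cmin": c, "cmax": c}
            )
            s["n"] += 1
            s["sum"] += intensity[r][c]
            s["rmin"] = min(s["rmin"], r)
            s["rmax"] = max(s["rmax"], r)
            s["cmin"] = min(s["cmin"], c)
            s["cmax"] = max(s["cmax"], c)
    features = {}
    for lab, s in stats.items():
        box_area = (s["rmax"] - s["rmin"] + 1) * (s["cmax"] - s["cmin"] + 1)
        features[lab] = {
            "area": s["n"],
            "mean_intensity": s["sum"] / s["n"],
            "extent": s["n"] / box_area,
        }
    return features


# Hypothetical 3x3 mask with two segmented objects and a matching intensity image.
labels = [[1, 1, 0], [1, 0, 2], [0, 2, 2]]
intensity = [[10, 20, 0], [30, 0, 5], [0, 5, 5]]
print(object_features(labels, intensity))
```

Per-object values like these are typically aggregated per image (for example by mean or median) to give one feature vector per sample; how the study aggregated its 345 features is not stated in this summary.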

TL;DR: The pipeline used CellProfiler to unmix H&E stains, segment cell nuclei and cytoplasm automatically, and extract 345 quantitative image features per sample (shape, size, texture, intensity). All images were 1000 x 1000 pixels at 400x magnification, with 643 total samples across two independent cohorts.
Pages 3-4
LASSO-Based Diagnostic and Prognostic Model Design

Diagnosis model construction: For the diagnostic classifier, the 406 BCa and 37 normal bladder images from TCGA were randomly split 1:1 into a training cohort and a test cohort using a computer-generated random seed in R. The 108 BCa images and 53 normal bladder tissue images from Shanghai General Hospital served as an independent external validation cohort. The LASSO algorithm with 10-fold cross-validation was applied to the 345 quantitative features in the training cohort to select BCa-related digital factors and calculate their weighted coefficients. The resulting diagnostic score was computed as the weighted sum of selected features: Diagnostic score = sum of (Ci x Di), where Di represents a selected image feature and Ci its corresponding LASSO-derived weight.
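
The two mechanical steps in this paragraph, a seeded 1:1 random split and a weighted-sum score, can be sketched as below. The feature names, weights, and seed are hypothetical; the real model's 22 selected features and their coefficients are not listed in this summary. The split shown is a simple unstratified shuffle, which is an assumption about how the 1:1 split was done.

```python
import random


def split_half(items, seed):
    """Shuffle reproducibly with a fixed seed and split into two near-equal halves."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def linear_score(weights, features):
    """Score = sum of C_i * D_i over the selected features (keys of `weights`)."""
    return sum(w * features[name] for name, w in weights.items())


# 406 BCa + 37 normal TCGA images, represented here by integer IDs.
train_ids, test_ids = split_half(range(443), seed=2021)

# Hypothetical LASSO weights and one image's feature values.
weights = {"nuclear_area": 0.5, "texture_contrast": -2.0}
image_features = {"nuclear_area": 4.0, "texture_contrast": 0.25, "unselected": 9.9}
print(len(train_ids), len(test_ids), linear_score(weights, image_features))
```

Features that LASSO did not select simply have no entry in `weights`, so they do not contribute to the score.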

Prognosis model construction: For the survival prediction model, the authors employed a LASSO-Cox proportional hazards model on the 406 TCGA patients (serving as the training cohort) to identify survival-related image features from the same 345 quantitative features. A machine learning-based risk score was generated using the formula: Risk score = sum of (Ci x Ri), where Ri represents a selected survival-related image feature and Ci its weight. The General cohort (108 patients) served as the external validation cohort for prognostic verification.

Statistical validation framework: The study used ROC curves with AUC values to evaluate diagnostic model performance, and Kaplan-Meier survival analysis with Cox regression (reporting hazard ratios and 95% confidence intervals) for prognostic validation. Univariate and multivariate Cox regression analyses were performed to determine whether the risk score was an independent predictor of survival. All analyses used SPSS 24.0 and R v.3.6.2, with P-values below 0.05 considered statistically significant. The cut-off between high-risk and low-risk groups was defined as the median score within each respective cohort.
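
The median-based cut-off described above can be sketched in a few lines. The patient IDs and scores are made up, and the treatment of scores exactly equal to the median (assigned to the low-risk group here) is an assumption, since the summary does not say how ties were handled.

```python
import statistics


def split_by_median(scores):
    """Return (cutoff, {patient_id: "high" | "low"}) using the cohort's own median."""
    cutoff = statistics.median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return cutoff, groups


# Hypothetical risk scores for four patients in one cohort.
cohort = {"p1": 0.25, "p2": 1.0, "p3": 0.5, "p4": 1.5}
cutoff, groups = split_by_median(cohort)
print(cutoff, groups)
```

Because the cut-off is recomputed from whichever cohort is passed in, the same absolute score can fall on different sides of the threshold in different cohorts.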

TL;DR: The diagnostic model used LASSO with 10-fold cross-validation on 345 features to select BCa-related factors and compute a weighted diagnostic score. The prognostic model used LASSO-Cox regression to identify survival-related features and generate a risk score. Both models were trained on TCGA data and externally validated on the Shanghai General Hospital cohort.
Pages 4-5
Diagnostic Model Achieves High Accuracy across Multiple Cohorts

Feature selection results: From the initial 345 quantitative image features, the LASSO analysis with 10-fold cross-validation identified 22 BCa-related image factors that collectively contributed to the diagnostic model. These 22 features captured distinctions in nuclear morphology, texture patterns, and intensity distributions between cancerous and normal tissue that are difficult for the human eye to quantify consistently. The automatic segmentation pipeline produced visibly different processed images for BCa tissue versus normal bladder tissue, confirming that the CellProfiler-based approach could capture biologically meaningful structural differences.

BCa vs. normal bladder tissue: The diagnostic model demonstrated high accuracy in distinguishing BCa samples from normal bladder tissue across all three cohorts. AUC values were 96.3% in the training cohort, 89.2% in the test cohort, and 94.1% in the external validation cohort (Shanghai General Hospital). The external validation AUC (94.1%) exceeded the test cohort AUC (89.2%), which is noteworthy because external validation on independent institutional data is typically the most stringent benchmark. This is consistent with the model generalizing across different sample preparation protocols and patient populations, although the test cohort contained only about half of the 37 TCGA normal images, so its AUC estimate is comparatively imprecise.
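
For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen BCa image receives a higher diagnostic score than a randomly chosen normal image. The sketch below computes it directly from that pairwise definition; the scores are invented for illustration and are not from the study.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical diagnostic scores: three BCa images and three normal images.
bca = [0.9, 0.8, 0.4]
normal = [0.5, 0.3, 0.2]
print(auc(bca, normal))
```

Here 8 of the 9 possible pairs are ordered correctly. With few negative samples, a single misordered pair moves the AUC noticeably, which is one reason small cohorts give less stable estimates.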

BCa vs. glandular cystitis: A particularly clinically relevant test was distinguishing BCa from glandular cystitis, a benign condition that can mimic cancer histologically. The model achieved an AUC of 93.4% in the General cohort for this differential diagnosis. When combining all non-BCa samples (normal bladder tissue plus glandular cystitis), the model still performed well with an AUC of 93.8%. This differential diagnostic capability addresses a genuine clinical pain point, as glandular cystitis can lead to unnecessary treatment when misdiagnosed as cancer by conventional pathology review.

TL;DR: The LASSO-based diagnostic model selected 22 image features and achieved AUC values of 96.3% (training), 89.2% (test), and 94.1% (external validation) for BCa vs. normal tissue. It also distinguished BCa from glandular cystitis with 93.4% AUC, addressing a common diagnostic challenge in clinical pathology.
Pages 5-7
Risk Score Stratifies Survival and Correlates with Stage and Grade

Survival-related features: The LASSO-Cox analysis with 10-fold cross-validation in the TCGA cohort identified 18 survival-related image features from BCa samples. These features were used to construct the machine learning-based risk score. Importantly, the high-risk score group showed significant correlation with both high BCa stage and high BCa grade in both the TCGA and General cohorts, suggesting the risk score captures morphological features that are biologically linked to tumor aggressiveness and clinical outcomes.

Survival stratification in TCGA cohort: Kaplan-Meier survival analysis revealed a significant difference in overall survival between high-risk and low-risk groups in the TCGA cohort, with a hazard ratio (HR) of 2.09 (95% CI: 1.56-2.81, P < .0001). In other words, patients classified as high-risk by the pathomics-based score had roughly twice the hazard of death at any given time compared to low-risk patients. This substantial and highly significant difference indicates that quantitative image features extracted from routine H&E slides carry prognostic information.
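
A Kaplan-Meier curve is built by multiplying, at each time a death occurs, the fraction of at-risk patients who survive that time point. The sketch below is a minimal estimator on invented follow-up data (times in arbitrary units, event 1 = death, 0 = censored); it is not the authors' analysis code, and comparing two groups would additionally need a log-rank test or Cox model.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical follow-up for five patients.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
print(kaplan_meier(times, events))
```

Censored patients contribute to the at-risk count until their follow-up ends but never trigger a drop in the curve, which is how the method uses incomplete follow-up.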

External validation in General cohort: The prognostic value was confirmed in the independent General cohort from Shanghai General Hospital, where the survival difference was even more pronounced: HR = 5.32 (95% CI: 2.95-9.59, P < .0001). The larger hazard ratio in the General cohort may reflect the smaller sample size (108 patients), which also accounts for the wider confidence interval, or differences in the clinical characteristics of the two populations. Regardless, the consistent direction of effect across two independent cohorts from different countries and data sources supports the robustness of the pathomics-based risk score.
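
The reported hazard ratios and confidence intervals can be sanity-checked with a little arithmetic, because Cox model intervals are symmetric on the log scale. The sketch below recovers the standard error of log(HR) and an approximate Wald z statistic from the published numbers. It assumes the intervals are standard 95% Wald intervals (z = 1.96), which the summary does not state explicitly.

```python
import math


def log_hr_summary(hr, lower, upper, z_crit=1.96):
    """Return (standard error of log HR, Wald z, geometric midpoint of the CI)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    center = math.exp((math.log(lower) + math.log(upper)) / 2)
    return se, z, center


for name, hr, lo, hi in [("TCGA", 2.09, 1.56, 2.81), ("General", 5.32, 2.95, 9.59)]:
    se, z, center = log_hr_summary(hr, lo, hi)
    print(f"{name}: SE(log HR)={se:.3f}, z={z:.2f}, CI midpoint={center:.2f}")
```

Under that assumption the geometric midpoints of both intervals land on the reported hazard ratios, and the z statistics (roughly 4.9 and 5.6) are consistent with P < .0001. The standard error in the General cohort is about twice that of TCGA, reflecting its smaller size.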

Independent prognostic factor: Univariate and multivariate Cox regression analyses confirmed that the machine learning-based risk score acted as an independent predictor of survival for patients with BCa in both cohorts. This means the risk score provided prognostic information beyond what could be explained by conventional clinicopathologic variables such as tumor stage, grade, and patient age. Patients with high tumor stages (stage III/IV) had significantly higher risk scores, as did patients with high tumor grade, further supporting the biological relevance of the extracted image features.

TL;DR: The LASSO-Cox model identified 18 survival-related image features. The resulting risk score stratified overall survival with HR = 2.09 (P < .0001) in TCGA and HR = 5.32 (P < .0001) in the General cohort. Multivariate analysis confirmed it as an independent prognostic factor, with high-risk scores correlating significantly with advanced stage and high grade.
Pages 7-8
Integration Nomogram Outperforms Conventional Staging for Survival Prediction

Nomogram construction: Current survival prediction for BCa patients relies on conventional clinicopathologic factors such as tumor stage, pathological grade, and patient age. To improve upon these existing systems, the authors constructed an integration nomogram that combined the machine learning-based risk score with standard clinicopathologic factors. A nomogram is a graphical calculation tool widely used in clinical oncology that assigns point values to each predictor variable and sums them to estimate a patient's probability of a given outcome (in this case, 1-, 3-, and 5-year overall survival).
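
The point-assignment idea behind a nomogram can be sketched as follows: each predictor's contribution (coefficient times distance from its lowest-risk value) is rescaled so that the most influential predictor spans 0 to 100 points. The coefficients, value ranges, and patient below are entirely hypothetical and do not come from the paper; converting total points into a survival probability additionally requires the fitted model's baseline survival, which is not shown.

```python
def nomogram_points(coefs, ranges, patient):
    """Return {predictor: points}, scaled so the widest-effect predictor tops out at 100."""
    spans = {k: abs(coefs[k]) * (ranges[k][1] - ranges[k][0]) for k in coefs}
    max_span = max(spans.values())
    points = {}
    for name, beta in coefs.items():
        lo, hi = ranges[name]
        # Measure distance from the lowest-risk end of the predictor's range.
        distance = (patient[name] - lo) if beta >= 0 else (hi - patient[name])
        points[name] = 100 * abs(beta) * distance / max_span
    return points


# Hypothetical Cox coefficients and observed ranges for three predictors.
coefs = {"risk_score": 1.0, "stage": 0.5, "age": 0.02}
ranges = {"risk_score": (0, 2), "stage": (1, 4), "age": (40, 90)}
patient = {"risk_score": 1.0, "stage": 3, "age": 65}

pts = nomogram_points(coefs, ranges, patient)
print(pts, sum(pts.values()))
```

On a printed nomogram the same lookup is done by reading each predictor's value against a points ruler and summing by hand.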

Calibration and accuracy: The integration nomogram was evaluated using calibration plots with bootstrapping, which showed good agreement between predicted and observed survival probabilities for 1-, 3-, and 5-year overall survival. ROC curve analysis demonstrated that the nomogram achieved AUC values of 77.7% for 1-year, 83.8% for 3-year, and 81.3% for 5-year overall survival prediction. These values exceeded the prediction accuracy of models based solely on conventional clinicopathologic factors (tumor stage and grade), indicating that the pathomics-based risk score added meaningful prognostic information to standard clinical variables.

Cross-cohort validation: Incremental values of survival prediction accuracy via the integration nomogram were observed in both the TCGA cohort and the General cohort. The stable predictive efficacy across cohorts is particularly encouraging because the two patient populations differ in ethnicity, treatment patterns, and sample preparation protocols. The nomogram format is also clinically practical, as it allows physicians to calculate individualized survival probabilities at the bedside without requiring computational expertise, potentially making this tool accessible for routine clinical decision-making.

Clinical implications: For patients with BCa, treatment decisions range from surveillance and intravesical therapy for non-muscle invasive disease to radical cystectomy for muscle-invasive or high-risk cases. A nomogram that integrates pathomics-based risk scoring could help clinicians identify patients who appear low-risk by conventional staging but carry hidden morphological features associated with poor outcomes. This could lead to earlier intensification of therapy for high-risk patients and avoidance of overtreatment in truly low-risk patients.

TL;DR: The integration nomogram combining ML-based risk score with clinicopathologic factors achieved AUCs of 77.7%, 83.8%, and 81.3% for 1-, 3-, and 5-year overall survival prediction, outperforming conventional staging systems alone. Calibration plots confirmed good agreement with observed outcomes, and results were consistent across both cohorts.
Pages 8-9
Pathomics vs. Radiomics, and Clinical Significance of the Findings

Pathomics advantage over radiomics: The authors position pathomics as complementary to, and potentially more informative than, radiomics for bladder cancer assessment. Radiomics extracts features from medical imaging (CT, MRI) and has been used to predict lymph node metastasis and prognosis in BCa. However, because radiomics relies on indirect representations of tissue, it may miss important information contained in tumor cells and the extracellular matrix. Pathomics digitalizes cancer tissues directly through slide scanning, extracting histological features that are closer to the underlying biology. The strong diagnostic and prognostic performance demonstrated in this study supports the value of this direct tissue-level approach.

Comparison with existing AI approaches: Deep convolutional neural networks have been reported to achieve accuracies of 100% and 92% for distinguishing multiple cancer samples and subtypes in other settings. For bladder cancer specifically, MRI-based radiomics nomograms have shown utility for prognosis prediction, and radiomics-based nomograms have been demonstrated for preoperative prediction of lymph node metastasis. The current study adds to this literature by showing that a relatively straightforward LASSO-based approach on pathomics features can achieve clinically meaningful diagnostic (AUC up to 96.3%) and prognostic (HR up to 5.32) performance without requiring deep learning architectures or expensive imaging modalities.

Practical accessibility: A notable strength of this approach is its accessibility. The image processing pipeline uses CellProfiler, an open-source and freely available tool, and the machine learning models are built with standard R packages (glmnet). The pathomics signature is described as "easy to understand and use by clinicians without sophisticated computational knowledge." This contrasts with deep learning-based approaches that often require significant computational resources and specialized expertise. The use of routine H&E-stained slides, which are already produced for every BCa diagnosis, means no additional tissue processing or imaging costs are required.

TL;DR: Pathomics extracts features directly from tissue slides, potentially capturing more biological detail than radiomics-based imaging. The study's LASSO-based approach achieved strong results (AUC up to 96.3% for diagnosis, HR up to 5.32 for prognosis) using open-source tools (CellProfiler, R glmnet) on routine H&E slides, making it accessible to clinicians without deep learning expertise.
Pages 9-10
Study Limitations and the Path Toward Clinical Translation

Diagnostic accuracy caveats: The authors acknowledge that their machine learning-based diagnosis model may show less accuracy compared to traditional diagnostics performed by experienced pathologists in certain cases. While AUC values of 89.2% to 96.3% are impressive for an automated system, they do not yet match the near-perfect accuracy that a senior genitourinary pathologist can achieve on straightforward cases. The model's greatest clinical value likely lies in screening, quality assurance, and assistance with diagnostically challenging cases (such as BCa vs. glandular cystitis), rather than replacing pathologist review entirely.

Cut-off value variability: A methodological limitation is that the cut-off value for high-risk vs. low-risk classification was defined as the median risk score within each cohort. This means different cohorts had different absolute thresholds for risk stratification, which complicates clinical implementation. A universal, validated cut-off would be needed before this tool could be deployed across institutions. The difference in hazard ratios between the TCGA cohort (HR = 2.09) and the General cohort (HR = 5.32) may partly reflect this cohort-specific threshold definition, as well as differences in sample size and patient demographics.

Retrospective design: The study is entirely retrospective, which introduces potential selection bias and limits the ability to draw causal conclusions. Prospective validation in clinical trial settings, ideally designed in accordance with SPIRIT-AI and CONSORT-AI guidelines, would be essential before this pathomics approach could be integrated into routine clinical workflows. The relatively small sample sizes (108 patients in the General cohort for prognosis) also limit statistical power and generalizability.

Future directions: Despite these limitations, the study provides a proof-of-concept that machine learning applied to quantitative pathomics features from routine H&E slides can both diagnose BCa and predict patient outcomes. Larger, multicenter prospective studies with standardized image acquisition protocols and pre-specified cut-off values would be the logical next step. Integration with molecular and genomic data could further enhance predictive accuracy. The open-source nature of the tools used (CellProfiler, R) facilitates reproducibility and independent validation by other research groups.

TL;DR: Key limitations include the retrospective design, cohort-specific risk score cut-offs, relatively small validation cohort (n = 108), and potentially lower accuracy than expert pathologists on routine cases. Prospective multicenter trials following SPIRIT-AI/CONSORT-AI guidelines and standardized universal cut-off values are needed before clinical deployment.
Citation: Chen S, Jiang L, Zheng X, et al. Open Access, 2021. Available at: PMC8253293. DOI: 10.1111/cas.14927. License: CC BY-NC.