Development and Validation of Machine Learning Nomograms for Predicting Survival in Stage IV Pancreatic Cancer

World J Gastrointest Oncol 2025

Plain-English Explanations
Pages 1-2
Why This Study Was Needed and What It Set Out to Do

Stage IV pancreatic cancer (PC) is among the most lethal malignancies in medicine. By 2025, pancreatic cancer is projected to surpass breast cancer as the third leading cause of cancer-related death. In the United States alone, roughly 66,440 new cases and 51,750 deaths were estimated in 2024. The disease is often silent in its early stages, with more than half of patients already presenting with distant organ metastasis at the time of initial diagnosis, and the five-year relative survival rate stands at only 12.8%.

In clinical practice, considerable heterogeneity exists in survival outcomes among patients with stage IV pancreatic ductal adenocarcinoma (PDAC). Two patients with the same TNM stage can have vastly different lifespans depending on factors like age, tumor grade, treatment received, and which organs harbor metastases. Despite this, clinicians have lacked a reliable, individualized prognostic tool specifically designed for the stage IV population.

A nomogram is a visual scoring tool that combines multiple prognostic factors into a single graphic. Each variable is assigned a point value, and the sum of all points maps to a predicted survival probability. Nomograms are widely used in oncology because they translate complex statistical models into an intuitive format that clinicians can use at the bedside. Machine learning algorithms can improve variable selection and model robustness compared to traditional statistical approaches alone.

This study, led by Huang, Chen, and colleagues at Chongqing Medical University and Mianyang Hospital of Traditional Chinese Medicine, leveraged the SEER (Surveillance, Epidemiology, and End Results) database to build and validate a machine learning-based nomogram for predicting overall survival (OS) and cancer-specific survival (CSS) at 6, 12, and 18 months in stage IV PC patients. The SEER database, maintained by the U.S. National Cancer Institute, provides a large, population-based cohort ideal for this kind of prognostic modeling.

TL;DR: Stage IV pancreatic cancer has a five-year survival rate of only 12.8%, and clinicians lack individualized prognostic tools for this population. This study used the SEER database and machine learning to build a nomogram that predicts 6-, 12-, and 18-month overall survival and cancer-specific survival for stage IV PC patients.
Pages 2-3
Patient Selection, Data Extraction, and Statistical Framework

Patient cohort: Using SEER*Stat v8.3.9, the authors extracted clinical data for patients with pathologically confirmed stage IV PC diagnosed between 2000 and 2019. Inclusion required a primary pancreatic cancer diagnosis at TNM stage IV with specific ICD-O-3 histology codes (including ductal adenocarcinoma variants). Patients were excluded if they had multifocal tumors, were diagnosed posthumously, or had incomplete clinical or follow-up data. A total of 1,662 patients met the criteria.

Train-validation split: The dataset was randomly divided into a training set (70%, n = 1,163) and a validation set (30%, n = 499). Baseline characteristics were well-balanced between the two groups across all variables (all P > 0.05), confirming that the random split did not introduce systematic bias. The median follow-up time was 4 months in both cohorts.
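
The 70/30 random split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the seed, the use of integer patient IDs, and the choice to round the training size down are all assumptions, since the text only gives the resulting counts.

```python
import random


def split_cohort(patient_ids, train_fraction=0.70, seed=0):
    """Shuffle IDs reproducibly, then cut them into training and validation lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seed is an arbitrary, assumed value
    n_train = int(len(ids) * train_fraction)  # assumed rule: round down
    return ids[:n_train], ids[n_train:]


# 1,662 hypothetical patient IDs standing in for the SEER cohort.
train_ids, valid_ids = split_cohort(range(1662))
print(len(train_ids), len(valid_ids))
```

Rounding 70% of 1,662 down reproduces the reported group sizes of 1,163 and 499.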

Variables collected: Data extracted included patient age at diagnosis, sex, race, marital status, primary tumor location (head, body, tail, or other), T stage, N stage, tumor grade, treatment information (surgery, chemotherapy, radiotherapy), and metastatic sites (liver, bone, lung). The primary endpoints were CSS (time from follow-up start to death from pancreatic cancer) and OS (time from follow-up start to death from any cause).

Statistical analysis plan: The study used a multi-method approach for variable selection. First, univariate Cox proportional hazards regression identified candidate prognostic factors. Significant variables were then entered into a multivariate Cox model. Before multivariate modeling, multicollinearity was assessed using variance inflation factors (VIFs) and pairwise Pearson correlation coefficients, with thresholds of VIF less than 5 and correlation less than 0.7 to rule out collinearity problems.
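
The collinearity screen above can be illustrated with a small sketch. It assumes the simplest case of exactly two predictors, where the VIF reduces to 1 / (1 - r^2); with more predictors, each VIF comes from regressing one variable on all the others. The function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def vif_two_predictors(x, y):
    """VIF for the two-predictor case only: 1 / (1 - r^2)."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)


def passes_thresholds(x, y, max_vif=5.0, max_abs_r=0.7):
    """Apply the thresholds quoted in the text (VIF < 5, |r| < 0.7)."""
    return abs(pearson(x, y)) < max_abs_r and vif_two_predictors(x, y) < max_vif


# Toy 0/1 indicators for eight made-up patients (not SEER data).
surgery = [1, 0, 0, 1, 0, 1, 0, 0]
chemotherapy = [1, 1, 0, 0, 1, 0, 1, 0]
print(passes_thresholds(surgery, chemotherapy))
```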

TL;DR: The study enrolled 1,662 stage IV pancreatic cancer patients from the SEER database (2000-2019), split 70/30 into training and validation sets. Variables included age, tumor grade, treatment, and metastatic sites. Multicollinearity was checked before building multivariate Cox regression models.
Pages 3-5
Three Complementary Approaches to Variable Selection

LASSO regression: To refine variable selection and prevent overfitting, the authors applied Least Absolute Shrinkage and Selection Operator (LASSO) regression with 10-fold cross-validation. LASSO works by adding a penalty term to the regression that shrinks less important variable coefficients toward zero, effectively performing automatic feature selection. The optimal penalty parameter (lambda) was determined through cross-validation. The LASSO analysis identified surgery, chemotherapy, and liver metastasis as the variables most strongly associated with both OS and CSS.
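
To make the shrinkage idea concrete, here is a sketch of the soft-thresholding operator, the building block that coordinate-descent LASSO solvers apply to each coefficient. It is not the penalized Cox fit the authors ran; the coefficient values and the penalty of 0.10 are invented for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficients, made up for this example.
toy_coefficients = {"chemotherapy": -1.10, "surgery": -0.78, "sex": 0.04}

lam = 0.10  # assumed penalty; in practice it is chosen by cross-validation
shrunk = {name: soft_threshold(b, lam) for name, b in toy_coefficients.items()}

# Variables whose coefficient survives the penalty are the ones "selected".
selected = [name for name, b in shrunk.items() if b != 0.0]
print(selected)
```

A weak coefficient such as the toy value for sex is set exactly to zero, which is how LASSO drops variables rather than merely down-weighting them.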

Random Survival Forest: In parallel, a Random Survival Forest (RSF) model was employed for independent prognostic analysis. RSF is a machine learning ensemble method that builds hundreds of decision trees on random subsets of the data and averages their predictions. Unlike Cox regression, RSF does not require the proportional hazards assumption or log-linearity, making it more robust to complex, nonlinear relationships. Variable importance scores ranked the top five predictors for OS as chemotherapy, surgery, liver metastasis, age, and grade. For CSS, the order was chemotherapy, surgery, liver metastasis, grade, and age.

Multivariate Cox regression: The classical multivariate Cox proportional hazards model identified age, race, marital status, tumor location, N stage, grade, surgery, chemotherapy, and liver metastasis as independent predictors of both OS and CSS (all P less than 0.05). Pearson correlation analysis confirmed that all variable pairs (except surgery and grade) had correlations below 0.7, and all VIF values were below 5.

Convergence of methods: The strength of this study lies in the triangulation of three independent methods. By combining results from the multivariate Cox model, LASSO, and Random Survival Forest with clinical relevance, the authors selected seven final variables for the nomogram: age, tumor grade, surgical resection, chemotherapy, liver metastasis, bone metastasis, and lung metastasis. Variables that appeared consistently across all three methods (such as chemotherapy, surgery, and liver metastasis) were treated with the highest confidence.

TL;DR: Three methods were used to select prognostic variables: LASSO regression, Random Survival Forest, and multivariate Cox regression. All three converged on chemotherapy, surgery, and liver metastasis as the most important predictors. Seven final variables were selected to build the nomogram.
Pages 5-8
What Drives Survival in Stage IV Pancreatic Cancer

Treatment factors dominated: Among the 1,163 training-set patients, chemotherapy was the single most influential variable, with a hazard ratio (HR) of 0.33 for OS (95% CI: 0.30 to 0.38, P less than 0.001), meaning that at any given time patients who received chemotherapy had roughly one-third the hazard of death of those who did not. Surgical resection was the second strongest predictor, with an HR of 0.46 for OS (95% CI: 0.38 to 0.56, P less than 0.001). The HRs for CSS were similar, indicating that the treatment received is the strongest predictor of survival variation in this population.

Metastatic burden matters: Liver metastasis was the most impactful metastatic variable, with an HR of 1.49 for OS (95% CI: 1.32 to 1.68, P less than 0.001). Of the 1,662 patients, 63% had liver metastases, 23.1% had lung metastases, and 6.8% had bone metastases. Liver metastasis appeared as a significant predictor across all three statistical methods, while bone and lung metastasis contributed additional prognostic information in the final model.

Age and grade: Patients aged 60 to 75 had a 26% higher risk of death than those under 60 (HR 1.26, P = 0.001), and patients over 75 had a 70% higher risk (HR 1.70, P less than 0.001). Tumor grade was also independently prognostic: Grade III/IV tumors carried a 68% increased risk of death compared to Grade I/II (HR 1.68, P less than 0.001).
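
The percentage statements above follow directly from the hazard ratios: an HR of h corresponds to a (h - 1) x 100% change in the hazard of death relative to the reference group. A short sketch using only the HRs quoted in this section (the dictionary and function names are hypothetical):

```python
# Hazard ratios as quoted in the text above.
REPORTED_HR = {
    "chemotherapy": 0.33,
    "surgical resection": 0.46,
    "liver metastasis": 1.49,
    "age 60-75 vs under 60": 1.26,
    "age over 75 vs under 60": 1.70,
    "grade III/IV vs I/II": 1.68,
}


def percent_change_in_hazard(hr):
    """Negative values mean a lower hazard than the reference group."""
    return (hr - 1.0) * 100.0


summary = {name: round(percent_change_in_hazard(hr)) for name, hr in REPORTED_HR.items()}
for name, change in summary.items():
    print(f"{name}: {change:+d}%")
```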

Factors not retained: Sex was not significantly associated with either OS or CSS. Radiotherapy showed a modest protective effect on univariate analysis (HR 0.76, P = 0.027 for OS) but was not retained as a top predictor in the LASSO or Random Survival Forest analyses. T stage showed inconsistent effects across categories. Race and marital status were significant in the Cox model but did not emerge as top-ranked variables in the machine learning methods, so they were not included in the final nomogram to keep the tool practical and parsimonious.

TL;DR: Chemotherapy (HR 0.33) and surgery (HR 0.46) were the strongest survival predictors. Liver metastasis (HR 1.49), older age, and higher tumor grade also significantly worsened prognosis. Sex and radiotherapy were not retained in the final model.
Pages 9-10
How the Nomogram Works in Practice

Point-based scoring system: The final nomogram assigns each of the seven variables a score based on its regression coefficient. The clinician reads off a point value for each variable (age group, tumor grade, whether surgery was performed, whether chemotherapy was given, and presence of liver, bone, or lung metastasis), then sums them into a Total Points score. This total maps directly to predicted 6-month, 12-month, and 18-month OS and CSS probabilities via visual scales at the bottom of the nomogram.

Worked clinical example: The authors illustrate the nomogram with a concrete case: a 65-year-old patient with Grade III pancreatic cancer and liver metastases, but no bone or lung metastases, who has not undergone surgery or chemotherapy. This patient's OS total score is 486, corresponding to estimated cumulative survival probabilities of 37.6% at 6 months, 14.6% at 12 months, and 6.16% at 18 months. The CSS total score is 482, yielding survival probabilities of 38.7%, 15.6%, and 6.69% at the same time points.

Decision support: The nomogram can also model "what-if" scenarios. Using the same patient, clinicians can recalculate the score assuming surgery and/or chemotherapy are delivered, providing personalized, intuitive, and comprehensible objective parameters for treatment planning. This transforms the nomogram from a passive prognostic tool into an active decision-support instrument that helps patients and physicians weigh the potential benefit of different treatment strategies.
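
The point-based scoring and the "what-if" recalculation can be sketched with a common nomogram construction: take the log of each hazard ratio, shift each variable so its lowest-risk level scores 0, and scale so the most influential variable spans 100 points. This is an assumption about the general technique, not the paper's actual point scale. It uses only the hazard ratios quoted in this summary, omits bone and lung metastasis and the "Unknown" grade category (no HRs are quoted for them here), and therefore will not reproduce the published totals of 486 and 482.

```python
import math

# Hazard ratios quoted in this summary; reference levels have HR 1.00.
REPORTED_HR = {
    "age": {"<60": 1.00, "60-75": 1.26, ">75": 1.70},
    "grade": {"I/II": 1.00, "III/IV": 1.68},
    "surgery": {"no": 1.00, "yes": 0.46},
    "chemotherapy": {"no": 1.00, "yes": 0.33},
    "liver_metastasis": {"no": 1.00, "yes": 1.49},
}


def build_point_table(hr_table):
    """Convert HRs to 0-100 points; more points means higher predicted risk."""
    log_hr = {
        var: {lvl: math.log(hr) for lvl, hr in levels.items()}
        for var, levels in hr_table.items()
    }
    # The variable with the widest log-HR range defines the 100-point scale.
    widest = max(max(l.values()) - min(l.values()) for l in log_hr.values())
    return {
        var: {lvl: 100.0 * (b - min(levels.values())) / widest for lvl, b in levels.items()}
        for var, levels in log_hr.items()
    }


def total_points(point_table, patient):
    """Sum the points for each of the patient's variable levels."""
    return sum(point_table[var][lvl] for var, lvl in patient.items())


POINTS = build_point_table(REPORTED_HR)

# The worked-example patient: 65 years old, Grade III, liver metastases, untreated.
untreated = {"age": "60-75", "grade": "III/IV", "surgery": "no",
             "chemotherapy": "no", "liver_metastasis": "yes"}
# What-if scenario: the same patient, assuming chemotherapy is delivered.
with_chemo = dict(untreated, chemotherapy="yes")

print(round(total_points(POINTS, untreated), 1), round(total_points(POINTS, with_chemo), 1))
```

Because chemotherapy has the widest log-HR range among these variables, it defines the scale: switching it from "no" to "yes" removes the full 100 points in this sketch. Mapping a total to a survival probability additionally requires the model's baseline survival, which is not given in this summary.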

TL;DR: The nomogram assigns points for age, grade, surgery, chemotherapy, and metastatic sites, then sums them to predict 6-, 12-, and 18-month survival. Clinicians can model treatment scenarios (e.g., with or without chemotherapy) to guide personalized decision-making.
Pages 9-11
Discrimination, Calibration, and ROC Analysis

C-index (concordance): The C-index measures a survival model's ability to correctly rank patients by risk. A C-index of 0.5 is random chance, and 1.0 is perfect discrimination. The nomogram achieved a C-index of 0.727 for OS (95% CI: 0.711 to 0.743) and 0.727 for CSS (95% CI: 0.711 to 0.743) in the training set. In the validation set, the C-index was 0.719 for OS (95% CI: 0.695 to 0.744) and 0.716 for CSS (95% CI: 0.691 to 0.741). The small drop from training to validation indicates good generalizability without substantial overfitting.
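
For readers who want to see what the C-index counts, below is a simplified sketch of Harrell's concordance index for right-censored data. It ignores tied survival times, and the toy data are invented; it is not the software the authors used.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk score ranks correctly.

    A pair (i, j) is comparable when patient i has an observed death
    strictly before patient j's last follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier death had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable


# Toy data: survival in months, event flag (1 = died, 0 = censored), risk score.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risk))
```

In this toy example, four of the five comparable pairs are ranked correctly, giving a C-index of 0.8.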

AUC values at specific time points: Receiver operating characteristic (ROC) curves were generated for 6, 12, and 18 months. For OS in the training set, AUC values were 0.801 at 6 months, 0.775 at 12 months, and 0.787 at 18 months. In the validation set, corresponding values were 0.762, 0.799, and 0.785. CSS showed nearly identical performance. AUC values above 0.75 across all time points and both cohorts indicate strong discriminatory ability.

Calibration curves: Calibration was assessed in the training set (internal validation) and in the held-out validation set (which the authors term external validation, although both sets come from the same SEER cohort), using the bootstrap method with 1,000 resamples. The calibration curves for 6-, 12-, and 18-month OS and CSS closely aligned with the ideal 45-degree reference line in both cohorts, indicating excellent agreement between predicted probabilities and actual observed outcomes. This means the nomogram does not systematically overestimate or underestimate survival.

TL;DR: The nomogram achieved C-index values of 0.727 (training) and 0.719 (validation) for OS, with AUCs ranging from 0.762 to 0.801 across time points. Calibration curves closely matched the ideal line, confirming that predicted and observed survival probabilities are well-aligned.
Pages 11-12
Dividing Patients into Low-Risk and High-Risk Groups

Cutoff determination: Using X-tile software, optimal cutoff values were identified for risk stratification. For OS, patients with Total Points below 148 were classified as low risk, and those above 148 as high risk. For CSS, the cutoff was 189.7 points. These thresholds were derived from the training set and then applied to the validation set for independent confirmation.
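
Applying the cutoffs is a simple threshold rule, sketched below. The text does not say how a score exactly equal to the cutoff is handled, so assigning it to the low-risk group is an assumption; the function and constant names are hypothetical.

```python
OS_CUTOFF = 148.0   # Total Points cutoff for overall survival, from the text
CSS_CUTOFF = 189.7  # Total Points cutoff for cancer-specific survival, from the text


def risk_group(total_points, cutoff):
    """Return 'high' when the score exceeds the cutoff, otherwise 'low'."""
    return "high" if total_points > cutoff else "low"


# The worked-example patient scored 486 (OS) and 482 (CSS).
print(risk_group(486, OS_CUTOFF), risk_group(482, CSS_CUTOFF))
```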

Survival separation: Kaplan-Meier curves showed that the nomogram effectively differentiated survival prognosis between low-risk and high-risk groups in both the training and validation sets, with highly significant separation (both P less than 0.0001). This confirms that the nomogram captures meaningful biological and clinical variation, not just statistical noise.
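
The Kaplan-Meier curves referred to above are built from a simple product: at each time with at least one death, the running survival estimate is multiplied by (1 - deaths / number still at risk). A minimal sketch with invented data follows; it omits confidence intervals and the log-rank test that produces the P value.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with at least one death."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up in months; event flag 1 = died, 0 = censored (not SEER data).
times = [1, 2, 2, 3, 5]
events = [1, 1, 0, 1, 0]
for month, prob in kaplan_meier(times, events):
    print(month, round(prob, 3))
```

Running the function separately on the low-risk and high-risk groups gives the two curves that are compared in a stratification plot.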

Clinical utility via DCA: Decision curve analysis (DCA) compared the nomogram's net clinical benefit against the traditional TNM staging system. Across all time points (6, 12, and 18 months) and both endpoints (OS and CSS), the nomogram demonstrated a greater net benefit than TNM staging alone. DCA asks whether acting on a model's predictions yields more net benefit (true positives weighed against false positives at a chosen risk threshold) than treating all patients or treating none, which ties it more directly to clinical decisions than AUC or C-index.
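
The net benefit plotted in a decision curve has a standard form: at threshold probability pt, net benefit = TP/n - (FP/n) x pt / (1 - pt). The sketch below applies it to a binary outcome (for example, death by a fixed time point) and ignores censoring, which a full survival DCA would need to handle. The data and function names are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of intervening on everyone whose predicted risk >= threshold."""
    n = len(outcome)
    true_pos = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Reference strategy: intervene on every patient regardless of prediction."""
    return net_benefit([1.0] * len(outcome), outcome, threshold)


# Toy predicted risks and observed outcomes (1 = died by the time point).
risks = [0.9, 0.8, 0.3, 0.2]
died = [1, 1, 0, 0]
for pt in (0.25, 0.5):
    print(pt, net_benefit(risks, died, pt), net_benefit_treat_all(died, pt))
```

The treat-none strategy always has a net benefit of zero, so a model is useful at a given threshold only if its net benefit exceeds both zero and the treat-all value.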

Guiding follow-up intensity: Risk stratification based on the nomogram total score can guide follow-up monitoring, including after surgery for the minority of patients who undergo resection. High-risk patients (those exceeding the CSS cutoff of 189.7) would warrant more frequent imaging surveillance and closer clinical follow-up, while low-risk patients might be managed with standard protocols. This targeted approach helps allocate limited healthcare resources more efficiently.

TL;DR: Patients were stratified into low-risk and high-risk groups using nomogram score cutoffs (148 for OS, 189.7 for CSS). Kaplan-Meier curves confirmed highly significant survival separation (P less than 0.0001). Decision curve analysis showed the nomogram outperformed TNM staging in clinical net benefit.
Pages 12-13
How These Findings Fit Into Current Pancreatic Cancer Care

Addressing a gap in stage IV management: Approximately 80% of pancreatic cancer patients present with locally advanced or metastatic disease at initial diagnosis, precluding curative surgical intervention. The TNM staging system, while widely used, has well-known limitations for prognosis in this population: it ignores non-anatomic factors affecting survival and cannot visualize individual survival outcomes. This nomogram fills that gap by incorporating treatment and patient-level variables into a personalized prediction tool.

Surgery in metastatic disease: Although surgical resection is rarely performed in stage IV disease (only 10.83% of patients in this cohort underwent surgery), the nomogram confirms its substantial protective effect (HR 0.46). Prior studies have reported that among PDAC patients with liver metastases who underwent surgery, median OS reached 11.4 months compared to 5.9 months for those without surgery. The nomogram can help identify which metastatic patients might benefit most from aggressive surgical approaches.

Chemotherapy as the dominant modifiable factor: With an HR of 0.33, chemotherapy was associated with a roughly two-thirds lower hazard of death. In the study cohort, 58.7% of patients received chemotherapy. The nomogram highlights the outsized impact of chemotherapy on survival predictions, reinforcing its central role in stage IV PC management and potentially motivating discussions about treatment in patients who might otherwise decline it.

TL;DR: The nomogram addresses a real clinical gap: TNM staging alone cannot personalize prognosis for stage IV PC. Surgery and chemotherapy showed the largest survival benefits, and the nomogram can help identify which metastatic patients might benefit from aggressive treatment approaches.
Pages 13-16
Study Strengths, Weaknesses, and What Comes Next

Key strengths: The study enrolled 1,662 patients, of whom 97.95% (1,628) died during follow-up and 93.86% (1,560) died of pancreatic cancer, providing ample statistical power. The multi-method approach to variable selection (combining Cox regression, LASSO, and Random Survival Forest) adds robustness. The authors also rigorously assessed multicollinearity before modeling, and the model was validated on a 30% holdout set with consistent performance.

Retrospective design limitations: As a retrospective study, selection bias is inherent despite strict inclusion and exclusion criteria. The SEER database, while large, is subject to coding errors and missing values. Critically, the database does not provide details on specific chemotherapy regimens (e.g., FOLFIRINOX vs. gemcitabine/nab-paclitaxel), surgical approaches, radiation protocols, or tumor recurrence data. These omissions may limit the model's granularity and clinical applicability.

Category imbalance: Some variable categories had small sample sizes. For example, only 6.8% of patients had bone metastases, 6.3% received radiotherapy, and 10.8% underwent surgery. The large "Unknown" category for tumor grade (64.6%) is particularly notable and could bias the model, as missingness in registry data is rarely random. Additionally, 97.95% of patients died during follow-up, reflecting the extremely poor prognosis of stage IV PC but also meaning the model was trained predominantly on patients with very short survival.

Future directions: External validation at independent institutions would strengthen confidence in the model's generalizability. Incorporating molecular and genomic data, specific treatment regimen details, and performance status scores could improve predictive accuracy. Prospective validation studies would provide the strongest evidence before the nomogram is adopted in routine clinical practice. The integration of newer machine learning architectures such as XGBoost or deep learning models could potentially improve upon the C-index of 0.72 achieved here.

TL;DR: The study's strengths include a large cohort, multi-method variable selection, and rigorous validation. Limitations include retrospective design, lack of treatment regimen details in SEER data, and missing tumor grade in 64.6% of cases. External and prospective validation are needed before clinical adoption.
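Citation: Huang K, Chen Z, Yuan XZ, He YS, Lan X, Du CY. Development and Validation of Machine Learning Nomograms for Predicting Survival in Stage IV Pancreatic Cancer. World J Gastrointest Oncol, 2025. Open Access. Available at: PMC12142263. DOI: 10.4251/wjgo.v17.i5.102459. License: CC BY-NC.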