Deep Learning Applications in Pancreatic Cancer

Cancers, 2024

Plain-English Explanations
Pages 1-2
What This Review Covers and Why It Matters

Pancreatic cancer (PC) is the fourth leading cause of cancer-related death in the United States, with a five-year overall survival rate of only 12%. If current trends hold, it is projected to become the second most fatal cancer by 2030, with an estimated global incidence of 355,317 cases by 2040. Over 90% of pancreatic cancers arise from exocrine cells, chiefly pancreatic ductal adenocarcinoma (PDAC), while the remainder include pancreatic neuroendocrine tumors (PNETs) and pancreatic cystic lesions (PCLs) such as intraductal papillary mucinous neoplasms (IPMNs) and mucinous cystic neoplasms (MCNs).

Only 10 to 15% of patients have resectable disease at diagnosis, and roughly 50% already present with metastatic disease. Standard workup involves multiphasic cross-sectional imaging (CT), endoscopic ultrasound (EUS) with fine-needle biopsy (FNB), and biomarkers such as CA 19-9 and CEA. Despite multimodal treatment with chemotherapy, radiation, and surgical resection, outcomes remain poor.

This review, authored by Patel, Zanos, and Hewitt at Northwell Health and NYU, systematically examines how deep learning (DL), a subset of machine learning, is being applied across the full spectrum of pancreatic cancer care. The authors searched PubMed for English-language publications from January 2019 to November 2023 using keywords including "pancreatic cancer," "deep learning," "radiomics," "large language models," and "generative adversarial networks." From 54 initial results, 26 studies were included after excluding repetitive and irrelevant content.

The review organizes findings by pathological subtype (PDAC, PCLs, PNETs) and by clinical application (diagnosis, postoperative prediction, treatment monitoring, and novel biomarker development). The most common algorithm type was convolutional neural networks (CNNs), appearing in 12 of the 20 PDAC-focused studies, with architectures including ResNet50, VGG11/VGG19, and ResNet18.

TL;DR: This 2024 systematic review covers 26 studies on deep learning in pancreatic cancer, spanning PDAC, cystic lesions, and neuroendocrine tumors. Pancreatic cancer has only 12% five-year survival, and DL models (primarily CNNs) are being applied to improve diagnosis, staging, treatment monitoring, and biomarker discovery.
Pages 3-5
Deep Learning for CT Imaging and Tumor Classification

The diagnostic challenge: CT remains the most cost-effective and widely used imaging modality for evaluating pancreatic cancer. However, PDAC diagnosis from CT is difficult for radiologists because tumors often exhibit high heterogeneity. PDACs typically appear as hypoattenuating lesions on the venous phase, but they can also show isoattenuation, making them nearly indistinguishable from normal pancreatic tissue. Additional confusion arises from pancreatitis, which can mimic or co-occur with adenocarcinoma.

Radiomic and DL segmentation models: Gai et al. developed a comprehensive suite of AI algorithms for pancreatic segmentation, radiomic feature computation, and benign versus malignant classification. Their model achieved 74% pixel-wise segmentation accuracy and an AUC of 0.75 for tumor classification, though it was trained on only 77 patients. To differentiate autoimmune pancreatitis (AIP) from PDAC, Ziegelmayer et al. built a DL model using a VGG19 CNN that achieved an AUC of 0.9, compared to 0.8 for a pure radiomic approach.

Fusion models outperform radiomics alone: A consistent finding across multiple studies is that fusion models incorporating both DL features and radiomic features outperform pure radiomic models. Zhang et al. demonstrated a weak linear relationship between radiomic and DL features for predicting overall survival, suggesting the two approaches capture complementary information. Wei et al. used a VGG11-based multidomain fusion model combining radiomics and DL on 18F-FDG PET/CT images to distinguish PDAC from AIP, achieving an AUC of 0.96.
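The fusion idea can be sketched in a few lines: extract a handcrafted radiomic feature vector and a CNN embedding per patient, concatenate them, and fit a simple classifier on the combined representation. Everything below (array shapes, feature values, the logistic head) is an illustrative stand-in, not the pipeline of any reviewed study.

```python
# Minimal sketch of early fusion: concatenate radiomic + DL features,
# then train one classifier on the joint vector. Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
radiomic = rng.normal(size=(n, 20))   # stand-in for handcrafted texture/shape features
deep = rng.normal(size=(n, 64))       # stand-in for a CNN's penultimate-layer embedding
# Synthetic labels that depend weakly on both feature families:
y = (radiomic[:, 0] + deep[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

fused = np.hstack([radiomic, deep])   # early fusion: simple concatenation
X_tr, X_te, y_tr, y_te = train_test_split(fused, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"fusion AUC on held-out data: {auc:.2f}")
```

Because the two feature families carry complementary signal (as Zhang et al.'s weak linear relationship suggests), the concatenated vector gives the classifier more to work with than either family alone.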

Ultrasound applications: Tong et al. applied a ResNet50 architecture to contrast-enhanced ultrasonography (CEUS) to differentiate chronic pancreatitis from chronic pancreatitis-associated PC, achieving a sensitivity of 93% and specificity of 84%. In a reader study, the model's performance was comparable to or slightly better than expert radiologists, and it improved the diagnostic accuracy of nearly all radiologists who used it as an aid.

TL;DR: DL models using CT and ultrasound data are achieving AUCs of 0.75 to 0.96 for PDAC detection and classification. Fusion models that combine radiomics with deep learning consistently outperform pure radiomic approaches. A ResNet50 ultrasound model reached 93% sensitivity and 84% specificity, matching expert radiologists.
Pages 6-7
AI for Preoperative Lymph Node Metastasis and Staging

Lymph node metastasis prediction: Accurate preoperative assessment of lymph node (LN) status is critical because it directly influences surgical planning and the decision to administer neoadjuvant therapy. An et al. utilized a fused clinical, radiomic, and DL-based model applied to dual-energy CT images to predict LN metastatic burden, achieving an AUC of 0.92. Their ResNet18-based model demonstrated that AI can outperform radiologist predictions in specific investigative scenarios.

Reducing inter-reader variability: Bian et al. developed a CNN-based model for LN metastasis prediction that achieved an AUC of 0.91. Fu et al. built a deep-learning radiomics model for the same task with an AUC of 0.85. One major benefit of these objective AI models is that they reduce the significant variation seen in radiologist-dependent LN predictions. By standardizing the assessment, these tools can contribute to a more consistent preoperative staging schema across institutions.

Margin status prediction: Chang et al. used a 3D convolutional neural network to predict postoperative resection margin status (R0 versus R1) from preoperative cross-sectional imaging, achieving 81% accuracy and AUCs of 0.79 for LN status and 0.85 for margin status. Predicting whether a microscopically negative (R0) resection is achievable has direct implications for whether a patient should receive neoadjuvant chemotherapy before surgery or proceed with upfront resection.

TL;DR: AI models predict lymph node metastasis from CT with AUCs ranging from 0.85 to 0.92, reducing variability between radiologists. A 3D CNN predicts surgical margin status at 81% accuracy, helping clinicians decide between neoadjuvant therapy and upfront surgery.
Pages 7-8
Deep Learning Applied to Endoscopic Ultrasound

EUS-guided classification: Tissue confirmation of pancreatic cancer typically requires EUS-guided fine-needle aspiration (EUS-FNA). Gu et al. conducted a prospective study developing a deep learning radiomics (DLR) model using EUS images to classify PDAC. Their model achieved an AUC of 0.94, with 83% sensitivity and 90% specificity. This was the only prospective study in the entire review, lending it additional credibility.

Elevating junior endoscopist performance: In a two-phase study design, Gu et al. showed that when junior endoscopists used the DLR model for assistance, their diagnostic accuracy rose to the level of expert endoscopists. This finding highlights a key practical benefit of AI in clinical settings: it can serve as an equalizer, bringing less experienced clinicians up to expert-level performance and reducing the impact of the learning curve on patient outcomes.

Broader implications for endoscopy: EUS images present unique challenges for AI because of their lower resolution and higher noise compared to CT or MRI. The success of the DLR model on EUS data suggests that deep learning can extract meaningful diagnostic features even from lower-quality imaging modalities, expanding the potential reach of AI-assisted diagnosis to settings where EUS is the primary investigative tool.

TL;DR: A prospective study showed that a deep learning radiomics model applied to EUS images achieved an AUC of 0.94 for PDAC classification. When junior endoscopists used the model, their performance matched that of experts, demonstrating AI's potential as a clinical equalizer.
Pages 7-9
Novel Biomarker Discovery and Pathology-Based AI

Urinary biomarkers enhanced by AI: Blyuss et al. evaluated four urinary biomarkers (LYVE1, REG1B, REG1A, and TFF1) in conjunction with serum CA 19-9 for PC risk stratification. Their logistic regression model using urinary biomarkers achieved an AUC of 0.94, which improved to AUC 0.96 when CA 19-9 was added. The authors tested multiple ML and DL algorithms and found excellent performance across all of them, with no single algorithm demonstrating clear superiority.

RNA-based variants for differentiation: Al-Fatlawi et al. demonstrated that RNA-based variants, combined with CA 19-9, can differentiate between resectable PC, non-resectable PC, and chronic pancreatitis using a deep neural network (DNN) with an AUC of 0.96. Critically, their network identified two mutations, B4GALT5 and GSDMD, closely linked to PC progression and survival. This discovery points toward more personalized treatment regimens as targeted therapies are developed.

Histology-based subtyping with PACpAInt: Saillard et al. developed PACpAInt, a histology-based DL model that differentiates PC subtypes (classical versus basal, and stromal active versus inactive) directly from pathology slides. These subtypes are historically only distinguishable through expensive RNA sequencing. PACpAInt correctly predicts subtypes at the whole-slide level from both biopsies and surgical specimens, as well as disease-free and overall survival. Knowing the subtype matters clinically because the basal type carries a poorer prognosis linked to early metastasis and FOLFIRINOX resistance.

TL;DR: AI is enabling novel biomarker discovery for pancreatic cancer. Urinary biomarkers plus CA 19-9 reach AUC 0.96, RNA-based DL models identify mutations linked to survival, and PACpAInt predicts tumor subtypes from histology slides without costly RNA sequencing, guiding treatment decisions.
Pages 8-9
Predicting Surgical Complications and Treatment Response

Postoperative pancreatic fistula (POPF): Surgical treatment for PC often involves a pancreatoduodenectomy (Whipple procedure) or distal pancreatectomy, with complication rates as high as 50% even at high-volume centers. POPF is one of the most clinically relevant complications, and its risk correlates with gland texture and pancreatic duct size. Kambakamba et al. used ML to compute texture-based features reflecting histologic fibrosis, lipomatosis, and intraoperative gland hardness, achieving an AUC of 0.95, 96% sensitivity, and 98% specificity for POPF prediction. This significantly outperformed existing clinical risk scores, which typically have AUCs of 0.7 to 0.8.

Neoadjuvant therapy response: Monitoring response to neoadjuvant chemotherapy (FOLFIRINOX or gemcitabine/nab-paclitaxel) is critical, but CA 19-9 has significant limitations: nearly 10% of the population lacks the Lewis antigen and will never show CA 19-9 elevation, and other factors like hyperbilirubinemia can cause falsely high values. Watson et al. used a LeNet CNN (5-layer architecture) combining preoperative CT imaging with a greater-than-10% decrease in CA 19-9 to predict histopathologic response. The fusion model achieved an AUC of 0.79, compared to 0.74 for imaging alone and 0.56 for CA 19-9 alone.

Clinical significance: These results demonstrate that DL can enhance the prognostic value of existing biomarkers and imaging by combining them in ways that would not be possible through conventional analysis. Even in a pilot study of only 81 patients, the fusion approach to predicting treatment response showed a statistically significant improvement (p < 0.001), suggesting that DL can unlock additional predictive information from data already being collected in standard clinical workflows.

TL;DR: ML predicts postoperative pancreatic fistula with 0.95 AUC and 96% sensitivity, far surpassing traditional risk scores. A CNN fusion model combining CT and CA 19-9 predicts neoadjuvant therapy response at AUC 0.79, showing DL can enhance standard biomarkers beyond what conventional analysis achieves.
Pages 9-10
Large Language Models and Generative Adversarial Networks

ChatGPT in pancreatic cancer management: Walker et al. assessed the reliability of GPT-4 for managing five hepatobiliary and pancreatic conditions, including pancreatic cancer. They found only 60% agreement between GPT-4 responses and clinical guideline recommendations, though the model showed 100% consistency in its responses. While the chatbot could not reliably provide guideline-concordant recommendations, the study noted that GPT-4 could provide information superior to what patients typically find on the internet, suggesting a role in patient education if not in clinical decision-making.

NLP for mining radiology reports: Do et al. utilized natural language processing (NLP) to classify radiology reports describing metastatic disease patterns across cancers including pancreatic cancer, achieving 90% accuracy. This study also created a large database of over 90,000 labeled radiology reports, a resource that can be used to train future AI algorithms across multiple healthcare domains.

GANs for radiation therapy planning: Hooshangnejad et al. created deepPERFECT, a GAN-based model that synthesizes planning CT scans from diagnostic CTs with a dice similarity coefficient (DSC) of 0.93. This eliminates the need for separate planning imaging before radiation therapy, reducing patient wait times by approximately one week. Momin et al. used GANs to predict stereotactic body radiation therapy (SBRT) dose distributions from cross-sectional imaging with 91% accuracy, showing no significant differences from expert radiation oncologist ground truth (p > 0.05).
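The dice similarity coefficient used to evaluate deepPERFECT is a standard overlap metric, DSC = 2|A∩B| / (|A| + |B|). A minimal implementation on toy binary masks (not the paper's code) looks like this:

```python
# Dice similarity coefficient between two binary masks/volumes.
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """DSC = 2 * |A intersect B| / (|A| + |B|), in [0, 1]."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: define as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

truth = np.zeros((10, 10), dtype=bool)
truth[2:8, 2:8] = True   # 6x6 = 36 "voxels"
pred = np.zeros((10, 10), dtype=bool)
pred[3:9, 3:9] = True    # same size, shifted by one pixel
score = dice(truth, pred)  # overlap is 5x5 = 25, so 2*25/(36+36) ≈ 0.69
```

A DSC of 0.93, as reported for deepPERFECT, means the synthesized and reference volumes overlap almost completely.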

TL;DR: GPT-4 matched clinical guidelines only 60% of the time for pancreatic cancer management but may assist with patient education. NLP classified metastatic patterns from radiology reports at 90% accuracy. GANs can synthesize RT planning CTs (DSC 0.93) and predict SBRT dose distributions (91% accuracy), reducing treatment delays.
Pages 10-12
AI for Pancreatic Cystic Lesions and Neuroendocrine Tumors

Cystic lesion classification: IPMNs and MCNs are mucinous cystic lesions that account for approximately 8% of all pancreatic cancers and carry a risk of malignant transformation. Before DL, cross-sectional imaging had diagnostic accuracies of only 40 to 45% for these lesions. Liang et al. developed fused DL radiomics models achieving an AUC of 0.97 for diagnosing IPMN and MCN, with an AUC of 0.92 for differentiating serous cystadenoma (SCA). The fusion models once again outperformed pure radiomic approaches.

IPMN malignancy risk stratification: IPMNs represent about 50% of all incidentally detected PCLs, and the incidence of invasive carcinoma in resected IPMNs is approximately 23%. Because pancreatectomy carries significant morbidity, accurate risk stratification is essential to avoid unnecessary surgery. Kuwahara et al. used CNN architectures including AlexNet and ResNet to predict malignancy in IPMNs, achieving an AUC of 0.91, 94% accuracy, 96% sensitivity, and 93% specificity. Compared to the 2006 Sendai guidelines (sensitivity 100%, specificity 7.6%) and the 2012 Fukuoka guidelines (sensitivity 84.8%, specificity 45%), the AI model offered a dramatically better balance of sensitivity and specificity.
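Sensitivity and specificity, the metrics behind the guideline comparison above, come straight from a confusion matrix. The counts below are hypothetical, chosen only to mimic a "flag everything" rule versus a balanced classifier:

```python
# Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP). Counts are invented.
def sens_spec(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    return tp / (tp + fn), tn / (tn + fp)

# An always-resect-style rule catches every malignancy but also flags
# nearly every benign lesion (high sensitivity, very low specificity):
sens_all, spec_all = sens_spec(tp=50, fn=0, tn=4, fp=46)   # 1.00, 0.08

# A balanced classifier trades a little sensitivity for far better
# specificity, sparing benign cases an unnecessary pancreatectomy:
sens_bal, spec_bal = sens_spec(tp=48, fn=2, tn=47, fp=3)   # 0.96, 0.94
```

This is why a model with 96% sensitivity and 93% specificity is clinically more useful than a guideline with 100% sensitivity but 7.6% specificity: the latter sends almost every benign lesion to surgery.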

Neuroendocrine tumor grading and recurrence: PNETs are rare tumors where complete surgical resection is often curative for non-metastatic disease. Song et al. used a DLR model with preoperative clinical features and arterial-phase CT scans to predict postoperative recurrence, achieving a maximum AUC of 0.7 in the validation cohort (56 training, 18 validation patients). To address the problem of small datasets caused by PNET rarity, Gao et al. used GANs to generate synthetic MRI images to augment a dataset of only 96 patients, then built a DL model to predict WHO grade of PNETs with 85% accuracy and AUC of 0.91.

TL;DR: DL dramatically improved cystic lesion diagnosis from 40-45% accuracy to AUC 0.97. For IPMN malignancy risk, AI achieved 94% accuracy with balanced sensitivity/specificity, far surpassing existing clinical guidelines. For rare PNETs, GANs generated synthetic images to overcome small datasets, enabling WHO grade prediction at 85% accuracy.
Pages 11-14
Ethical Considerations, Limitations, and What Comes Next

Ethical framework: The review highlights the WHO's six core principles for AI in health: protecting autonomy, promoting well-being and safety, ensuring transparency and explainability, fostering accountability, ensuring inclusiveness and equity, and promoting responsive and sustainable AI. The complexity of modern DL systems creates an "opacity" problem where clinicians with minimal technical backgrounds struggle to understand how models produce recommendations, making it difficult to justify clinical decisions based on AI output.

Accountability gap: The review identifies a tension between clinicians, who are legally accountable for their actions, and technologists, who operate under ethical principles of practice rather than legal liability. The authors argue that governing entities composed of multiple stakeholders (physicians and technologists) must be created to establish clear accountability frameworks and legislation for AI in healthcare. Data access agreements and cybersecurity considerations aligned with patient privacy laws are described as foundational to safe AI development.

Key limitations of the reviewed studies: Most studies were retrospective with small sample sizes (often fewer than 100 patients). Only one study (Gu et al., on EUS-based classification) was prospective. Small datasets are particularly problematic for DL algorithms, which typically require thousands of examples for robust training. The rarity of certain subtypes (PNETs, specific cystic lesions) compounds this challenge. External validation across multiple institutions was rare, raising questions about generalizability.

Future directions: The authors envision DL aiding across all facets of PC care: earlier diagnosis through improved imaging analysis, more accurate biomarkers, better treatment response monitoring, perioperative outcome prediction, and optimized resource utilization. The development of fusion models (combining radiomics, DL, clinical features, and genomics) appears to be the most promising trajectory, as these consistently outperform single-modality approaches. Prospective, multi-institutional validation studies will be essential to translate these promising research findings into clinical practice.

TL;DR: Most reviewed studies were retrospective with small sample sizes and lacked external validation. The WHO's ethical principles must guide AI deployment, and clear accountability frameworks are needed. Fusion models combining DL, radiomics, and clinical data are the most promising path forward, but prospective multi-center validation is essential before clinical adoption.
Citation: Patel H, Zanos T, Hewitt DB. Deep Learning Applications in Pancreatic Cancer. Cancers. 2024. Open access (license: CC BY). PMCID: PMC10814475. DOI: 10.3390/cancers16020436.