Pancreatic Cancer Detection: Biomarkers, Imaging, and ML

Plain-English Explanations

Overview

Pages 1-2

Why Pancreatic Cancer Needs AI and What This Review Covers

Pancreatic cancer accounts for roughly 4% of all cancer-related deaths and ranks as the seventh leading cause of cancer mortality worldwide. The dominant histological subtype is pancreatic ductal adenocarcinoma (PDAC), which arises from DNA mutations in ductal cells. Although surgical resection (typically a pancreaticoduodenectomy) remains the only curative option, most patients present at an advanced stage and are ineligible for surgery. Even among those who undergo complete resection, survival remains poor and recurrence is common.

Current detection relies on histopathological analysis and cytological specimens obtained through CT-guided biopsy, endoscopic ultrasonography (EUS), laparoscopic or open exploratory biopsy, and ascites cytology. Screening is currently recommended only for high-risk individuals whose lifetime risk exceeds 5%, including those with familial predisposition, germline mutations in genes such as BRCA2, CDKN2A, or PALB2, and patients with mucinous cystic lesions or cancer syndromes like Lynch syndrome 2, HBOC, FAMM, Li-Fraumeni, or Peutz-Jeghers syndrome.

This 2024 review, published in Cureus by Daher, Punchayil, Ismail, and colleagues from multiple institutions, examines the evolving role of artificial intelligence (AI) in pancreatic cancer detection. The authors cover biomarker identification, imaging technologies (CT, EUS, MRI, PET), and machine learning algorithms, while also addressing ethical challenges such as data privacy, algorithmic bias, and accountability. The goal is to outline how AI-driven methods can enhance early detection and improve the historically dismal outcomes for this disease.

TL;DR: Pancreatic cancer is the 7th leading cause of cancer death globally, and most patients are diagnosed too late for curative surgery. This review examines how AI, through biomarker analysis, enhanced imaging, and ML algorithms, can improve early detection, while also discussing ethical and practical challenges.

Screening and Risk Factors

Pages 2-3

Who Gets Screened and Why Mass Screening Falls Short

The false-positive problem: Because the lifetime risk of pancreatic cancer in the general population is less than 2%, even a highly specific screening test would produce many false positives, exposing subjects to unnecessary surgical procedures and considerable medical, mental, and financial strain. For this reason, international guidelines restrict screening to high-risk groups. A 20-year prospective study by Klatte et al. (2022) followed 347 carriers of a germline CDKN2A mutation using MRI with or without EUS, detecting 36 PDAC cases with a cumulative incidence of 20.7% by age 70. Importantly, 83.3% of detected cases were resectable, and the five-year survival rate was 32.4%, highlighting how focused surveillance in high-risk populations can shift detection to curable stages.
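The false-positive argument above can be made concrete with Bayes' rule. The sketch below is illustrative only: the 95% sensitivity and specificity are hypothetical values not taken from the review, and lifetime risk (2% for the general population, 20.7% for the CDKN2A carriers in Klatte et al.) is used as a stand-in for pre-test probability, which overstates prevalence at any single screening round.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Probability that a positive screening result is a true case (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, chosen only for illustration.
SENSITIVITY = 0.95
SPECIFICITY = 0.95

# Pre-test probabilities: simplifying assumption that lifetime risk
# approximates prevalence in the screened group.
general = positive_predictive_value(0.02, SENSITIVITY, SPECIFICITY)
high_risk = positive_predictive_value(0.207, SENSITIVITY, SPECIFICITY)

print(f"General population PPV: {general:.3f}")   # about 0.28
print(f"High-risk carriers PPV: {high_risk:.3f}")  # about 0.83
```

Under these assumed numbers, roughly seven in ten positive results in the general population would be false alarms, while most positives in the high-risk group would be true cases. This is the arithmetic behind restricting screening to high-risk groups.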

High-risk populations: Beyond inherited mutations, risk factors include cancer syndromes with mostly autosomal dominant inheritance, family history (which confers up to a 50% lifetime risk), and modifiable factors like smoking and obesity. New-onset diabetes mellitus (less than one year) is a notable clinical red flag, carrying a substantially increased risk for PDAC compared to the modest 1 to 1.5-fold risk seen in long-standing diabetes of more than five years.

Clinical triggers for workup: Findings that raise suspicion for a malignant pancreatic lesion include cysts greater than or equal to 3 cm, thickened cyst walls, elevated CA 19-9, and cyst growth exceeding 5 mm over two years. When these features are present, the probability of invasive cancer is higher and surgical resection is often warranted. Molecular imaging and pancreatic juice analysis remain areas of active research, with potential for AI integration.

TL;DR: Mass screening for pancreatic cancer is impractical due to low population incidence and high false-positive rates. Targeted surveillance of high-risk groups (germline mutations, family history, cancer syndromes) can catch tumors early: in one CDKN2A carrier cohort, 83% of detected cases were resectable. New-onset diabetes is a key clinical warning sign for PDAC development.

Biomarkers

Pages 3-4

Biomarker Identification, Limitations, and AI-Driven Analysis

The biomarker landscape: The six primary tumor markers tracked for pancreatic cancer are CA 19-9, CEA, CA242, microRNAs, CA125, and K-RAS gene mutations. Additional detection markers include TP53, circulating cell-free DNA (cfDNA), Smad4, PDAC1, and CDKN2A. CA 19-9 remains by far the most widely used for predicting prognosis and monitoring recurrence post-surgery. However, it is not suitable for mass screening of early-stage disease because it is elevated in other gastrointestinal adenocarcinomas and several benign disorders. Furthermore, 5 to 10% of the population lacks the enzymes needed to produce CA 19-9 and cannot express it at all.

Validation of multi-gene panels: A meta-analysis by Hagen et al. (2018) analyzed transcriptomes from 18 fresh-frozen tissues (15-80% tumor content) and 13 non-tumor tissues using Illumina HumanRef-12 BeadArray. They identified elevated pathways including TCR, TNF-alpha, TGF-beta receptor, MAPK, and integrin signaling, classified into four gene clusters. Gene expression data from 178 pancreatic cancer patients showed that clusters 2 through 4 were correlated with survival, confirming their prognostic utility.

Recent technological advances: Extracellular vesicles (EVs) isolated from blood, saliva, and other fluids are emerging as practical cancer biomarkers. Next-generation sequencing (NGS) technologies now enable simultaneous investigation of mutations, copy number variations, and gene fusions, replacing traditional FISH and qRT-PCR methods with a more efficient and tissue-saving approach. Electrochemical biosensors based on nanomaterials offer a rapid and sensitive detection method for cancer biomarkers. The creation of massive biological multi-omics datasets and AI algorithms has further expanded the possibilities for biomarker identification.

Limitations: The molecular heterogeneity of pancreatic cancer means a single biomarker may not capture variations across disease stages or subtypes. The shortage of highly specific monoclonal antibodies complicates clinical translation of many prospective protein biomarkers. Detection is also constrained by limitations in ELISA assays and metabolic imaging techniques like 18F-FDG-PET.

TL;DR: CA 19-9 is the most common pancreatic cancer biomarker but is unsuitable for mass screening because 5-10% of people cannot produce it and it is elevated in other conditions. Multi-gene panels validated on 178 patients show prognostic value. Emerging technologies (EVs, NGS, nanomaterial biosensors) and AI-powered multi-omics analysis are expanding detection capabilities.

AI-Powered Biomarker Research

Pages 4-5

How AI Transforms Biomarker Discovery and Cancer Prediction

Large-scale transcriptomic profiling: Using ML and data from 1,665 non-tumorous and 2,316 hepatocellular carcinoma (HCC) tissue samples, Kaur et al. identified three platform-independent diagnostic genes (FCN3, CLEC1B, and PRC1) that could detect HCC with 93 to 98% precision across both training and validation datasets. Gholizadeh et al. used ML to reveal four additional prognostic markers (SOCS2, MAGEA6, RDH16, RTN3) alongside three diagnostic biomarkers (CYP2E1, AKR1C3, AFP). These cross-cancer examples demonstrate the transferable power of AI-driven biomarker discovery.

Deep learning for cancer risk prediction: Liang et al. (2021) built a CNN that combined imaging, histology, electronic health records, and molecular biomarkers to predict HCC diagnosis risk within one year. Out of 47,945 participants (9,553 with HCC), the model achieved an AUC of 0.94 for one-year risk prediction, highlighting how AI can pool disparate clinical data sources into a unified predictive framework.

Pancreatic cancer-specific AI applications: Yang et al. (2020) created a multi-analyte panel incorporating miRNAs, mRNAs, cfDNA, and CA 19-9. When these data were used to train various algorithms, the resulting model achieved 92% accuracy for PDAC diagnosis and 84% accuracy for disease staging. Separately, Li et al. (2021) gathered biomarker data from multiple institutions and built predictive models using six ML algorithms. The support vector machine (SVM) and k-nearest neighbor (KNN) models achieved 70.9% and 73.4% accuracy for predicting one-year and two-year recurrence, respectively.

Treatment response prediction: Hsu et al. (2022) used a random forest algorithm with blood biomarkers (AFP, albumin-bilirubin grade, circulating angiogenic factors) to predict the efficacy of lenvatinib in unresectable HCC, demonstrating AI's growing role in guiding therapeutic decisions through biomarker-based models.

TL;DR: AI-driven biomarker analysis achieves 92% accuracy for PDAC diagnosis and 84% for staging using multi-analyte panels of miRNAs, mRNAs, cfDNA, and CA 19-9. ML models predict one- and two-year recurrence at 70.9% and 73.4% accuracy. A CNN combining multiple data sources reached AUC 0.94 for cancer risk prediction in a cohort of nearly 48,000 participants.

Imaging Technologies

Pages 5-6

Traditional Imaging, Its Limitations, and AI-Enhanced Methods

Current imaging modalities: The primary tools for pancreatic cancer investigation include multidetector CT (MDCT), EUS with fine-needle aspiration (FNA), MRI/MRCP, and PET. On CT, pancreatic cancer appears as a hypo-attenuated or rarely iso-attenuated mass with abundant fibrous stroma. CT has a 90% detection rate for solid pancreatic lesions, but small tumors under 2 cm are easily missed because they can be iso-attenuating. EUS is superior to CT, MRI, and PET for detecting small tumors and lymph node metastases. On MRI, lesions appear hypointense on T1-weighted and iso- to hyperintense on T2-weighted imaging, with contrast enhancement improving accuracy for vascular involvement.

The overlap problem: Despite advances in multimodal imaging, non-neoplastic pancreatic and peripancreatic entities can closely resemble primary pancreatic neoplasms on ultrasonography, CT, and MRI. Conversely, primary pancreatic cancer can go unnoticed during imaging. This persistent overlap between malignant and benign findings drives the need for AI-enhanced discrimination.

AI-enhanced imaging breakthroughs: A CNN trained on a database of 4,385 CT images was tested on scans from 100 patients pre-assessed by three imaging specialists. The CNN demonstrated high accuracy and required only 3 seconds to reach a diagnostic finding, compared to 8 minutes for a specialist. Muhammad et al. (2019) used artificial neural networks to stratify pancreatic cancer risk as low, medium, or high based on personal health data alone, even in the absence of symptoms, achieving 80.7% sensitivity and specificity.

EUS-based deep learning: A DL model-based computer-assisted diagnosis (CAD) system was developed to evaluate EUS images and detect pancreatic cancer, chronic pancreatitis (CP), and normal pancreas. The system was trained on 920 EUS images and tested on 470 images, achieving detection efficiency of 94% in testing and 92% in validation. This demonstrates that DL can extract meaningful diagnostic features even from the lower-resolution, higher-noise images produced by endoscopic ultrasound.

TL;DR: CT detects 90% of solid pancreatic lesions but misses small tumors under 2 cm. A CNN diagnosed pancreatic cancer from CT in 3 seconds versus 8 minutes for a specialist. An EUS-based CAD system trained on 920 images achieved 94% detection efficiency. ANNs stratified cancer risk at 80.7% sensitivity/specificity using personal health data alone.

Machine Learning Algorithms

Pages 6-8

ML Algorithm Types, Performance Evaluation, and Validation Challenges

Supervised vs. unsupervised learning: The major ML categories used in pancreatic cancer detection include supervised learning (classification and regression) and unsupervised learning (clustering and dimensionality reduction). Supervised algorithms such as support vector machines (SVM), linear regression, and Naive Bayes are trained with labeled input/output data to make predictions on new cases. Unsupervised algorithms process unlabeled data to discover hidden patterns and subgroups. These approaches can be applied to diverse datasets including imaging, biomarkers, and histology slides, automating the tedious manual labor of reviewing medical data one case at a time.
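To make the supervised setting concrete, here is a minimal k-nearest neighbor classifier written from scratch. It is a sketch, not the method of any study cited in the review: the two features stand for hypothetical, already-scaled biomarker values, and the labels and data points are invented toy values.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled neighbors."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy labeled data (hypothetical): two scaled features per case.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3),
           (0.8, 0.9), (0.9, 0.7), (0.85, 0.8)]
train_y = ["benign", "benign", "benign",
           "malignant", "malignant", "malignant"]

print(knn_predict(train_x, train_y, (0.75, 0.85)))  # malignant
print(knn_predict(train_x, train_y, (0.2, 0.2)))    # benign
```

The labels in `train_y` are what make this supervised: the model predicts a new case's class from known outcomes. An unsupervised method would receive only `train_x` and would have to discover the two groups on its own.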

Performance metrics: ML classifiers are evaluated using receiver operating characteristic (ROC) curves, where the area under the ROC curve (AUROC, often shortened to AUC) quantifies overall model effectiveness. Higher AUROC values indicate better discrimination between cases and non-cases. Additional metrics include F1-Score, positive predictive value (PPV), negative predictive value (NPV), and relative risk (RR). Multiple studies applying ML to estimate pancreatic cancer risk from electronic health records have used these metrics to assess predictive ability.
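The metrics named above can be computed directly from predictions. The sketch below uses invented toy labels and scores and an assumed decision threshold of 0.5. It computes AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, which is equivalent to the area under the ROC curve.

```python
def confusion_metrics(y_true, y_pred):
    """PPV, NPV, sensitivity, specificity, and F1 from binary labels (1 = cancer)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Assumes every denominator is non-zero (both classes present and predicted).
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def auroc(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 positive and 5 negative cases with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = confusion_metrics(y_true, y_pred)
area = auroc(y_true, scores)
print(metrics)
print(f"AUROC: {area:.3f}")
```

AUROC depends only on how the scores rank the cases, whereas PPV, NPV, and F1 change with the chosen threshold and with the proportion of positive cases in the data.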

Data quality and diversity challenges: One of the largest obstacles is the limited availability of comprehensive, diverse datasets for pancreatic cancer patients. Class imbalance between positive and negative cases can lead to biased predictions; it is typically addressed through oversampling, undersampling, or synthetic data generation. Quality assessments of pancreatic imaging datasets have shown that a significant fraction of CT images are rendered unsuitable for AI due to biliary stents or other artifacts. The reproducibility of radiomics features remains contested across studies, with known issues in feature design, parameter setup, intra-individual test-retest repeatability, and multi-machine variability.
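Random oversampling, the simplest of the rebalancing techniques named above, can be sketched as follows. The function name, data, and 9:1 class ratio are hypothetical and not drawn from the review; the sketch duplicates minority-class samples at random until every class matches the size of the largest one.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies (with replacement) to reach the majority-class size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy imbalanced dataset (hypothetical): 9 negative cases, 1 positive case.
samples = [f"case_{i}" for i in range(10)]
labels = [0] * 9 + [1]

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(balanced_labels.count(0), balanced_labels.count(1))  # 9 9
```

Resampling of this kind is normally applied to the training split only. Duplicating cases before splitting would let copies of the same patient appear in both training and test data and inflate the measured performance.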

Real-world validation example: Sandbank et al. (2022) introduced an AI system for breast biopsies validated through a multi-site clinical trial, achieving 98.27% specificity and 98.51% sensitivity for invasive carcinoma identification. External validation on 841 slides from two sites confirmed robust cross-site performance. Implemented in Maccabi Healthcare Services for real-time quality control, it effectively flagged suspicious cases and reduced misdiagnosis rates, demonstrating how rigorous multi-site validation can translate AI research into clinical practice.

TL;DR: ML algorithms for pancreatic cancer include supervised (SVM, regression, Naive Bayes) and unsupervised (clustering) approaches, evaluated by AUROC, F1-Score, and PPV/NPV. Major challenges include limited datasets, class imbalance, imaging artifacts, and contested radiomics reproducibility. A multi-site breast biopsy AI system (98.5% sensitivity, 98.3% specificity) shows how rigorous validation enables real-world clinical deployment.

Ethics and Future Directions

Pages 8-10

Ethical Considerations, Accountability Gaps, and the Path Forward

Privacy and data security: Building effective AI systems requires substantial patient data, raising delicate ethical questions about balancing public interest with individual privacy rights. Because AI systems store patient data digitally, the risk of hacking and data theft is a genuine concern. Hackers can access and interfere with confidential patient data without necessarily tampering with the AI system itself, potentially causing serious harm to patients through breached confidentiality and corrupted records.

The accountability gap: When harm results from AI-assisted decisions, traditional rules governing physician liability may not apply cleanly. The error is perceived as an inherent risk of AI integration rather than an individual decision, creating a gap in how blame is attributed. Clear policies and legal frameworks must be established to address responsibility when medical errors stem from AI recommendations. The review emphasizes that both physicians relaying AI outputs and patients receiving them must have a thorough understanding of how these tools contribute to the decision-making process.

Algorithmic bias: Although AI systems are trained on large datasets, biases based on demographics such as race or gender can emerge if the training data lacks sufficient diversity. These biases may also stem from underlying assumptions by algorithm creators. Because minorities are frequently underrepresented in medical datasets, AI may struggle to perform equitably across patient populations, undermining trust and limiting clinical adoption.

Future outlook: Despite these challenges, the integration of AI holds significant promise for advancing personalized medicine in pancreatic cancer. AI-driven approaches have the potential to analyze complex datasets spanning imaging, biomarker profiles, and clinical parameters to enhance early detection and prognosis prediction. ML can evaluate heterogeneous data sources including genetic information, tumor biomarkers, and patient demographics to customize treatment regimens for individual patients. The review concludes that continued research, collaboration, and ethical stewardship are essential to harness the full potential of AI in oncology and translate promising results into real-world patient care improvements.

TL;DR: AI in pancreatic cancer faces ethical hurdles including data privacy risks, accountability gaps when AI-driven errors occur, and algorithmic biases from non-diverse training data. Despite these challenges, AI-powered integration of imaging, biomarkers, and clinical data offers a transformative path toward earlier detection, personalized treatment, and improved survival outcomes.