AI and Early Detection of Pancreatic Cancer: 2020 Review

Plain-English Explanations

Overview and Background

Pages 1-3

Why Pancreatic Cancer Desperately Needs Early Detection

Pancreatic ductal adenocarcinoma (PDAC) carries a 5-year survival rate of only 10%. This number obscures a critical stage-dependent reality: 49.6% of patients present with distant metastases (2.9% 5-year survival), 29.1% with regional lymph node involvement (13.3% survival), and only 10.8% have tumors still localized to the pancreas (39.4% survival). If that stage distribution could be reversed, with 50% localized and only 10% metastatic, overall survival would more than double without any new therapy. For the very earliest stage (node-negative tumors under 2 cm, stage IA), 5-year survival exceeds 80%. Yet only 1.8% of patients are diagnosed at stage I.
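The "more than double" claim can be checked with a share-weighted average. The sketch below uses only the stage shares and survival rates quoted above; holding the regional share at 29.1% in the reversed scenario is an assumption made here, since the text specifies only the localized and metastatic shares. The quoted shares sum to 89.5%, so the result approximates the 10% headline figure rather than reproducing it.

```python
# Stage shares and 5-year survival rates quoted in the paragraph above (percent).
SURVIVAL = {"localized": 39.4, "regional": 13.3, "distant": 2.9}
CURRENT_MIX = {"localized": 10.8, "regional": 29.1, "distant": 49.6}
# Reversed scenario: 50% localized and 10% distant come from the text;
# keeping regional at 29.1% is an assumption of this sketch.
SHIFTED_MIX = {"localized": 50.0, "regional": 29.1, "distant": 10.0}


def weighted_survival(mix, survival):
    """Overall 5-year survival (percent) as a share-weighted sum over stages."""
    return sum(mix[stage] * survival[stage] for stage in mix) / 100.0


current = weighted_survival(CURRENT_MIX, SURVIVAL)
shifted = weighted_survival(SHIFTED_MIX, SURVIVAL)
print(f"current stage mix:  {current:.1f}%")
print(f"reversed stage mix: {shifted:.1f}%")
print(f"ratio:              {shifted / current:.2f}x")
```

Under these assumptions the weighted survival rises from about 9.6% to about 23.9%, roughly 2.5 times the current figure, which is consistent with the claim in the text.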

The window of opportunity: Genomic analysis of autopsy specimens by Yachida et al. estimated that roughly 10 years elapse from the first genetic mutation in a normal pancreas cell to the development of a clearly malignant PDAC cell, and another 5 years pass before metastatic capability emerges. This suggests a substantial multi-year window during which early-stage PDAC could theoretically be caught in asymptomatic patients.

The 2020 Virtual Summit: This article is a comprehensive presummit document prepared for the 2020 AI and Early Detection of Pancreatic Cancer Virtual Summit, organized by the Kenner Family Research Fund in conjunction with the American Pancreatic Association. An interdisciplinary group of 30+ experts contributed across five thematic sections: (A) Progress, Problems, and Prospects for Early Detection, (B) AI and Machine Learning, (C) AI and Pancreatic Cancer, Current Efforts, (D) Collaborative Opportunities, and (E) Moving Forward, with reflections from government, industry, and advocacy.

The article builds on earlier Kenner Family Research Fund summits held in 2014, 2015, 2016, and 2018. Those meetings produced a Strategic Map for Innovation centered on "Facilitated Strategic Collaboration" and identified four congruent priorities: leadership, organizational structure, funding and partnerships, and research operations. Despite these efforts, the 5-year survival rate for PDAC remains essentially unchanged, motivating the pivot toward AI-based approaches for risk stratification and early detection.

TL;DR: PDAC 5-year survival is 10% overall but 80%+ for stage IA (under 2 cm, node-negative), yet only 1.8% of patients are caught that early. A 10-15 year genetic evolution window exists before metastasis. This 30+ expert presummit review explores whether AI can close the early detection gap across imaging, biomarkers, EHR analysis, and risk stratification.

Genomics and Biomarkers

Pages 3-6

Genomic Landscape, Precursor Lesions, and the Biomarker Challenge

Driver mutations: Four genes dominate PDAC genomics. KRAS is mutated in roughly 90% of cases (codons 12 in ~91%, 13 in ~2%, 61 in ~7%), activating MAPK/ERK signaling as one of the earliest carcinogenic events. CDKN2A inactivation (90% of PDACs) eliminates cell cycle regulation. TP53 alterations (80%) disrupt genome integrity, and SMAD4 loss (just over 50%) removes TGF-beta signaling restraints. Chromatin modifier genes (ARID1A, KMT2C, KMT2D, KDM6A, among others) are each altered in under 10% of cases but collectively affect up to one-third of PDACs.

Precursor biology: Most PDACs arise from pancreatic intraepithelial neoplasia (PanIN), microscopic lesions now graded as low-grade or high-grade. Low-grade PanIN is found in 40-75% of adults, making it far too common to serve as a screening target. High-grade PanIN is the ideal intervention stage but is almost never detected in isolation. Macroscopic cystic precursors, including intraductal papillary mucinous neoplasms (IPMNs) and mucinous cystic neoplasms, can be identified on imaging but carry their own diagnostic uncertainties. The Fukuoka criteria for identifying potentially malignant IPMNs incorrectly send benign lesions to surgery roughly one-third of the time (36% false-positive rate).

Biomarker limitations: CA 19-9, the only FDA-approved PDAC biomarker (first identified in 1979), has a sensitivity of only 25-50% in early-stage disease and can be elevated in benign biliary obstruction. Furthermore, 5-10% of the population lack the Lewis blood group antigen genes needed to produce CA 19-9. The US Preventive Services Task Force (USPSTF) explicitly recommended against general-population PDAC screening in 2019, noting the disease's low incidence (~13 per 100,000) and the unacceptable false-positive rate even with a near-perfect biomarker.
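The false-positive problem follows from Bayes' rule. The sketch below is illustrative only: the 99% sensitivity and 99% specificity are hypothetical stand-ins for a "near-perfect" biomarker, and the ~13 per 100,000 incidence from the text is used as a rough proxy for prevalence in the screened population.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Bayes' rule: probability of disease given a positive test."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# ~13 per 100,000 is quoted in the text; treating it as prevalence is a simplification.
PREVALENCE = 13 / 100_000
# Hypothetical test characteristics, not taken from the paper.
SENSITIVITY = 0.99
SPECIFICITY = 0.99

ppv = positive_predictive_value(PREVALENCE, SENSITIVITY, SPECIFICITY)
false_pos_per_true_pos = (1.0 - ppv) / ppv
print(f"PPV: {ppv:.2%}")
print(f"False positives per true positive: {false_pos_per_true_pos:.0f}")
```

With these assumed numbers, only about 1.3% of positive results would be true cancers, or roughly 78 false positives for every true positive, which is why the reviewed strategies focus on enriching risk before testing.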

Emerging biomarker approaches: Newer strategies include protein biomarker panels that build on CA 19-9 and show improved early-stage performance, autoantibodies targeted against exosomal surface proteins, and cell-free DNA (cfDNA) approaches. Circulating tumor DNA (ctDNA) sequencing offers exceptionally high specificity but poor sensitivity in early-stage PDAC due to insufficient DNA shedding. Some investigators are combining ctDNA with protein markers to marry specificity and sensitivity. The CancerSEEK assay, for example, used logistic regression and random forest to combine mutation and protein scores for multi-cancer detection. However, the overwhelming majority of biomarker data remain at phase 1 (discovery) and phase 2 (validation in symptomatic disease) of the Pepe framework, far from clinical deployment.

TL;DR: KRAS (90%), CDKN2A (90%), TP53 (80%), and SMAD4 (50%+) are the core PDAC driver mutations. CA 19-9 sensitivity is only 25-50% in early disease. Low-grade PanIN affects 40-75% of adults, making it useless as a screening marker. Newer cfDNA and protein panel approaches show promise but remain in early validation phases.

Risk Stratification and Screening

Pages 6-9

The DEF Framework for Defining, Enriching, and Finding Early PDAC

Baseline risk: The age-adjusted incidence of PDAC in US subjects 50 years or older is 37 per 100,000 per year (0.037%). Assuming a biomarker or imaging test could identify PDAC up to 3 years before clinical diagnosis, the 3-year prevalence is 111 per 100,000 (0.11%). This low baseline makes population-wide screening impractical. The authors present a tiered risk classification. Low-risk groups (1.5-3x baseline, 0.2-0.3% 3-year risk) include long-standing diabetes, smoking, and obesity. Modest-risk groups (3-6x, 0.35-0.66%) include certain definitions of new-onset diabetes (NOD). High-risk groups (6-10x, 0.67-1.0%) include subjects with 2 first-degree relatives with PDAC and glycemically defined NOD. Very high-risk groups (25-50x, 3-4%) include NOD patients with ENDPAC scores of 3 or higher.
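The tier arithmetic can be sketched by multiplying the 3-year baseline by each relative-risk range. The incidence, lead time, and multipliers below are the ones quoted above. Straight multiplication does not reproduce every quoted percentage exactly (for example, 25-50x gives about 2.8-5.6% rather than 3-4%), so the published ranges presumably reflect rounding or cohort-specific estimates rather than this simple product.

```python
ANNUAL_INCIDENCE_PER_100K = 37  # US adults aged 50 or older, from the text
LEAD_TIME_YEARS = 3             # assumed detection lead time, from the text

# 3-year baseline risk as a fraction: 111 per 100,000.
baseline = ANNUAL_INCIDENCE_PER_100K * LEAD_TIME_YEARS / 100_000

# Relative-risk ranges for each tier, as quoted in the text.
TIERS = {
    "low": (1.5, 3),
    "modest": (3, 6),
    "high": (6, 10),
    "very high": (25, 50),
}


def absolute_risk_range(rel_low, rel_high, baseline_risk):
    """Convert a relative-risk range into an absolute 3-year risk range."""
    return rel_low * baseline_risk, rel_high * baseline_risk


for name, (rel_low, rel_high) in TIERS.items():
    risk_low, risk_high = absolute_risk_range(rel_low, rel_high, baseline)
    # Persons screened per expected case is the reciprocal of absolute risk.
    print(f"{name:>9}: {risk_low:.2%} to {risk_high:.2%} "
          f"(about {round(1 / risk_high)} to {round(1 / risk_low)} screened per case)")
```

The reciprocal column shows why enrichment matters: at baseline roughly 900 people would need to be screened per expected case, while in the very high-risk tier the figure falls to a few dozen.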

The DEF approach: The paper advocates a 3-step strategy: (1) Define a high-risk group, (2) Enrich it further to a very high-risk group, and (3) Find the lesion via imaging. The ENDPAC (Enriching New-onset Diabetes for Pancreatic Cancer) clinical model risk-stratifies NOD patients using age, rapidity of glucose rise, and weight change in the 12 months before NOD diagnosis. Serum biomarkers may also help define high-risk versus very high-risk groups, but their sensitivity in the prediagnostic stage fades rapidly beyond 12 months of lead time.

Imaging to find the lesion: A study by Singh et al. reconstructed the timeline of CT changes in prediagnostic PDAC. On average, pancreatic duct cutoff without a mass appeared around 12 months before diagnosis, a visible mass at 9 months, peripancreatic involvement at 6 months, and vascular involvement at 3 months. The sensitivity of CT for findings suspicious for PDAC was only 15% at 18 months before diagnosis, roughly 50% at 6 months, and 85% at 3 months. These findings highlight a critical gap: CT scans have limited ability to detect PDAC more than 12 months before clinical diagnosis, creating a role for AI-based radiomics to discern changes invisible to the human eye.

Endoscopic screening: EUS and MRI are the mainstays for surveillance of high-risk individuals, with pancreatic protocol CT avoided due to radiation concerns. In a comparative study, MRI was better for cyst detection while EUS was superior for solid lesions. In a highly selected cohort of 354 high-risk individuals with over 16 years of follow-up at one institution, 90% of screen-detected PDACs were resectable, with a 3-year survival of 85% compared to 25% for unresectable symptomatic PDAC found outside surveillance. Cost-effectiveness analysis showed MRI as the most cost-effective strategy for moderate-risk groups, while EUS became dominant for groups with greater than 20-fold relative risk.

TL;DR: Baseline 3-year PDAC risk in adults over 50 is 0.11%. The DEF approach (Define, Enrich, Find) uses ENDPAC scores to stratify NOD patients. CT sensitivity for prediagnostic PDAC is only 15% at 18 months and 50% at 6 months. In surveilled high-risk individuals, 90% of screen-detected PDACs were resectable with 85% 3-year survival versus 25% for symptomatic cases.

AI and Machine Learning Foundations

Pages 9-12

How Machine Learning and Deep Learning Apply to Cancer Risk Modeling

The paper dedicates a full section to introducing ML concepts for the clinical audience. Machine learning algorithms learn behavior from data rather than relying on hand-coded rules. The recent explosion in ML performance, particularly in speech recognition, natural language processing, and computer vision, has been driven by deep neural networks (deep learning), which learn complex hierarchical representations from large datasets. The authors emphasize that this framework extends well beyond simple classification tasks: a training set could consist of mammograms paired with 5-year cancer outcomes, enabling models to discover predictive signals that humans cannot easily describe.

Deep learning for risk assessment: The paper highlights breast cancer as a case study for what might be possible in PDAC. Traditional risk models such as Tyrer-Cuzick achieved AUCs of only 0.57-0.62, even after incorporating mammographic breast density. Deep learning models trained directly on mammograms achieved AUC of 0.78 (Yala et al.), identifying 42% of future cancer patients as high-risk compared to only 23% by Tyrer-Cuzick. Crucially, the deep learning model performed equally well (AUC 0.71) on both African American and White patients, while Tyrer-Cuzick dropped to AUC 0.45 for African American patients, illustrating the equity advantages of data-driven approaches.

Model-based deep learning: The review introduces model-based deep learning as an emerging paradigm that combines physics-based or statistical models with neural networks. Algorithm unrolling, based on the seminal work of Gregor and LeCun, connects iterative model-based algorithms to neural network architectures. These hybrid approaches offer better interpretability, require fewer parameters for equivalent performance gains, and transfer prior knowledge from traditional iterative methods. The paper also discusses the possibility of learning optimal acquisition parameters (MRI sequences, CT angles, ultrasound probe configurations), not just image interpretation.
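As a rough illustration of algorithm unrolling, the sketch below writes a fixed number of ISTA (iterative shrinkage-thresholding) steps as network "layers", each computing x = soft(W y + S x, theta). This is the layer form used by Gregor and LeCun; everything else here is an assumption for illustration: the function names are hypothetical, the weights are initialized from the classical algorithm rather than trained, and the toy problem is not from the review. In a learned version, W, S, and theta would be fitted per layer from data.

```python
def soft_threshold(v, t):
    """Shrink each entry toward zero by t (proximal step for an L1 penalty)."""
    return [max(abs(x) - t, 0.0) * (1 if x > 0 else -1) for x in v]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def ista_layers(A, lam, step, depth):
    """Build `depth` identical layers from classical ISTA:
    W = step * A^T, S = I - step * A^T A, theta = lam * step."""
    rows, cols = len(A), len(A[0])
    W = [[step * A[k][i] for k in range(rows)] for i in range(cols)]
    S = [[(1.0 if i == j else 0.0)
          - step * sum(A[k][i] * A[k][j] for k in range(rows))
          for j in range(cols)] for i in range(cols)]
    return [(W, S, lam * step)] * depth


def unrolled_forward(y, layers):
    """Forward pass: one soft-thresholded linear update per layer."""
    x = [0.0] * len(layers[0][1])
    for W, S, theta in layers:
        pre = [a + b for a, b in zip(matvec(W, y), matvec(S, x))]
        x = soft_threshold(pre, theta)
    return x


# Toy problem: identity measurement matrix, so the sparse estimate is soft(y, lam).
A = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
x_hat = unrolled_forward(y, ista_layers(A, lam=1.0, step=1.0, depth=3))
print(x_hat)
```

Because each layer mirrors one iteration of a known algorithm, every weight has an interpretable starting value, which is the source of the interpretability and parameter-efficiency advantages described above.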

Disease trajectories and population data: The authors highlight the Danish Blood Donor Study (initiated 2010, ~150,000 genotyped participants) and Denmark's population-wide electronic records spanning 40 years for ~10 million individuals. By tracking longitudinal disease codes, comorbidities, and transitions from healthy states through prediabetes to diabetes to pancreatic cancer, ML models can identify temporal patterns predictive of PDAC. One study showed that preadmission disease history alone could outcompete intensive care unit data obtained during the first 24 hours for predicting patient survival, underscoring the value of longitudinal clinical records for risk modeling.

TL;DR: Deep learning risk models achieved AUC 0.78 for breast cancer versus 0.62 for Tyrer-Cuzick, and performed equitably across racial groups (AUC 0.71 for both African American and White patients, versus Tyrer-Cuzick's 0.45 and 0.62, respectively). Model-based deep learning improves interpretability. Danish population records (10 million people, 40 years) demonstrate the power of longitudinal disease trajectory analysis for pancreatic cancer risk prediction.

Current AI Research Efforts

Pages 13-16

Active and Planned AI Projects Targeting PDAC Early Detection

Project Felix (Johns Hopkins/Lustgarten Foundation): Led by Elliot Fishman in collaboration with computer scientist Alan Yuille, this initiative applied deep learning to detect pancreatic tumors on abdominal CT scans at smaller sizes and with greater reliability than human readers. The project involved meticulous manual segmentation of thousands of CT scans, representing the largest training/testing cohort in this domain worldwide, and produced at least 17 publications on techniques to automatically detect and characterize pancreatic lesions.

Pancreatic Cancer Collective (Stand Up To Cancer/Lustgarten): Two AI-focused teams were funded (May 2019 to April 2021). The records-based team, led by Chris Sander (Dana-Farber) and Regina Barzilay (MIT), assembled cohorts of over 4 million patient records at 3 sites, implemented a common data model for site-agnostic analysis, developed AI models to identify intermediate phenotypes from medical records and images, and integrated structured clinical data with imaging into individual risk scores. The genomics and immune factor team, led by Raul Rabadan (Columbia) and Nuria Malats (CNIO Madrid), combined large multinational genomic datasets with clinical and tumor microenvironmental factors for integrated PDAC risk estimation.

Blood-based initiatives: The CancerSEEK assay (Vogelstein and Tomasetti) applied logistic regression for combining mutation and protein scores and random forest for tissue localization. Mayo Clinic (Petersen and Majumdar) explored ML for molecular and imaging biomarker discovery from a large database of pancreatic cancer patients. Memorial Sloan Kettering, in collaboration with Weill Cornell, Weizmann Institute, and Cold Spring Harbor Lab, analyzed over 1,400 individual exosome proteins from plasma, combined with serum proteomic spectra, ctDNA, and CA 19-9, all annotated by patient characteristics. Dana-Farber (Wolpin) led a multicenter U01-funded project for blood-based biomarker development.

Additional projects: Eugene Koay at MD Anderson characterized CT subtypes of PDAC, showing that conspicuous (high-delta) tumors have more aggressive biology, higher growth rates, and shorter initiation times. Kaiser Permanente Southern California used natural language processing on radiology reports to identify at-risk individuals. Gregory Poore and Robert Knight reanalyzed TCGA data for microbial signatures using ML to discriminate among cancer types. The NCI-sponsored Alliance of Pancreatic Cancer Consortia coordinated four consortia: the Pancreatic Cancer Detection Consortium, the Chronic Pancreatitis/Diabetes/Pancreatic Cancer consortium, the Early Detection Research Network, and the Molecular and Cellular Characterization of Screen-Detected Lesions program.

TL;DR: Project Felix produced 17+ publications using deep learning on thousands of segmented CT scans. The Pancreatic Cancer Collective assembled 4+ million patient records across 3 sites. CancerSEEK used logistic regression and random forest for multi-cancer detection. MSK analyzed 1,400+ exosome proteins combined with proteomic, ctDNA, and clinical data. Most projects used established ML rather than true deep learning.

Data Infrastructure and Collaboration

Pages 16-18

Data Accessibility, Federated Learning, and Organizational Strategy

Centralized vs. federated approaches: Two competing strategies exist for AI training data. Centralization (exemplified by the NCI-EDRN prediagnosis imaging repository at MD Anderson) simplifies model training but faces privacy, institutional sharing, and maintenance barriers. Federated learning retains data at local sites, distributes computation locally, and returns model parameters to a central system. The Stand Up To Cancer-funded medical records team adopted a federated model with the Observational Medical Outcomes Partnership (OMOP) Common Data Model. However, no peer-reviewed federated learning system in the PDAC domain had been demonstrated at the time of this review.
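The federated pattern described above (local computation, only parameters returned to a central system) can be sketched with simple parameter averaging. This is a minimal toy, not the method used by any project in the review: the averaging rule (weighting each site by its sample count, in the style of federated averaging), the one-feature linear model, the function names, and the synthetic data are all assumptions for illustration.

```python
def local_update(params, data, lr=0.1, epochs=5):
    """Gradient descent on one site's private data; only parameters leave the site."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = 2.0 / n * sum((w * x + b - y) * x for x, y in data)
        grad_b = 2.0 / n * sum((w * x + b - y) for x, y in data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n


def federated_average(updates):
    """Central step: average site parameters, weighted by local sample count."""
    total = sum(n for _, n in updates)
    w = sum(p[0] * n for p, n in updates) / total
    b = sum(p[1] * n for p, n in updates) / total
    return (w, b)


def train(sites, rounds=30):
    """Alternate local training and central averaging for a fixed number of rounds."""
    params = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(params, data) for data in sites]
        params = federated_average(updates)
    return params


# Synthetic (x, y) pairs drawn from y = 2x + 1; each inner list stays at its "site".
SITES = [
    [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)],
    [(-2.0, -3.0), (2.0, 5.0)],
]

w, b = train(SITES)
print(f"w = {w:.4f}, b = {b:.4f}")
```

The central system never sees the raw records in SITES, only the (w, b) pairs and sample counts, which is the property that makes the approach attractive under the privacy and institutional constraints discussed here.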

Key databases: Available resources include the NCI-EDRN centralized imaging repository, the Pancreatic Cancer Collective's 1.5-million-person cohort from general hospital populations, the UK Biobank (500,000 volunteers with diverse health data), Danish National Medical Records (population-wide data spanning decades), and multiple blood-based biomarker registries at MSK, Mayo Clinic, NYU, and Dana-Farber. Google's TensorFlow Federated and NVIDIA Clara were identified as platforms for federated learning implementation.

Standardization needs: The authors emphasize the critical importance of uniform standard operating procedures (SOPs) for collecting biological materials, demographic and clinical data, and imaging. The OMOP Common Data Model and the Digital Imaging and Communications in Medicine (DICOM) standard provide existing frameworks for data interoperability. The NCI-EDRN project adopted a standardized data dictionary (developed at Dana-Farber) implemented via REDCap. Without these standards, cross-site model training and validation remain severely limited.

Proposed organizational structure: The paper proposes an "Early Detection Strategy" organizational framework with Collaborative Groups organized around specific technologies (imaging, liquid biopsy, genomics), a centralized Data Management and Analysis Group using AI/ML and natural language processing, an Executive Committee incorporating regulatory, ethical, and patient advocacy perspectives, and a Director with support staff. The charter would include milestones for progressively decreasing PDAC mortality and methods for prioritizing high-yield research areas while removing underperforming ones.

TL;DR: No federated learning system for PDAC existed at the time of review. Key resources include the Pancreatic Cancer Collective's 1.5 million-person cohort, UK Biobank (500,000), and Danish national records. The OMOP Common Data Model and DICOM standard are essential for cross-site interoperability. The authors propose a formal organizational structure with collaborative groups, centralized data management, and an executive committee.

Perspectives from Government, Industry, and Advocacy

Pages 19-22

Translating AI From Research Promise to Clinical Reality

The AI Chasm: The paper details the wide gap between promising research AUCs and FDA-cleared AI medical devices. By 2019, only 11 AI technologies for imaging interpretation had been cleared by the FDA. The "AI Chasm" refers to the fact that accuracy demonstrated in research does not necessarily translate to clinical utility. Edge cases with large errors can have catastrophic consequences, and algorithm bias has been documented to lower accuracy in underrepresented groups. The "black box" nature of deep learning contributes to clinician hesitancy, compounded by a paucity of prospective peer-reviewed studies and evolving regulatory frameworks for AI products.

Government perspective (NIDDK): The NIDDK highlights diabetes as both a risk factor for PDAC (2-fold increased incidence in long-standing T2DM) and a consequence of it (over 50% of PDAC patients have diabetes at diagnosis). The distinction between type 2 (T2DM) and pancreatogenic type 3c diabetes (T3cDM) is a current research focus. The DETECT study enrolled 452 subjects to evaluate biomarkers distinguishing T3cDM from T2DM. Ronald Summers and colleagues at the NIH Clinical Center explored AI applications in pancreatic imaging to enhance detection of early-stage PDAC, addressing the significant rate of failure to detect early lesions on CT.

Industry perspective: Graham Lidgard from Exact Sciences notes that the US spent over $4 trillion on healthcare in 2020, with approximately $200 billion on cancer care. Historically, small numbers of analyte targets and limited data access made AI/ML unnecessary. But with second- and third-generation molecular technologies producing terabytes of data (whole genome, exome, targeted methylation), AI/ML becomes essential. Companies like Grail, Thrive, and Exact Sciences are developing multi-cancer early detection approaches using liquid biopsies. AI/ML algorithms search gigabytes of sequencing data across patients to identify features correlating with disease at high specificity. The current limitation remains early-stage sensitivity, which improves as additional marker data types are integrated.

Patient advocacy: The National Pancreas Foundation, Pancreatic Cancer Action Network, Lustgarten Foundation, and other groups play critical roles in funding early detection research, raising awareness among high-risk populations, encouraging genetic testing, and assisting with clinical trial enrollment. The World Pancreatic Cancer Coalition, founded in 2016, consists of more than 90 advocacy groups from over 30 countries. Advocacy groups also address the persistently low rate of clinical trial enrollment among pancreatic cancer patients through grassroots programs and financial support for underserved communities.

TL;DR: Only 11 AI imaging tools had FDA clearance by 2019. Over 50% of PDAC patients have diabetes at diagnosis, with 1-2% of new-onset diabetes in those over 50 linked to PDAC. Multi-cancer liquid biopsy companies (Grail, Thrive, Exact Sciences) use AI/ML on terabytes of sequencing data. The World Pancreatic Cancer Coalition spans 90+ advocacy groups across 30+ countries.

Limitations and Future Directions

Pages 22-23

Key Gaps, Challenges, and the Path Forward

Research gaps identified: Most AI efforts in PDAC early detection used established epidemiological or traditional ML techniques (logistic regression, random forest, SVM) rather than true deep learning. Integration of disparate data sources, including imaging, genetics, omics, patient characteristics, and microbiome data, remained limited. No ongoing PDAC-specific microbiome early detection research was identified. Natural language processing, time series analysis, and integrative risk modeling were poorly represented in the active PDAC literature. The absence of a public or semi-public dataset for PDAC risk was identified as a significant barrier to recruiting non-medical AI researchers to this field.

Data accessibility challenges: Few centralized, anonymized datasets existed for semi-public access by PDAC researchers. The NCI-EDRN effort was the only major public resource. Health Insurance Portability and Accountability Act (HIPAA) requirements, institutional review board restrictions, competitive concerns among data holders, and international data sovereignty issues all impede the sharing of large volumes of detailed patient data needed for AI training. Data format standardization remains a significant obstacle, with multiple institutions using incompatible collection and encoding procedures.

Equity and generalizability: The paper repeatedly flags the risk of AI model bias. Distribution shift, where models trained on relatively homogeneous populations fail to generalize to diverse patients or clinical environments, is a core concern. Traditional risk models (e.g., Tyrer-Cuzick for breast cancer) were developed on predominantly White populations and have known limitations for other racial groups. The authors argue that testing for bias and measuring model performance across diverse demographic groups should be a required evaluation standard for all published risk models and clinical implementations.
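One way such a per-group evaluation could look is sketched below: AUC is computed separately for each demographic group and the values are compared. The function names, group labels, and records are synthetic and hypothetical; AUC is computed here directly from its pairwise definition (the probability that a positive case scores higher than a negative one) rather than with any particular library.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """records: iterable of (group, label, score); returns {group: AUC}."""
    grouped = {}
    for group, label, score in records:
        labels, scores = grouped.setdefault(group, ([], []))
        labels.append(label)
        scores.append(score)
    return {group: auc(labels, scores) for group, (labels, scores) in grouped.items()}


# Synthetic records: (group, outcome label, model risk score). Not real patient data.
RECORDS = [
    ("A", 1, 0.9), ("A", 1, 0.7), ("A", 0, 0.4), ("A", 0, 0.2),
    ("B", 1, 0.6), ("B", 1, 0.3), ("B", 0, 0.5), ("B", 0, 0.1),
]

per_group = auc_by_group(RECORDS)
for group, value in sorted(per_group.items()):
    print(f"group {group}: AUC = {value:.2f}")
```

A pooled AUC can hide a gap like the one in this toy data (1.00 for group A versus 0.75 for group B), which is why reporting the per-group values alongside the overall figure is the relevant check.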

The path forward: The authors call for a "Framingham Heart Study" equivalent for cancer, a multigenerational, longitudinal study that collects comprehensive clinical, genomic, lifestyle, and imaging data. They advocate for strategic multidisciplinary collaboration among AI researchers, cancer biologists, clinicians, epidemiologists, and patient advocacy groups, supported by committed funders. Specific near-term priorities include establishing a centralized web-based registry of all planned and ongoing AI-in-PDAC projects, developing uniform SOPs across all data types, creating public training and testing datasets spanning imaging, genomics, proteomics, and metabolomics, and demonstrating federated learning systems that allow multi-institutional AI model development without centralizing sensitive patient data.

TL;DR: Most PDAC AI work used traditional ML, not deep learning. No public PDAC risk dataset, no demonstrated federated learning system, and no microbiome-specific early detection research existed. Key priorities include standardized SOPs, public datasets, federated learning infrastructure, equity-focused model evaluation, and a Framingham-style longitudinal cancer study. Strategic multidisciplinary collaboration is essential.

Artificial Intelligence and Early Detection of Pancreatic Cancer: 2020 Summative Review
