Progress and challenges of artificial intelligence in lung cancer clinical translation

npj Precision Oncology, 2025

Plain-English Explanations
Pages 1-2
AI Is Entering Every Stage of Lung Cancer Care, and This Review Maps the Landscape

Lung cancer remains the leading cause of cancer-related deaths worldwide, responsible for an estimated 1.8 million deaths per year. Despite major advances in targeted therapy and immunotherapy, the sheer volume of clinical and research data generated for each patient remains insufficiently integrated and analyzed. This narrative review, authored by a team from MD Anderson Cancer Center and partner institutions, examines the translational potential of AI across the entire lung cancer care pathway: prevention, screening, diagnosis, prognosis, treatment, and monitoring.

The authors framed the review from the perspective of oncologists rather than data scientists, focusing on what AI can functionally do today and what it may enable in the near future. They searched MEDLINE, PubMed, and citation lists using "lung cancer" combined with keywords such as "Artificial Intelligence," "Machine Learning," "Deep Learning," "Radiomics," and "Large Language Model." The search covered English-language publications from January 1, 2014, to December 21, 2024. Rather than a systematic review or meta-analysis, this is a narrative synthesis intended to highlight urgent clinical challenges where AI has the most potential for impact.

AI evolution in oncology: The paper traces how deep learning moved from theory to practice. In the 2010s, CNNs set benchmarks in dermoscopic skin cancer diagnosis. In the 2020s, transformers (originally designed for natural language processing) enhanced medical image analysis when integrated with CNNs. Today, multimodal AI systems can combine radiology, pathology, and genomic data, enabling so-called "generalist medical AI" with multitasking capabilities. Architectures like ResNet, U-Net, and YOLO are now standard for feature extraction from imaging data, enabling "virtual biopsy" by extracting pathological and genomic information from routine scans.

Clinical pressures driving adoption: The rapidly increasing volume of imaging scans has created significant radiologist burnout and reduced interpretive accuracy. Digital pathology platforms now allow real-time AI deployment on whole-slide imaging. Large language models (LLMs) are being tested for summarizing radiology and pathology reports and for providing personalized treatment recommendations as medical chatbots. The review summarizes representative studies in Table 1, listing tasks, cohort sizes, data modalities, algorithms, and performance metrics across more than 25 studies.

TL;DR: This narrative review from MD Anderson covers AI across the full lung cancer care pathway, drawing on literature from 2014 to 2024. Lung cancer kills 1.8 million people per year, and AI technologies including CNNs, transformers, and multimodal architectures are being applied to prevention, screening, diagnosis, prognosis, treatment, and monitoring.
Pages 2-3
AI for Smoking Cessation, Risk Prediction, and Personalized LDCT Screening

Prevention: Tobacco remains the primary cause of lung cancer, with the global population of cigarette smokers still close to 1 billion. AI is being applied in two ways. First, it can analyze images of a smoker's daily environment to identify contexts associated with smoking cravings and predict relapse risk. Second, wearable sensors combined with a CNN-LSTM neural network can monitor smoking behavior by studying puff topography signals. However, smoking prevalence is influenced by socioeconomic status, mental health conditions, gender, sexuality, and ethnicity, meaning individual-level interventions are insufficient without broader public health AI strategies.

Screening criteria limitations: The U.S. Preventive Services Task Force recommends annual low-dose CT (LDCT) for people aged 50 to 80 with a 20 pack-year smoking history. But this criterion is imprecise. Some evidence suggests that 20-year smoking duration is a better predictor than 20 pack-years. Risk is also influenced by ethnicity, genetics, and environmental exposures such as PM2.5, factors that traditional linear regression models struggle to integrate. AI tools have been tested for identifying high-risk individuals using routine clinical data, chest X-rays, web search histories, and survey responses.
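The pack-year criterion is simple arithmetic: average packs smoked per day times years smoked. A minimal sketch of the eligibility check (function names are my own; the review mentions only the age and pack-year thresholds, while the quit-within-15-years condition is part of the published USPSTF criteria):

```python
def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Pack-years = average packs smoked per day x years of smoking."""
    return packs_per_day * years_smoked

def uspstf_ldct_eligible(age: int, packs_per_day: float, years_smoked: float,
                         years_since_quit: float = 0.0) -> bool:
    """USPSTF 2021 criteria: age 50-80, >=20 pack-years, and currently
    smoking or quit within the past 15 years."""
    return (50 <= age <= 80
            and pack_years(packs_per_day, years_smoked) >= 20
            and years_since_quit <= 15)

# A 1-pack-a-day smoker for 25 years, aged 60, who quit 5 years ago:
print(uspstf_ldct_eligible(60, 1.0, 25, 5))   # True
# Same history but only 15 years of smoking (15 pack-years):
print(uspstf_ldct_eligible(60, 1.0, 15, 5))   # False
```

The sketch also illustrates the limitation the review raises: a half-pack-a-day smoker of 30 years accumulates only 15 pack-years and is excluded, even though long smoking duration may itself be the stronger risk signal.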

Deep learning for nodule detection: A Google-developed deep learning algorithm analyzing current and prior CT scans achieved an AUC of 94.4% on 6,716 National Lung Screening Trial (NLST) cases and outperformed six radiologists, with absolute reductions of 11% in false positives and 5% in false negatives. Sybil, another deep learning model, predicted future lung cancer risk from a single LDCT with an AUC of 0.92 at 1 year and 0.75 at 6 years on NLST data. A meta-analysis of AI-based LDCT screening tools showed sensitivity of 94.6% and specificity of 93.6%, translating to a false-positive rate of approximately 6.4% and a false-negative rate of approximately 5.4%.
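The meta-analysis figures translate mechanically: the false-positive rate among disease-free screens is 1 − specificity, and the false-negative rate among cancers is 1 − sensitivity. A quick check of the numbers quoted above:

```python
def error_rates(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    FPR = 1 - specificity (fraction of disease-free cases flagged positive);
    FNR = 1 - sensitivity (fraction of cancers missed).
    """
    return 1.0 - specificity, 1.0 - sensitivity

fpr, fnr = error_rates(sensitivity=0.946, specificity=0.936)
print(f"false-positive rate ~ {fpr:.1%}, false-negative rate ~ {fnr:.1%}")
# false-positive rate ~ 6.4%, false-negative rate ~ 5.4%
```

Note that these are rates within each group, not population-level counts: in a screening population where cancer is rare, even a 6.4% false-positive rate means false alarms far outnumber true detections.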

Beyond lung nodules: LDCT screening creates opportunities for AI to simultaneously detect other smoking-related diseases, including chronic obstructive pulmonary disease (COPD) and cardiovascular disease. Deep learning has also shown superior performance compared to radiologists in detecting lung nodules on X-rays. Blood-based biomarkers including ctDNA and plasma protein markers offer additional avenues for early-stage detection. Ultra-low-dose CT enabled by deep learning image reconstruction can further reduce radiation exposure during screening.

TL;DR: Google's deep learning model achieved 94.4% AUC on 6,716 NLST cases, reducing false positives by 11% and false negatives by 5% vs. radiologists. Sybil predicted lung cancer risk from a single LDCT with AUC 0.92 at 1 year. Meta-analysis found AI screening sensitivity of 94.6% and specificity of 93.6%. Current USPSTF criteria miss high-risk non-smokers, and AI can integrate genetics, environmental exposures, and routine data to improve risk stratification.
Pages 3-4
Radiomics Turns Routine CT Scans into Virtual Biopsies

Lung cancer is a heterogeneous disease with diverse clinicopathological characteristics, and AI is improving diagnosis in three domains: radiomics, digital pathology, and genomic sequencing. Radiomics combined with deep learning allows clinicians to derive comprehensive pathological insights from routine radiology scans before final pathological confirmation. This has been demonstrated across multiple tasks: differentiating lung cancer from benign lesions, distinguishing primary from metastatic lung lesions, identifying malignant vs. benign pleural effusions, classifying adenocarcinoma vs. squamous cell carcinoma, and even distinguishing subtypes of adenocarcinoma.

Driver mutation prediction: Radiomics models can predict key driver mutations directly from CT images, including EGFR exon 19 deletions (19Del), L858R (exon 21), T790M (exon 20), and ALK rearrangement. For EGFR mutation prediction, models achieved AUC values ranging from 0.67 to 0.85 depending on the specific mutation and cohort. ALK rearrangement prediction reached AUC 0.880. These capabilities offer a non-invasive complement to tissue-based molecular testing, which is particularly valuable when biopsy tissue is insufficient or inaccessible.

Immune biomarker prediction: Radiomic features can also predict expression of PD-L1 (AUC 0.70 to 0.72) and CD8+ T cell density from CT scans, potentially guiding immunotherapy decisions without requiring tissue. Performance metrics for representative studies included AUC 0.842 for differentiating primary vs. metastatic lesions using PET/CT, AUC 0.82 for malignant pleural effusion diagnosis from endoscopic images, and accuracy of 78.8% for NSCLC subtype classification from whole-slide images.

The algorithms used across these studies span the full range of machine learning approaches: CNNs for image-based classification, random forest and support-vector machine models for handcrafted radiomic features, K-means clustering for unsupervised pattern discovery, and elastic net regression for high-dimensional feature selection. The diversity of approaches reflects the fact that no single architecture dominates across all tasks, and hybrid pipelines combining radiomics with deep learning often outperform either alone.
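As an illustration of the unsupervised end of that spectrum, here is a minimal K-means implementation clustering synthetic "radiomic" feature vectors into imaging phenotypes (the data, dimensions, and separation are invented for the sketch; real pipelines would use a library such as scikit-learn):

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Plain K-means: returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest center (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned samples.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Two well-separated synthetic "phenotypes" in a 5-dimensional feature space.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.3, (20, 5)),
               rng.normal(3.0, 0.3, (20, 5))])
labels = kmeans(X, k=2)
```

In a real radiomics workflow the rows would be handcrafted features (shape, intensity, texture) extracted per lesion, and the discovered clusters would then be tested for association with clinical outcomes.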

TL;DR: Radiomics predicts EGFR mutations (AUC 0.67-0.85), ALK rearrangement (AUC 0.880), PD-L1 expression (AUC 0.70-0.72), and CD8+ T cell density from CT scans alone. It also differentiates primary vs. metastatic lesions (AUC 0.842), diagnoses malignant pleural effusions (AUC 0.82), and classifies NSCLC subtypes (78.8% accuracy), enabling "virtual biopsy" from routine imaging.
Page 4
AI in Computational Pathology and Genomic Sequencing for Lung Cancer

Digital pathology: Despite early barriers from the high costs of digitalization, AI-powered computational pathology has gained significant momentum. AI now enables automatic lung cancer diagnosis across multiple specimen types, including H&E-stained slides, cryosection tissue slides, cytopathology samples, and lymph node biopsies. The Lunit SCOPE IO analytical tool demonstrated strong consistency between AI and pathologists for tumor-infiltrating lymphocyte (TIL) assessment, with correlation coefficients of R = 0.9429 to 0.9458. From H&E slides alone, AI can predict driver mutations, PD-L1 expression (AUC 0.63), and TIL density, reducing the need for separate immunohistochemistry or molecular panels.

Genomic sequencing enhancements: Advances in predictive biomarker discovery have paved the way for targeted therapies and immunotherapies. AI enhances somatic mutation identification in next-generation sequencing, outperforming standard genetic analysis approaches. By decoding genomic and transcriptomic data, AI can accurately determine the cell-of-origin for cancers of unknown primary, aiding in both diagnosis and treatment planning. In the context of immune biomarkers, AI can predict tumor mutation burden, neoantigens, and T-cell receptor-antigen binding specificity, all of which are relevant for immunotherapy patient selection.

Prognosis and staging: Lung cancer staging uses the tumor-node-metastasis (TNM) classification, sometimes requiring invasive procedures such as endobronchial ultrasound biopsy. AI integrates multi-modal data (medical records, radiology, pathology, and molecular data) to enhance staging accuracy and risk stratification. Pilot studies using routine radiology scans have predicted adenocarcinoma invasiveness, distant metastasis, and novel imaging subtypes. For radiology-based prognosis, AI predictions were significantly associated with overall survival (AUC 0.70 to 0.71), outperforming clinical feature-based predictions (AUC 0.58 to 0.66). For pathology-based prognosis, AI achieved AUC 0.64 to 0.85, compared to clinical feature-based predictions at AUC 0.52 to 0.84.
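AUC, the metric quoted throughout these comparisons, has a direct probabilistic reading: the chance that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. A small pure-Python sketch with toy scores (not from the review):

```python
def auc(pos_scores, neg_scores):
    """AUC = P(score_pos > score_neg), with ties counted as half.

    Equivalent to the normalized Mann-Whitney U statistic.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy prognostic scores: patients with the event vs. those without.
print(auc([0.9, 0.8, 0.6], [0.7, 0.3, 0.2]))  # ~0.889: one pair misordered
print(auc([0.9, 0.8], [0.1, 0.2]))            # 1.0: perfect separation
```

An AUC of 0.5 is chance-level ranking, which is why clinical-feature baselines in the 0.52-0.66 range leave so much room for the imaging-based models to improve on.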

TL;DR: AI-pathologist correlation for TIL assessment reached R = 0.9429-0.9458, and AI predicted PD-L1 expression (AUC 0.63) and driver mutations directly from H&E whole-slide images. Radiology-based AI prognosis achieved AUC 0.70-0.71 vs. 0.58-0.66 for clinical features. Pathology-based AI prognosis reached AUC 0.64-0.85 vs. 0.52-0.84 for clinical features alone.
Pages 4-5
AI Guides Surgical Decisions, Radiotherapy Planning, and Complication Prediction

Surgical planning: For peripheral, node-negative NSCLC measuring 2 cm or smaller, sublobar resection is not inferior to lobectomy. However, lymph node-negative status can only be definitively confirmed after surgery. A deep learning model was developed to predict lymph node metastasis preoperatively (AUC 0.82), helping surgeons identify candidates suitable for sublobar resection before entering the operating room. Pulmonary function tests (PFTs), crucial for assessing surgical candidacy, can also be interpreted through AI collaboration with pulmonologists, with one study showing accuracy improvements of 10.4%.

Virtual reality and intraoperative AI: For patients undergoing segmentectomy, the variability and complexity of intrathoracic anatomy present significant challenges. Virtual reality systems have been developed to reconstruct thoracic anatomy, aiding in preoperative surgery planning and potentially reducing the duration of complex surgeries. During surgery, AI can detect air-leak sites by analyzing surgical videos, even in deflated lungs, with sensitivity and specificity of 81.3% and 68.9% respectively. This capability helps surgeons address potential complications before closing thoracic cavities.

Radiotherapy: Radiotherapy remains a critical therapeutic approach for locally advanced lung cancer, where it still holds curative potential. Accurate delineation of gross tumor volume and consistent contouring of organs at risk are essential but challenging. AI-based algorithms have been tested for auto-contouring and radiotherapy planning, which is especially useful for low- and middle-income countries with limited specialist availability. Radiomic models have been used to predict lung cancer recurrence, cardiotoxicity (sensitivity and specificity of 86% and 77.8% for 3-month overall survival), and lung toxicity after radiotherapy.

FDA-approved devices: The review catalogs 12 FDA-approved AI devices for lung cancer, all imaging-based. These include Riverain ClearRead CT, Siemens AI-Rad Companion, Coreline AView LCS, and Manteia MOZI TPS for radiotherapy planning. Most are Class II (moderate risk) devices approved between 2016 and 2023. Their tasks range from solid pulmonary nodule detection and Lung-RADS categorization to automated reporting and segmentation of lung, liver, and lymph node lesions.

TL;DR: AI predicts lymph node metastasis preoperatively (AUC 0.82), detects intraoperative air leaks (81.3% sensitivity, 68.9% specificity), and predicts post-radiotherapy cardiotoxicity (86% sensitivity, 77.8% specificity). Twelve FDA-approved AI devices for lung cancer exist as of 2023, all imaging-based, covering nodule detection, Lung-RADS categorization, and radiotherapy planning.
Page 5
AI Predicts Immunotherapy Response, EGFR-TKI Resistance, and Treatment Selection

Immunotherapy biomarkers: NSCLC was historically considered poorly immunogenic, but advances have identified two key immune checkpoints: CTLA-4 and the PD-1/PD-L1 axis. Anti-PD-1 and anti-PD-L1 antibodies significantly improve survival compared to chemotherapy. PD-L1 expression is the primary biomarker for predicting response to immune checkpoint inhibitors, but responses also occur in patients without detectable PD-L1. This paradox is attributed to intratumoral and intertumoral heterogeneity of PD-L1 expression, which introduces inherent biopsy sampling bias. AI can address this by predicting PD-L1 expression non-invasively from imaging data across the entire tumor volume.

Radiomic and deep learning biomarkers: Beyond traditional immune markers, radiomic biomarkers have provided early survival indicators in immunotherapy patients. Deep learning models effectively capture imaging patterns beyond known handcrafted features, enhancing predictive accuracy with AUC 0.9 for text report-based models. AI can also predict adverse reactions to immunotherapy, including hyperprogression, cachexia, and immunotherapy-induced pneumonitis. Blood biomarkers such as ctDNA and cytokines further feed AI models for predicting immunotherapy responses, and multimodal integration of radiomics, pathomics, and genomics holds promise for identifying the best immunotherapy candidates.

EGFR-targeted therapy: EGFR mutations are the most commonly targetable driver mutations in lung adenocarcinoma. Third-generation EGFR-TKIs have significantly extended survival, but treatment resistance remains challenging. Combination strategies with chemotherapy or VEGF inhibitors improve response durability but increase severe adverse events. Two AI studies demonstrated the ability to predict progression risk to identify high-risk patients who would most benefit from combination therapy, personalizing treatment intensity based on predicted resistance patterns.

Clinical decision support: Watson for Oncology (WFO) was explored for lung cancer decision-making. Preliminary results indicated AI's potential in adhering to clinical guidelines, but a relatively high proportion of cases were still not supported by WFO, and the system needed to learn regional patient characteristics. More broadly, AI-powered clinical decision support systems that integrate radiology, pathology, genomics, and clinical data can provide physicians with personalized treatment information, though full clinical integration remains in early stages.

TL;DR: AI addresses PD-L1 heterogeneity by predicting expression non-invasively from imaging. Deep learning achieved AUC 0.9 for predicting immunotherapy response from text reports and can predict adverse reactions (hyperprogression, pneumonitis). AI also identifies EGFR-TKI resistance risk to guide combination therapy decisions. Watson for Oncology showed guideline adherence but left many cases unsupported.
Pages 5-6
AI Automates Response Evaluation and LLMs Enter Lung Cancer Decision-Making

Response monitoring: Treatment response in lung cancer primarily relies on RECIST (Response Evaluation Criteria in Solid Tumors), which measures lesion size changes. However, RECIST has been questioned in the context of targeted therapies and immunotherapies due to phenomena such as pseudoprogression. Noninvasive radiomic biomarkers can distinguish pseudoprogression from hyperprogression with AUC of 0.88, and hyperprogression from true progression with AUC of 0.87. Response assessment is also time-intensive and subject to high intra- and inter-reader variability, and deep learning has shown promise in automating RECIST evaluations for immunotherapy patients.
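The RECIST 1.1 categories that these AI tools automate reduce to threshold arithmetic on the sum of target-lesion diameters. A simplified sketch covering target lesions only (the full criteria also account for non-target lesions and the appearance of new lesions):

```python
def recist_response(baseline_sum: float, nadir_sum: float,
                    current_sum: float) -> str:
    """Simplified RECIST 1.1 on target-lesion diameter sums (mm).

    CR: all target lesions gone; PR: >=30% decrease from baseline;
    PD: >=20% increase from nadir AND >=5 mm absolute increase;
    SD: otherwise. PD is checked first because it takes precedence.
    """
    if current_sum == 0:
        return "CR"  # complete response
    if current_sum >= nadir_sum * 1.2 and current_sum - nadir_sum >= 5:
        return "PD"  # progressive disease
    if current_sum <= baseline_sum * 0.7:
        return "PR"  # partial response
    return "SD"  # stable disease

print(recist_response(100, 60, 55))  # PR: 45% below baseline
print(recist_response(100, 60, 78))  # PD: 30% above nadir, +18 mm
print(recist_response(100, 90, 85))  # SD
```

The pseudoprogression problem is visible here: an immunotherapy-treated tumor that transiently swells past the 20%-over-nadir threshold is labeled PD by size alone, which is exactly the gap the radiomic classifiers above (AUC 0.87-0.88) try to close.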

Minimal residual disease (MRD): Circulating tumor DNA (ctDNA) monitoring in plasma has emerged as a valuable method for detecting MRD and predicting patient survival. Longitudinal ctDNA detection offers insights into treatment response and can guide therapeutic strategies for patients with metastatic NSCLC. Machine learning approaches have shown promise in analyzing ctDNA kinetics, enabling the optimization of personalized therapies. One study achieved AUROC of 0.682 for identifying patients with increased risk of relapse using combined CT and ctDNA data.

Large language models: LLMs can respond to free-text queries without requiring specific task training, enabling rapid comprehension of medical domain knowledge. Medical chatbots have demonstrated the capability to generate responses comparable to clinicians in both quality and empathy. For lung cancer, LLMs may serve as decision aids and clinical trial matching tools. However, inaccuracy remains the most concerning problem, as LLMs can fabricate facts by learning statistical word associations rather than achieving true understanding. Their training data often comes from unverified internet sources. They function best as assistive tools under human supervision rather than in autonomous roles.

Clinical trial matching: AI facilitates matching patient medical records against enrollment criteria. Multiple studies have reported that AI can effectively extract patient data and match it to relevant clinical trials, potentially accelerating enrollment for lung cancer patients who might otherwise miss trial opportunities. This is particularly important given the complexity of modern lung cancer trials, which often require specific molecular profiles and prior treatment histories.

TL;DR: Radiomic biomarkers distinguish pseudoprogression from hyperprogression (AUC 0.88) and hyperprogression from true progression (AUC 0.87). ctDNA-based MRD monitoring with AI achieved AUROC 0.682 for relapse risk. LLMs show potential as clinical decision aids and trial-matching tools but remain prone to fabrication and require human oversight.
Pages 6-7
Data Sharing, Bias, Interpretability, and Reproducibility Block Clinical Translation

Data sharing: Continuous data supply is crucial for training, validating, and refining AI algorithms, but sharing data across institutions is hampered by patient privacy concerns and intellectual property protections. The paper outlines three solutions. Centralized learning pools data under a shared legal agreement, which is effective but costly. De-identified public datasets (the review catalogs 8 publicly available lung cancer datasets, including NLST, LIDC-IDRI, TCGA-LUAD, NSCLC-Radiomics, NSCLC Radiogenomics, AutoPET, RIDER Lung CT, and MIDRC) are more affordable but often lack specific patient information. Federated learning keeps data private at each institution while training models in a distributed manner, and it has been implemented across breast, brain, gastric, melanoma, and lung cancer applications.
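The federated learning idea can be sketched in a few lines: each institution updates a copy of the model on its private data, and only the weights travel to a central server for averaging (the FedAvg pattern). A toy sketch using linear-regression gradient steps on synthetic per-site data (all names and data are invented for illustration):

```python
import numpy as np

def local_update(w, X, y, lr=0.1, steps=20):
    """One site's contribution: gradient steps on private data only."""
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
# Three "institutions", each holding data it never shares.
sites = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.05, size=50)
    sites.append((X, y))

w_global = np.zeros(2)
for _ in range(10):  # each round: local training, then server-side averaging
    local_ws = [local_update(w_global, X, y) for X, y in sites]
    w_global = np.mean(local_ws, axis=0)  # only weights cross site boundaries
```

The privacy property comes from the communication pattern, not the math: raw patient records stay behind each institution's firewall, and only parameter vectors (which production systems additionally encrypt or add noise to) are aggregated.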

Bias and fairness: AI models inherit biases from training data that favor particular racial, ethnic, or gender groups. Only 50% of Black women and 63% of Black men diagnosed with lung cancer qualified for screening under current criteria. Among 75,774 patients in The Society of Thoracic Surgeons database, white patients and those with private insurance had higher incidence of complex operations, suggesting systematic access disparities that would be replicated by AI trained on this data. Efforts are underway to generate more diverse datasets for both breast and lung cancer and to design algorithms that explicitly ensure fairness across demographic groups.

Interpretability: Deep learning approaches operate as end-to-end "black box" systems, mapping inputs directly to outcomes without manually selected features. This makes it difficult to understand which factors drive decisions, potentially leading to misleading conclusions from spurious confounders. Such opacity is often deemed unacceptable in healthcare. Explainable AI (XAI) is an active research area, but even FDA-approved AI devices currently offer limited interpretability. The optimal form of clinical explainability remains unknown.

Reproducibility: Most published AI studies lack reproducibility. Imaging protocols (CT scanner manufacturer, radiation dose, convolution kernel, iterative reconstruction, section thickness) significantly impact diagnostic performance. Motion artifacts and image noise further degrade quality. Annotation variability among radiologists introduces subjectivity. The Image Biomarker Standardization Initiative (IBSI) has made progress, and a 16-criteria checklist guides radiomic test development. Multiple reporting guidelines now exist: MINIMAR, SPIRIT-AI, CONSORT-AI, and ESMO-GROW.

TL;DR: Only 50% of Black women and 63% of Black men with lung cancer qualified for screening, illustrating bias. Eight publicly available lung cancer datasets exist but each has limitations. Federated learning offers a privacy-preserving alternative to centralized data sharing. Reporting standards (MINIMAR, SPIRIT-AI, CONSORT-AI, ESMO-GROW) and the IBSI's 16-criteria checklist aim to improve reproducibility.
Pages 7-8
Generalist AI, Wearable Sensors, and the Path to Multimodal Clinical Integration

Generalist AI: Currently, most AI models in healthcare are uni-modal and uni-task, requiring separate models for different data types (medical records, radiology, pathology, genomics) to solve even a single clinical question. The ultimate objective is generalist AI capable of analyzing multi-modal data and addressing a wide range of tasks. Novel deep learning architectures can integrate multiple modalities to improve performance. PathChat, a chatbot enabling interactive discussions with pathologists, represents an early step, providing expert-level insights on specific cases. Extending this concept, generalist models could integrate comprehensive patient information and interact with physicians much as ChatGPT does, allowing physicians to define prediction tasks in natural language and receive explanations alongside predictions.

Wearable and environmental data: Beyond traditional medical data like radiologic images and genomic information, which are costly and not time-sensitive, smartphones and wearable sensors can collect extensive physiological and environmental data. AI could manage these large datasets to identify individuals at high risk for lung cancer, which is heavily influenced by environmental and behavioral factors. Real-time AI-assisted lung cancer prevention could offer personalized early intervention and risk management strategies while simultaneously accumulating data for researchers to identify underlying risk factors.

Remote monitoring: Integrating personal data from wearables can facilitate remote patient monitoring, providing alerts to primary physicians and patients during the diagnosis and treatment course of lung cancer. This is particularly relevant for treatment response tracking, where continuous data streams could complement periodic imaging assessments. The authors emphasize that multi-party collaboration is needed to optimize regulatory frameworks, improve AI development and validation standards, and strengthen full lifecycle management and post-market surveillance.

Regulatory evolution: The pace of AI development challenges existing regulatory frameworks and requires expanded staffing to process submissions efficiently. The approval process involves more stringent clinical trials and validation testing than what typically appears in academic publications. Most approved AI products perform well on predefined tasks like detection but lack generalizability across different patient populations. Multi-party collaboration between regulators, developers, clinicians, and institutions is needed to address these challenges while keeping pace with rapid technological advancement.

TL;DR: The field is moving from uni-modal, uni-task AI toward generalist multimodal models that integrate radiology, pathology, genomics, and clinical data in a single system. Wearable sensors and smartphones could enable real-time, AI-assisted lung cancer prevention and remote monitoring. Regulatory frameworks need multi-party collaboration to keep pace with the rapid development cycle.
Citation: Zhu E, Muneer A, Zhang J, et al. Progress and challenges of artificial intelligence in lung cancer clinical translation. npj Precision Oncology. 2025. Open Access. Available at PMC12214742. DOI: 10.1038/s41698-025-00986-7. License: CC BY-NC-ND.