Lung cancer remains the leading cause of cancer-related deaths worldwide, responsible for an estimated 1.8 million deaths per year. Despite major advances in targeted therapy and immunotherapy, the sheer volume of clinical and research data generated for each patient remains insufficiently integrated and analyzed. This narrative review, authored by a team from MD Anderson Cancer Center and partner institutions, examines the translational potential of AI across the entire lung cancer care pathway: prevention, screening, diagnosis, prognosis, treatment, and monitoring.
The authors framed the review from the perspective of oncologists rather than data scientists, focusing on what AI can functionally do today and what it may enable in the near future. They searched MEDLINE, PubMed, and citation lists using "lung cancer" combined with keywords such as "Artificial Intelligence," "Machine Learning," "Deep Learning," "Radiomics," and "Large Language Model." The search covered English-language publications from January 1, 2014, to December 21, 2024. Rather than a systematic review or meta-analysis, this is a narrative synthesis intended to highlight urgent clinical challenges where AI has the most potential for impact.
AI evolution in oncology: The paper traces how deep learning moved from theory to practice. In the 2010s, CNNs set benchmarks in dermoscopic skin cancer diagnosis. In the 2020s, transformers (originally designed for natural language processing) enhanced medical image analysis when integrated with CNNs. Today, multimodal AI systems can combine radiology, pathology, and genomic data, enabling so-called "generalist medical AI" with multitasking capabilities. Architectures like ResNet, U-Net, and YOLO are now standard for feature extraction from imaging data, enabling "virtual biopsy" by extracting pathological and genomic information from routine scans.
Clinical pressures driving adoption: The rapidly increasing volume of imaging scans has created significant radiologist burnout and reduced interpretive accuracy. Digital pathology platforms now allow real-time AI deployment on whole-slide imaging. Large language models (LLMs) are being tested for summarizing radiology and pathology reports and for providing personalized treatment recommendations as medical chatbots. The review summarizes representative studies in Table 1, listing tasks, cohort sizes, data modalities, algorithms, and performance metrics across more than 25 studies.
Prevention: Tobacco remains the primary cause of lung cancer, with the global population of cigarette smokers still close to 1 billion. AI is being applied in two ways. First, it can analyze images of a smoker's daily environment to identify contexts associated with smoking cravings and predict relapse risk. Second, wearable sensors combined with a CNN-LSTM neural network can monitor smoking behavior by studying puff topography signals. However, smoking prevalence is influenced by socioeconomic status, mental health conditions, gender, sexuality, and ethnicity, meaning individual-level interventions are insufficient without broader public health AI strategies.
Screening criteria limitations: The U.S. Preventive Services Task Force recommends annual low-dose CT (LDCT) for people aged 50 to 80 with a 20 pack-year smoking history. But this criterion is imprecise. Some evidence suggests that 20-year smoking duration is a better predictor than 20 pack-years. Risk is also influenced by ethnicity, genetics, and environmental exposures such as PM2.5, factors that traditional linear regression models struggle to integrate. AI tools have been tested for identifying high-risk individuals using routine clinical data, chest X-rays, web search histories, and survey responses.
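The pack-year arithmetic behind this criterion is simple, which is part of why it is imprecise: very different smoking histories collapse to the same number. A minimal sketch (function names are illustrative; the full USPSTF rule also requires being a current smoker or having quit within the past 15 years, which the source does not detail):

```python
def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Pack-years = average packs smoked per day x years of smoking."""
    return packs_per_day * years_smoked

def uspstf_eligible(age: int, packs_per_day: float, years_smoked: float,
                    quit_years_ago: float = 0.0) -> bool:
    """Simplified USPSTF 2021 LDCT screening rule: age 50 to 80,
    at least 20 pack-years, and quit no more than 15 years ago."""
    return (50 <= age <= 80
            and pack_years(packs_per_day, years_smoked) >= 20
            and quit_years_ago <= 15)
```

Note that half a pack per day for 40 years and two packs per day for 10 years both yield 20 pack-years, despite very different exposure patterns; this is exactly why smoking duration may be the better predictor.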
Deep learning for nodule detection: A Google-developed deep learning algorithm analyzing current and prior CT scans achieved an AUC of 94.4% on 6,716 National Lung Screening Trial (NLST) cases and outperformed six radiologists, with absolute reductions of 11% in false positives and 5% in false negatives. Sybil, another deep learning model, predicted future lung cancer risk from a single LDCT, with AUCs of 0.92 at 1 year and 0.75 at 6 years on NLST data. A meta-analysis of AI-based LDCT screening tools showed high sensitivity (94.6%) but more moderate specificity (93.6%), translating to a false-positive rate of approximately 6.4% and a false-negative rate of approximately 5.4%.
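The meta-analysis figures follow directly from the definitions: the false-positive rate is the complement of specificity, and the false-negative rate is the complement of sensitivity. A one-function sketch of that arithmetic:

```python
def screening_rates(sensitivity: float, specificity: float) -> dict:
    """Derive error rates from screening performance metrics.
    FPR = 1 - specificity; FNR = 1 - sensitivity."""
    return {
        "false_positive_rate": round(1 - specificity, 4),
        "false_negative_rate": round(1 - sensitivity, 4),
    }

# Plugging in the meta-analysis values (sensitivity 94.6%, specificity 93.6%)
rates = screening_rates(0.946, 0.936)  # FPR ~6.4%, FNR ~5.4%
```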
Beyond lung nodules: LDCT screening creates opportunities for AI to simultaneously detect other smoking-related diseases, including chronic obstructive pulmonary disease (COPD) and cardiovascular disease. Deep learning has also shown superior performance compared to radiologists in detecting lung nodules on X-rays. Blood-based biomarkers including ctDNA and plasma protein markers offer additional avenues for early-stage detection. Ultra-low-dose CT enabled by deep learning image reconstruction can further reduce radiation exposure during screening.
Lung cancer is a heterogeneous disease with diverse clinicopathological characteristics, and AI is improving diagnosis in three domains: radiomics, digital pathology, and genomic sequencing. Radiomics combined with deep learning allows clinicians to derive comprehensive pathological insights from routine radiology scans before final pathological confirmation. This has been demonstrated across multiple tasks: differentiating lung cancer from benign lesions, distinguishing primary from metastatic lung lesions, identifying malignant vs. benign pleural effusions, classifying adenocarcinoma vs. squamous cell carcinoma, and even distinguishing subtypes of adenocarcinoma.
Driver mutation prediction: Radiomics models can predict key driver mutations directly from CT images, including EGFR exon 19 deletions (19Del), L858R (exon 21), T790M (exon 20), and ALK rearrangement. For EGFR mutation prediction, models achieved AUC values ranging from 0.67 to 0.85 depending on the specific mutation and cohort. ALK rearrangement prediction reached AUC 0.880. These capabilities offer a non-invasive complement to tissue-based molecular testing, which is particularly valuable when biopsy tissue is insufficient or inaccessible.
Immune biomarker prediction: Radiomic features can also predict expression of PD-L1 (AUC 0.70 to 0.72) and CD8+ T cell density from CT scans, potentially guiding immunotherapy decisions without requiring tissue. Performance metrics for representative studies included AUC 0.842 for differentiating primary vs. metastatic lesions using PET/CT, AUC 0.82 for malignant pleural effusion diagnosis from endoscopic images, and accuracy of 78.8% for NSCLC subtype classification from whole-slide images.
The algorithms used across these studies span the full range of machine learning approaches: CNNs for image-based classification, random forest and support-vector machine models for handcrafted radiomic features, K-means clustering for unsupervised pattern discovery, and elastic net regression for high-dimensional feature selection. The diversity of approaches reflects the fact that no single architecture dominates across all tasks, and hybrid pipelines combining radiomics with deep learning often outperform either alone.
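A hybrid pipeline of the kind described above can be sketched with scikit-learn: elastic-net logistic regression prunes the high-dimensional handcrafted features, and a random forest classifies on what survives. This is a toy example; the data are synthetic stand-ins for radiomic feature matrices, and all parameter choices are illustrative rather than drawn from any cited study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 200 lesions x 50 handcrafted radiomic features,
# with only the first two features actually informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # Elastic net (L1 + L2) for high-dimensional feature selection
    ("select", SelectFromModel(LogisticRegression(
        penalty="elasticnet", solver="saga", l1_ratio=0.5,
        C=1.0, max_iter=5000))),
    # Random forest on the selected features
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
```

The separation of concerns mirrors the text: a sparse linear model handles feature selection where p is large relative to n, while the nonlinear ensemble captures interactions among the retained features.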
Digital pathology: Despite early barriers from the high costs of digitalization, AI-powered computational pathology has gained significant momentum. AI now enables automatic lung cancer diagnosis across multiple specimen types, including H&E-stained slides, cryosection tissue slides, cytopathology samples, and lymph node biopsies. The Lunit SCOPE IO analytical tool demonstrated strong consistency between AI and pathologists for tumor-infiltrating lymphocyte (TIL) assessment, with correlation coefficients of R = 0.9429 to 0.9458. From H&E slides alone, AI can predict driver mutations, PD-L1 expression (AUC 0.63), and TIL density, reducing the need for separate immunohistochemistry or molecular panels.
Genomic sequencing enhancements: Advances in predictive biomarker discovery have paved the way for targeted therapies and immunotherapies. AI enhances somatic mutation identification in next-generation sequencing, outperforming standard genetic analysis approaches. By decoding genomic and transcriptomic data, AI can accurately determine the cell-of-origin for cancers of unknown primary, aiding in both diagnosis and treatment planning. In the context of immune biomarkers, AI can predict tumor mutation burden, neoantigens, and T-cell receptor-antigen binding specificity, all of which are relevant for immunotherapy patient selection.
Prognosis and staging: Lung cancer staging uses the tumor-node-metastasis (TNM) classification, which sometimes requires invasive procedures such as endobronchial ultrasound-guided biopsy. AI integrates multimodal data (medical records, radiology, pathology, and molecular data) to enhance staging accuracy and risk stratification. Pilot studies using routine radiology scans have predicted adenocarcinoma invasiveness, distant metastasis, and novel imaging subtypes. For radiology-based prognosis, AI predictions were significantly associated with overall survival (AUC 0.70 to 0.71), outperforming predictions from clinical features alone (AUC 0.58 to 0.66). For pathology-based prognosis, AI achieved AUC 0.64 to 0.85, compared with 0.52 to 0.84 for clinical features.
Surgical planning: For peripheral, node-negative NSCLC measuring 2 cm or smaller, sublobar resection is not inferior to lobectomy. However, lymph node-negative status can only be definitively confirmed after surgery. A deep learning model was developed to predict lymph node metastasis preoperatively (AUC 0.82), helping surgeons identify candidates suitable for sublobar resection before entering the operating room. Pulmonary function tests (PFTs), crucial for assessing surgical candidacy, can also be interpreted through AI collaboration with pulmonologists, with one study showing accuracy improvements of 10.4%.
Virtual reality and intraoperative AI: For patients undergoing segmentectomy, the variability and complexity of intrathoracic anatomy present significant challenges. Virtual reality systems have been developed to reconstruct thoracic anatomy, aiding in preoperative surgery planning and potentially reducing the duration of complex surgeries. During surgery, AI can detect air-leak sites by analyzing surgical videos, even in deflated lungs, with sensitivity and specificity of 81.3% and 68.9% respectively. This capability helps surgeons address potential complications before closing thoracic cavities.
Radiotherapy: Radiotherapy remains a critical therapeutic approach for locally advanced lung cancer, where it still holds curative potential. Accurate delineation of gross tumor volume and consistent contouring of organs at risk are essential but challenging. AI-based algorithms have been tested for auto-contouring and radiotherapy planning, which is especially useful for low- and middle-income countries with limited specialist availability. Radiomic models have been used to predict lung cancer recurrence, cardiotoxicity (sensitivity and specificity of 86% and 77.8% for 3-month overall survival), and lung toxicity after radiotherapy.
FDA-approved devices: The review catalogs 12 FDA-approved AI devices for lung cancer, all imaging-based. These include Riverain ClearRead CT, Siemens AI-Rad Companion, Coreline AView LCS, and Manteia MOZI TPS for radiotherapy planning. Most are Class II (moderate risk) devices approved between 2016 and 2023. Their tasks range from solid pulmonary nodule detection and Lung-RADS categorization to automated reporting and segmentation of lung, liver, and lymph node lesions.
Immunotherapy biomarkers: NSCLC was historically considered poorly immunogenic, but advances have identified two key immune checkpoints: CTLA-4 and the PD-1/PD-L1 axis. Anti-PD-1 and anti-PD-L1 antibodies significantly improve survival compared to chemotherapy. PD-L1 expression is the primary biomarker for predicting response to immune checkpoint inhibitors, but responses also occur in patients without detectable PD-L1. This paradox is attributed to intratumoral and intertumoral heterogeneity of PD-L1 expression, which introduces inherent biopsy sampling bias. AI can address this by predicting PD-L1 expression non-invasively from imaging data across the entire tumor volume.
Radiomic and deep learning biomarkers: Beyond traditional immune markers, radiomic biomarkers have provided early survival indicators in immunotherapy patients. Deep learning models effectively capture imaging patterns beyond known handcrafted features, enhancing predictive accuracy with AUC 0.9 for text report-based models. AI can also predict adverse reactions to immunotherapy, including hyperprogression, cachexia, and immunotherapy-induced pneumonitis. Blood biomarkers such as ctDNA and cytokines further feed AI models for predicting immunotherapy responses, and multimodal integration of radiomics, pathomics, and genomics holds promise for identifying the best immunotherapy candidates.
EGFR-targeted therapy: EGFR mutations are the most commonly targetable driver mutations in lung adenocarcinoma. Third-generation EGFR-TKIs have significantly extended survival, but treatment resistance remains challenging. Combination strategies with chemotherapy or VEGF inhibitors improve response durability but increase severe adverse events. Two AI studies demonstrated the ability to predict progression risk to identify high-risk patients who would most benefit from combination therapy, personalizing treatment intensity based on predicted resistance patterns.
Clinical decision support: Watson for Oncology (WFO) was explored for lung cancer decision-making. Preliminary results indicated AI's potential in adhering to clinical guidelines, but a relatively high proportion of cases were still not supported by WFO, and the system needed to learn regional patient characteristics. More broadly, AI-powered clinical decision support systems that integrate radiology, pathology, genomics, and clinical data can provide physicians with personalized treatment information, though full clinical integration remains in early stages.
Response monitoring: Treatment response in lung cancer primarily relies on RECIST (Response Evaluation Criteria in Solid Tumors), which measures lesion size changes. However, RECIST has been questioned in the context of targeted therapies and immunotherapies due to phenomena such as pseudoprogression. Noninvasive radiomic biomarkers can distinguish pseudoprogression from hyperprogression with AUC of 0.88, and hyperprogression from true progression with AUC of 0.87. Response assessment is also time-intensive and subject to high intra- and inter-reader variability, and deep learning has shown promise in automating RECIST evaluations for immunotherapy patients.
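The size-based logic that such automated pipelines reproduce can be sketched as a simplified version of the RECIST 1.1 target-lesion rules. Real assessments also account for non-target lesions and the appearance of new lesions, which this sketch omits:

```python
def recist_response(baseline_sum_mm: float, nadir_sum_mm: float,
                    current_sum_mm: float) -> str:
    """Simplified RECIST 1.1 target-lesion response, given the sum of
    target lesion diameters at baseline, at nadir (smallest sum so
    far), and at the current assessment."""
    if current_sum_mm == 0:
        return "CR"  # complete response: all target lesions gone
    # Progressive disease: >= 20% increase from nadir AND >= 5 mm absolute
    if (current_sum_mm - nadir_sum_mm >= 0.2 * nadir_sum_mm
            and current_sum_mm - nadir_sum_mm >= 5):
        return "PD"
    # Partial response: >= 30% decrease from baseline
    if baseline_sum_mm - current_sum_mm >= 0.3 * baseline_sum_mm:
        return "PR"
    return "SD"  # stable disease
```

The pseudoprogression problem discussed above is precisely that a lesion can cross the PD threshold on size alone while the patient is actually benefiting from immunotherapy, which is why radiomic signatures beyond diameter are being explored.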
Minimal residual disease (MRD): Circulating tumor DNA (ctDNA) monitoring in plasma has emerged as a valuable method for detecting MRD and predicting patient survival. Longitudinal ctDNA detection offers insights into treatment response and can guide therapeutic strategies for patients with metastatic NSCLC. Machine learning approaches have shown promise in analyzing ctDNA kinetics, enabling the optimization of personalized therapies. One study achieved AUROC of 0.682 for identifying patients with increased risk of relapse using combined CT and ctDNA data.
Large language models: LLMs can respond to free-text queries without requiring specific task training, enabling rapid comprehension of medical domain knowledge. Medical chatbots have demonstrated the capability to generate responses comparable to clinicians in both quality and empathy. For lung cancer, LLMs may serve as decision aids and clinical trial matching tools. However, inaccuracy remains the most concerning problem, as LLMs can fabricate facts by learning statistical word associations rather than achieving true understanding. Their training data often comes from unverified internet sources. They function best as assistive tools under human supervision rather than in autonomous roles.
Clinical trial matching: AI facilitates matching patient medical records against enrollment criteria. Multiple studies have reported that AI can effectively extract patient data and match it to relevant clinical trials, potentially accelerating enrollment for lung cancer patients who might otherwise miss trial opportunities. This is particularly important given the complexity of modern lung cancer trials, which often require specific molecular profiles and prior treatment histories.
Data sharing: Continuous data supply is crucial for training, validating, and refining AI algorithms, but sharing data across institutions is hampered by patient privacy concerns and intellectual property protections. The paper outlines three solutions. Centralized learning pools data under a shared legal agreement, which is effective but costly. De-identified public datasets (the review catalogs 8 publicly available lung cancer datasets, including NLST, LIDC-IDRI, TCGA-LUAD, NSCLC-Radiomics, NSCLC Radiogenomics, AutoPET, RIDER Lung CT, and MIDRC) are more affordable but often lack specific patient information. Federated learning keeps data private at each institution while training models in a distributed manner, and it has been implemented across breast, brain, gastric, melanoma, and lung cancer applications.
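The federated learning idea can be illustrated with a minimal FedAvg-style sketch: each "institution" fits a model on its own private data, and only model weights, never patient records, are shared and averaged centrally. All data and parameters here are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0, 0.5])  # ground-truth linear relationship

def local_data(n):
    """Synthetic private dataset for one institution."""
    X = rng.normal(size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

sites = [local_data(100) for _ in range(3)]  # three institutions
w = np.zeros(3)                              # shared global model

for _round in range(20):                     # communication rounds
    local_weights = []
    for X, y in sites:
        w_local = w.copy()
        for _ in range(5):                   # local gradient steps
            grad = X.T @ (X @ w_local - y) / len(y)
            w_local -= 0.1 * grad
        local_weights.append(w_local)
    w = np.mean(local_weights, axis=0)       # only weights leave each site
```

The privacy benefit comes from the communication pattern: raw `X` and `y` never leave a site; only the weight vectors cross institutional boundaries, and the central server sees just their average.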
Bias and fairness: AI models inherit biases from training data that favor particular racial, ethnic, or gender groups. Only 50% of Black women and 63% of Black men diagnosed with lung cancer qualified for screening under current criteria. Among 75,774 patients in The Society of Thoracic Surgeons database, white patients and those with private insurance had higher incidence of complex operations, suggesting systematic access disparities that would be replicated by AI trained on this data. Efforts are underway to generate more diverse datasets for both breast and lung cancer and to design algorithms that explicitly ensure fairness across demographic groups.
Interpretability: Deep learning approaches operate as end-to-end "black box" systems, mapping inputs directly to outcomes without manually selected features. This makes it difficult to understand which factors drive decisions, potentially leading to misleading conclusions from spurious confounders. Such opacity is often deemed unacceptable in healthcare. Explainable AI (XAI) is an active research area, but even FDA-approved AI devices currently offer limited interpretability. The optimal form of clinical explainability remains unknown.
Reproducibility: Most published AI studies lack reproducibility. Imaging protocols (CT scanner manufacturer, radiation dose, convolution kernel, iterative reconstruction, section thickness) significantly impact diagnostic performance. Motion artifacts and image noise further degrade quality. Annotation variability among radiologists introduces subjectivity. The Image Biomarker Standardization Initiative (IBSI) has made progress, and a 16-criteria checklist guides radiomic test development. Multiple reporting guidelines now exist: MINIMAR, SPIRIT-AI, CONSORT-AI, and ESMO-GROW.
Generalist AI: Currently, most AI models in healthcare are uni-modal and uni-task, requiring separate models for different data types (medical records, radiology, pathology, genomics) to solve even a single clinical question. The ultimate objective is generalist AI capable of analyzing multi-modal data and addressing a wide range of tasks. Novel deep learning architectures can integrate multiple modalities to improve performance. PathChat, a chatbot enabling interactive discussions with pathologists, represents an early step, providing expert-level insights on specific cases. Extending this concept, generalist models could integrate comprehensive patient information and interact with physicians much as ChatGPT does, allowing physicians to define prediction tasks in natural language and receive explanations alongside predictions.
Wearable and environmental data: Beyond traditional medical data such as radiologic images and genomic information, which are costly to acquire and captured only at discrete time points, smartphones and wearable sensors can collect extensive physiological and environmental data continuously. AI could manage these large datasets to identify individuals at high risk for lung cancer, which is heavily influenced by environmental and behavioral factors. Real-time AI-assisted lung cancer prevention could offer personalized early intervention and risk management strategies while simultaneously accumulating data for researchers to identify underlying risk factors.
Remote monitoring: Integrating personal data from wearables can facilitate remote patient monitoring, providing alerts to primary physicians and patients during the diagnosis and treatment course of lung cancer. This is particularly relevant for treatment response tracking, where continuous data streams could complement periodic imaging assessments. The authors emphasize that multi-party collaboration is needed to optimize regulatory frameworks, improve AI development and validation standards, and strengthen full lifecycle management and post-market surveillance.
Regulatory evolution: The pace of AI development challenges existing regulatory frameworks and requires expanded staffing to process submissions efficiently. The approval process involves more stringent clinical trials and validation testing than what typically appears in academic publications. Most approved AI products perform well on predefined tasks like detection but lack generalizability across different patient populations. Multi-party collaboration between regulators, developers, clinicians, and institutions is needed to address these challenges while keeping pace with rapid technological advancement.