Artificial Intelligence and Lung Cancer: Impact on Improving Patient Outcomes


Plain-English Explanations
Pages 1-2
Why AI Matters for Lung Cancer, and What This Review Covers

Lung cancer is the second most common cancer in both males and females and carries the highest mortality of any cancer worldwide, accounting for 21% of all cancer-related deaths. Despite decades of progress in treatment, early detection remains elusive: only about 20% of cases are diagnosed at stage I, a statistic that has barely changed over the years. Even with significant advancements in immunotherapy and targeted therapies, the response rate to these treatments remains highly variable and unpredictable, creating an urgent need for better diagnostic and prognostic tools.

This 2023 review by Gandhi et al. provides a comprehensive overview of how artificial intelligence (AI) is being applied across the full lung cancer pathway, from screening and diagnosis through staging and treatment. The authors examined publications from 2012 to 2023 in PubMed, using MeSH terms covering artificial intelligence, machine learning, radiomics, deep learning, lung cancer screening, nodule detection, lung cancer diagnosis, staging, treatment, and treatment response. From an initial pool of 270 articles, they retained 69 papers after excluding duplicates, irrelevant abstracts, and studies without available full text.

AI taxonomy: The review distinguishes between the main branches of AI relevant to lung cancer. Machine learning (ML) encompasses supervised, unsupervised, reinforcement, and active learning paradigms, with specific models including Bayesian inference, decision trees, support vector machines (SVMs), logistic regression, and artificial neural networks (ANNs). Deep learning (DL) is a subset of ML that uses multiple neural network layers simultaneously for both feature selection and model fitting. Radiomics refers to the extraction and quantitative analysis of imaging features from medical scans. All three branches have been applied to lung cancer imaging, pathology, biomarker analysis, and treatment planning.

Clinical context: Multiple FDA-approved AI applications already exist in clinical oncology, including lung cancer. The heterogeneity of lung cancer, with its many subtypes, genetic mutations, and variable treatment responses, makes it a particularly rich domain for AI. The review emphasizes that AI should function as a supplementary tool to enhance physician decision-making rather than replace it, with the goal of improving early diagnosis, treatment selection, and ultimately patient outcomes.

TL;DR: Lung cancer causes 21% of all cancer deaths, and only 20% of cases are caught at stage I. This review analyzed 69 papers (2012-2023) from PubMed covering AI in lung cancer screening, diagnosis, staging, and treatment. Multiple FDA-approved AI tools already exist, leveraging ML, DL, and radiomics across the full clinical pathway.
Pages 2-3
Search Strategy and Review Framework

The authors conducted a narrative review rather than a systematic review or meta-analysis, searching the PubMed database for publications spanning 2012 to 2023. Their search strategy used Medical Subject Heading (MeSH) terms including "artificial intelligence," "machine learning," "radiomics," "deep learning," "lung cancer," "lung cancer screening," "lung nodule detection," "lung nodule characterization," "lung nodule segmentation," "lung cancer diagnosis," "lung cancer staging," "lung cancer treatment," and "treatment response." This broad set of terms was designed to capture AI applications across the entire lung cancer clinical pathway.

From an initial yield of 270 articles, the screening process removed duplicate records and studies whose titles or abstracts were irrelevant to the research objective. Full-text papers that were unavailable were also excluded. The final corpus comprised 69 papers, including prospective studies, retrospective analyses, and review articles, all centered on the use and implementation of artificial intelligence in lung cancer.

Review team: Five investigators, each with distinct clinical backgrounds and expertise, conducted the literature review independently. This multi-investigator approach was designed to bring diverse perspectives and interpretations to the synthesis, reducing individual bias. The review was structured around major clinical application areas: screening (subdivided into imaging and non-imaging techniques), diagnosis (diagnostic imaging, histopathology, and biomarkers), staging, treatment, and prediction of treatment outcomes.

The authors provided a visual framework (Figures 1 and 2) outlining the implementation of AI across the lung cancer pathway and the search algorithm used for article selection. While the narrative format does not include formal risk-of-bias assessment tools like QUADAS-2 or PROBAST, the breadth of the included study designs, ranging from randomized controlled trials to retrospective cohorts, provides a comprehensive overview of AI's current state in lung cancer management.

TL;DR: PubMed was searched using 13 MeSH terms covering AI and lung cancer (2012-2023). From 270 initial articles, 69 were retained after excluding duplicates, irrelevant studies, and unavailable full texts. Five investigators independently reviewed the literature across screening, diagnosis, staging, and treatment domains.
Pages 3-5
AI-Powered Lung Cancer Screening: From Risk Prediction to Nodule Detection

The National Lung Screening Trial (NLST) established that early screening of high-risk populations reduces lung cancer mortality by 20%. Current USPSTF guidelines (updated March 2021) recommend annual low-dose CT (LDCT) for individuals aged 50-80 with a 20-pack-year smoking history who currently smoke or quit within the past 15 years. AI enhances this screening process by minimizing radiation exposure, accurately detecting and categorizing lung nodules, personalizing screening schedules, and providing LDCT interpretation where skilled radiologists are in short supply.

Risk prediction models: Convolutional neural networks (CNNs) using non-imaging data from electronic medical records (EMRs) have identified high-risk patients and predicted 1-year lung cancer rates with an AUC of 0.90. CXR-LC, a model relying solely on chest X-ray findings and limited clinical data, achieved an AUC of 0.755, comparable to the established PLCO model (AUC 0.751). Sybil, a validated deep learning model, can predict 6-year lung cancer risk from a single LDCT scan. LUMAS, another CNN, predicted 1-year lung cancer risk using previous and recent CT scans with an AUC of 0.94, outperforming radiologists in head-to-head comparisons.
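The AUC figures quoted throughout this section measure how well a model ranks cancer cases above non-cases. As a rough illustration (with made-up labels and scores, not data from any of these studies), the empirical AUC can be computed as the Mann-Whitney statistic:

```python
def auc(labels, scores):
    """Empirical ROC AUC: the probability that a randomly chosen positive
    case scores higher than a randomly chosen negative case (ties count half),
    i.e. the Mann-Whitney U statistic normalized to [0, 1]."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative labels/scores from a hypothetical risk model (not study data)
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
print(round(auc(labels, scores), 3))  # 0.917
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is how figures like 0.90 and 0.94 above should be read.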

Nodule detection performance: A multi-view convolutional network (ConvNet) CAD system achieved detection sensitivity of 84.1% at one false positive per scan and 90.1% at four false positives per scan. A randomized controlled trial of 10,476 participants showed the AI-assisted group detected actionable nodules (Lung-RADS category 4) at a rate of 0.52% versus 0.25% in the non-AI group, with malignant nodule detection at 0.15% versus 0.0%. Deep learning techniques enhanced digital chest tomosynthesis (DTS) for nodules 5-8 mm in size, achieving sensitivity of 0.90 and positive predictive value of 0.95.
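Detection metrics like the sensitivity and positive predictive value reported for the DTS study come directly from confusion-matrix counts. A minimal sketch, using hypothetical counts chosen only to mirror the 0.90/0.95 figures:

```python
def detection_metrics(tp, fp, fn, tn):
    """Sensitivity (recall), specificity, and positive predictive value
    from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true nodules detected
    specificity = tn / (tn + fp)   # fraction of negatives correctly cleared
    ppv = tp / (tp + fp)           # fraction of flagged findings that are real
    return sensitivity, specificity, ppv

# Hypothetical counts for a nodule detector (not from the trial)
sens, spec, ppv = detection_metrics(tp=90, fp=10, fn=10, tn=190)
print(sens, spec, ppv)  # 0.9 0.95 0.9
```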

Top-performing systems: The AI-RAD Companion, a CNN prototype, detected pulmonary nodules on LDCT with perfect sensitivity (1.0) and specificity of 0.708, contributing to a lung cancer prediction AUC of 0.942. The DL-CADe system achieved a higher per-examination nodule detection rate (86.2%) than double reading by two radiologists (79.2%). Detection performance across dose levels showed AUCs of 0.989 for standard-dose CT, 0.983 for low-dose, and 0.970 for very low-dose CT. A deep learning automatic detection algorithm (DLAD) using chest X-ray data achieved radiograph classification AUROC of 0.92-0.99 and a jackknife alternative free-response ROC figure of merit (JAFROC FOM) of 0.831-0.924.

TL;DR: AI screening models achieve AUCs of 0.90-0.94 for lung cancer risk prediction. In a 10,476-participant RCT, the AI group detected twice as many actionable nodules (0.52% vs. 0.25%) and caught malignant nodules missed entirely by the non-AI group. The AI-RAD Companion reached perfect sensitivity (1.0) with AUC 0.942 for cancer prediction. DL-CADe outperformed double-reading by two radiologists (86.2% vs. 79.2%).
Pages 5-6
Classifying Nodules and Non-Imaging Screening Tools

Nodule segmentation and characterization: Beyond detecting nodules, AI must determine whether they are benign or malignant. Multi-scale CNN models trained on raw nodule patches, without predefined morphological features, captured nodule heterogeneity and achieved 88.84% classification accuracy against noisy backgrounds. AI-based radiomics models like SVM-LASSO outperformed the established Lung-RADS system for malignant nodule detection, using two extracted features (bounding box anterior-posterior dimension and the standard deviation of inverse difference moment) to achieve 84.6% accuracy (AUC 0.89), compared to Lung-RADS at 72.2% accuracy (AUC 0.77).

Segmentation advances: Automated segmentation eliminates the tedious manual process of outlining nodule boundaries from surrounding thoracic tissue, enabling accurate volume and density measurements critical for malignancy determination. SD-Unet, a deep learning biomedical segmentation model, improved segmentation accuracy by classifying image voxels. Soliman et al. developed a spatially non-uniform joint 3D Markov-Gibbs random field (MGRF) method that integrated visual appearance submodels with an adjustable lung shape submodel, achieving DICE similarity coefficients of 98.4% (±1.0%) and 99.0% (±0.5%) on an external validation database.
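The DICE similarity coefficient used to grade these segmentation models is simple to compute: twice the overlap between the predicted and ground-truth masks, divided by their combined size. A toy sketch on flattened binary masks (illustrative values, not study data):

```python
def dice(mask_a, mask_b):
    """DICE similarity coefficient between two binary masks:
    2 * |A intersect B| / (|A| + |B|). 1.0 = perfect overlap."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2.0 * inter / total if total else 1.0

# Toy flattened voxel masks (1 = nodule voxel); hypothetical, for illustration
pred  = [1, 1, 1, 0, 0, 1, 0, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
print(dice(pred, truth))  # 0.75
```

Real pipelines apply the same formula to 3D voxel arrays; a score of 98.4% means the predicted and expert-drawn volumes are almost identical.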

Non-imaging biomarker screening: Emerging biomarkers for lung cancer include autoantibodies, complement fragments, miRNA, tumor DNA, and serum proteins, though their individual sensitivity and specificity remain limited. Artificial neural networks (ANNs) combined with serum protein panels (including beta-2-microglobulin, CEA, gastrin, CA125, NSE, sIL-6R, and metal ions Cu2+/Zn2+, Ca2+, and Mg2+) achieved a prediction rate of 85%, which rose to 87.3% when clinical parameters like symptoms, risk factors, smoking status, and kitchen environment were added.

The Pulmonary Nodules Artificial Intelligence Diagnostic System (PNAIDS), which analyzes CT images combined with tumor markers, achieved the highest specificity at 96.1%. Integration with circulating abnormal cells yielded a specificity of 94.1%. These findings demonstrate that multi-modal approaches, combining AI-driven CT analysis with biomarker panels, substantially outperform any single screening modality and offer promising paths toward earlier, more accurate lung cancer detection.

TL;DR: SVM-LASSO radiomics beat Lung-RADS for malignant nodule detection (AUC 0.89 vs. 0.77). Segmentation models achieved 98.4-99.0% DICE similarity. ANN-based biomarker panels predicted lung cancer with 85-87.3% accuracy. PNAIDS combined with tumor markers reached 96.1% specificity, showing that AI plus multi-modal data is the strongest screening combination.
Pages 6-8
AI in Lung Cancer Diagnosis: CT, PET-CT, and Subtype Differentiation

AI diagnostic imaging for lung cancer centers on CT and PET-CT analysis to identify and characterize tumors. Ardila et al. developed a deep learning algorithm for lung cancer detection via low-dose CT that achieved an AUC of 0.944. A separate study examining 200 lung nodules on CT achieved an AUC of 0.72 using radiomics. Machine learning applied to FDG-PET imaging reached 95.9% sensitivity and 98.1% specificity at standard dose, and 91.5% sensitivity and 94.2% specificity at ultra-low dose (0.11 mSv), demonstrating that AI can maintain diagnostic accuracy even at very low radiation exposures.

Meta-analytic evidence: A meta-analysis by Liu et al. of AI-aided diagnosis using CT images showed combined sensitivity of 87%, specificity of 87%, and an area under the summary ROC (SROC) curve of 0.93. Another meta-analysis of nine NSCLC studies found pooled sensitivity and specificity of 78% and 71%, with a radiomics AUROC of 0.78 (95% CI 0.73-0.83). For ground glass nodules specifically, a combined radiographic-radiomics nomogram (AUC 0.77; 95% CI 0.69-0.86) outperformed a radiographic-only model (AUC 0.71; 95% CI 0.62-0.81) for invasiveness prediction. A Chinese retrospective study of 100 patients with sub-solid nodules built an integrated CT-radiomics model that differentiated minimally invasive from invasive adenocarcinoma with AUC 0.943 in training and 0.912 in validation.

Subtype classification: A CNN model analyzing 301 lung carcinoma CT images achieved 0.93 sensitivity and 0.87 F1 score for overall lung cancer detection, and further distinguished small cell carcinoma, adenocarcinoma, and squamous cell carcinoma with sensitivity, specificity, and F1 scores of 0.90, 0.44, and 0.59, respectively. Saad et al. achieved AUC 0.93 in differentiating NSCLC from peripherally located SCLC using radiomics. These non-invasive classification capabilities are clinically significant because determining lung cancer subtype traditionally requires invasive biopsy, and earlier subtype identification can accelerate treatment initiation.

TL;DR: Deep learning on low-dose CT achieved AUC 0.944 for lung cancer detection. FDG-PET with ML reached 95.9% sensitivity and 98.1% specificity at standard dose. Meta-analyses showed pooled AI diagnostic sensitivity/specificity of 87%/87% on CT. Integrated CT-radiomics models differentiated invasive from minimally invasive adenocarcinoma with AUC 0.912 on validation. Radiomics differentiated NSCLC from SCLC at AUC 0.93.
Pages 8-9
AI-Driven Pathology and Molecular Biomarker Detection

Histopathological diagnosis: Histological examination through bronchoscopy or percutaneous biopsy remains the gold standard for lung cancer diagnosis, but manual reading is difficult given the many subtypes. Yu et al. analyzed 2,480 histopathological images and differentiated malignant tumors from healthy tissue with an AUC of 0.81 using SVM and random forest models. Teramoto et al. used deep CNNs on 298 images to classify adenocarcinoma, squamous cell carcinoma, and small cell lung cancer with accuracies of 89%, 60%, and 70%, respectively, exceeding the accuracy of both cytotechnologists and pathologists.

A prospective study combined clinical information (age, smoking history), radiological features (nodule diameter, count, upper lobe location, malignant signs, sub-solid status), LDCT AI analysis, and liquid biopsy data. The integrated prediction model achieved 89.53% sensitivity, 81.31% specificity, and an AUC of 0.880 in the training group. This combined approach could improve early diagnosis while sparing patients with benign nodules from unnecessary surgery. The authors emphasize that AI-mediated histopathological diagnosis will increase pathologists' productivity and significantly reduce misdiagnosis rates.

Biomarker prediction: The most clinically relevant lung cancer biomarkers include Rb, K-RAS, EGFR, c-MET, TP53, ALK, and PD-L1. Coudray et al. trained neural networks on pathological images to predict the ten most common mutant genes in adenocarcinoma, successfully predicting six of them (KRAS, STK11, TP53, EGFR, SETBP1, and FAT1) with accuracies of 73.3-85.6%. Zhong et al. measured five predictive antibody markers (paxillin, SEC15L2, BAC clone RP11-499F19, XRCC5, and MALAT1) in 23 stage 1 NSCLC patients and 23 matched controls, achieving AUC 0.99, 91.3% sensitivity, and 91.3% specificity using a logistic regression model.

A biomarker panel of Cyfra 21.1, CEA, CA125, and CRP tested in 63 lung cancer patients and 87 non-cancer patients correctly classified 135 of 150 subjects, with accurate classification rates of 88.9% (training), 93.3% (validation), and 90% (testing). A diagnostic model incorporating HE4, sVCAM-1, TTR, ApoA2, and CEA achieved an AUC of 0.988 with 93.33% sensitivity and 92.00% specificity for lung cancer detection. While no universal biomarker panel yet exists, these AI-optimized panels show strong potential, though each must be validated in the target population before clinical deployment.
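Several of the biomarker panels above are scored with logistic regression: each marker receives a fitted coefficient, and the weighted sum is squashed into a probability. A minimal sketch with hypothetical, unfitted weights (the real panels fit their coefficients to patient cohorts):

```python
import math

def logistic_panel_score(values, weights, bias):
    """Logistic-regression style score: sigmoid of a weighted sum of
    biomarker values. Weights and bias here are hypothetical, not fitted."""
    z = bias + sum(w * v for w, v in zip(weights, values))
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative standardized panel values (e.g. Cyfra 21.1, CEA, CA125, CRP)
values  = [1.2, 0.8, -0.3, 0.5]
weights = [0.9, 0.7, 0.4, 0.3]   # hypothetical coefficients
bias    = -0.5
score = logistic_panel_score(values, weights, bias)
print(score > 0.5)  # classified as high risk at a 0.5 probability threshold
```

Fitting the weights (e.g. by maximum likelihood on a labeled cohort) and choosing the decision threshold are what produce the sensitivity/specificity trade-offs reported for these panels.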

TL;DR: Deep CNNs classified lung cancer subtypes from pathology images at 70-89% accuracy, surpassing human pathologists. Neural networks predicted six key gene mutations from images with 73.3-85.6% accuracy. A five-antibody panel reached AUC 0.99 in stage 1 NSCLC detection. Multi-biomarker panels with AI achieved classification rates of 88.9-93.3% and AUCs up to 0.988.
Pages 9-10
AI for Cancer Staging and Treatment Recommendations

Staging: Accurate staging via the TNM classification system is critical for treatment planning and prognosis, yet most lung cancers are detected at advanced stages. AI can accelerate staging by serving as a second reader for PET and CT scans, reducing radiologist workload and improving detection precision. CNNs using multiplanar reconstruction of PET/CT scans can predict anatomical locations of metastatic lesions. DFCNet, a fully convolutional model, achieved 84.58% overall accuracy for lung cancer stage detection and classification, outperforming the standard CNN model at 77.6%.

Treatment recommendations: AI assists across surgical, radiation, chemotherapy, and immunotherapy decisions. For radiotherapy, ML systems can optimize radiation beam angles, predict dose-volume histograms, monitor radiation levels and toxicity, and develop clinical decision support tools using high-quality data from CT scans and treatment histories. Luo et al. proposed an integrated learning collaborative filtering technique to simplify personalized medication selection. QUANIC, another platform, uses large-scale multimodal and longitudinal data to build personalized immunotherapy response models.

IBM Watson for Oncology (WFO): WFO extracts information from medical records, presents evidence, and explores treatment options. Studies in China and Korea showed high concordance between WFO recommendations and multidisciplinary team (MDT) decisions, with a concordance rate of 92.4% in one study and 65.8% consistency in overall treatment recommendations in another. However, applying WFO across different countries presents challenges due to variations in genetic mutation rates, treatment protocols, drug availability, and comorbidities. Regional adaptation remains essential for global applicability.

Drug discovery: A deep learning algorithm analyzing transcriptomic and chemical structure data identified pimozide as a candidate drug for NSCLC, which was subsequently validated in vitro. Neural networks also predicted postoperative outcomes in NSCLC patients with high accuracy for cardio-respiratory toxicity and complications, achieving an AUC of 0.98 in one study. These applications illustrate AI's expanding role not only in treatment selection but also in drug repurposing and surgical risk stratification.

TL;DR: DFCNet improved lung cancer stage classification accuracy from 77.6% (CNN) to 84.58%. WFO achieved 92.4% concordance with multidisciplinary teams for treatment recommendations. AI predicted postoperative cardio-respiratory toxicity with AUC 0.98. A DL-driven drug repurposing pipeline identified pimozide as a validated NSCLC drug candidate.
Pages 10-12
Predicting Treatment Response with Radiomics and Deep Learning

Predicting how individual patients will respond to specific therapies is one of AI's most impactful applications in lung cancer management. Dercle et al. used a CT-based radiomics model with a random forest algorithm to predict treatment sensitivity across three different therapies: nivolumab (AUC 0.77), docetaxel (AUC 0.67), and gefitinib (AUC 0.82). These results show that a single radiomic framework can assess response across immunotherapy, chemotherapy, and targeted therapy, though performance varies by drug class.

EGFR-targeted therapy: Deep learning models have successfully predicted EGFR mutation probability and patient response to EGFR-tyrosine kinase inhibitors (TKIs) and checkpoint inhibitors (CPIs). Kureshi et al. built a data-driven model incorporating clinical history, environmental risk factors, and EGFR mutation status that achieved 76% predictive accuracy for tumor response to EGFR-TKI therapy. Mu et al. predicted EGFR mutations using deep learning on imaging data with AUCs of 0.84 and 0.83 in separate cohorts, offering a non-invasive alternative to tissue-based mutation testing.

Immunotherapy prediction: Radiomics captures tumor heterogeneity and immune infiltration patterns from CT images, enabling the development of radiomic signatures associated with immunotherapy response. Tian et al. applied radiomics and deep learning to predict response to PD-1 and PD-L1 immunotherapy with an AUC of 0.71. Radiomics-based AI models have also demonstrated the ability to predict PD-L1 expression levels by combining radiomic features with clinical data, providing prognostic value for progression-free survival in immunotherapy candidates.

Beyond single-timepoint prediction, AI can incorporate serial imaging data to track tumor changes over time. Recurrent neural networks (RNNs) analyze longitudinal data from post-treatment CT scans to provide insights into phenotypic evolution and treatment response. ML applications have also predicted early death following curative-intent chemoradiation and treatment failure in early-stage NSCLC patients treated with stereotactic body radiation therapy (SBRT), enabling more informed patient counseling and care optimization.

TL;DR: A single radiomics model predicted sensitivity to nivolumab (AUC 0.77), docetaxel (AUC 0.67), and gefitinib (AUC 0.82). EGFR mutation prediction via DL achieved AUCs of 0.83-0.84. PD-1/PD-L1 immunotherapy response prediction reached AUC 0.71. Recurrent neural networks track longitudinal tumor changes from serial CT scans to monitor treatment response over time.
Pages 12-13
Key Barriers to Clinical Adoption and the Road Ahead

Data limitations: The most fundamental barrier is the lack of large, diverse clinical datasets for training AI models. Public datasets like LIDC-IDRI and LUNA16 contain images from limited centers and may not represent broader populations. They also lack comprehensive clinical information and are susceptible to data annotation errors. The authors argue that multi-institutional collaboration is needed to create standardized datasets encompassing diverse patient populations, various lung cancer stages, and longitudinal follow-up data. The 21st Century Cures Act gives the FDA a framework for using post-approval real-world data (RWD), but concerns about data heterogeneity, quality, and reproducibility persist.

Infrastructure and training gaps: Implementing AI in clinical practice requires substantial infrastructure investment and training for healthcare professionals. AI tools must be regularly updated to keep pace with evolving healthcare workflows. Programming interfaces need to be developed to allow seamless integration between AI algorithms and electronic health record (EHR) systems, enabling real-time data exchange for lung cancer management decisions. Regular feedback loops from oncologists, radiologists, and other specialists are essential for refining AI model performance over time.

Interpretability and transparency: Many AI predictions in lung cancer treatment are difficult for clinicians to interpret because the models lack transparency in explaining their decision rationale. This "black box" problem undermines clinician trust and creates barriers to adoption. The authors highlight that addressing explainability is critical before AI can be fully integrated into shared decision-making workflows. Without clear reasoning behind AI recommendations, clinicians cannot confidently act on or communicate them to patients.

Regulatory and ethical concerns: Patient privacy, data security, data ownership, and compliance with regulations like HIPAA add complexity to AI deployment. The authors call for a clear regulatory framework with guidelines for the acceptance and deployment of AI models in healthcare that ensures patient safety and ethical data handling standards. Looking forward, they see the integration of multi-modal data (imaging, clinical, and genomic) as the most promising direction, with AI models that contextualize imaging findings with clinical data to guide physicians toward better patient outcomes through truly personalized lung cancer management.

TL;DR: Key barriers include limited training datasets (LIDC-IDRI and LUNA16 lack diversity), infrastructure gaps, black-box interpretability problems, and HIPAA/regulatory complexity. The path forward requires multi-institutional data sharing, EHR integration, explainable AI models, and clear FDA/regulatory frameworks. Multi-modal integration of imaging, clinical, and genomic data is the most promising future direction.
Citation: Gandhi Z, Gurram P, Amgai B, et al. Artificial Intelligence and Lung Cancer: Impact on Improving Patient Outcomes. Cancers. 2023;15(21):5236. DOI: 10.3390/cancers15215236. PMC10650618. Open access under a CC BY license.