Integration of artificial intelligence in lung cancer: Rise of the machine

Plain-English Explanations
Pages 1-2
Why AI Matters for Lung Cancer, and What This Review Covers

Lung cancer remains the leading cause of cancer-related mortality worldwide, and its management generates enormous volumes of data at every stage, from screening and diagnosis through treatment and follow-up. This data includes clinical presentation, tumor stage, pathology, radiologic features, tumor genomics, liquid biopsies, treatment options, response assessment, and overall outcomes. The sheer complexity of interpreting all of this information is precisely why artificial intelligence (AI) has become both necessary and exciting in the lung cancer space. Several FDA-approved AI applications for clinical oncology, including lung cancer, already exist.

This review from City of Hope National Medical Center summarizes the current literature on AI applications across the full spectrum of lung cancer management. The authors present a clinical AI workflow schema (Figure 1) that outlines how AI operates in the lung cancer clinic: data sources feed into data preparation (cleaning, harmonization, feature selection), then model preparation, and finally implementation. The review spans machine learning (ML), neural networks (NNs), deep learning (DL), computer vision (CV), and natural language processing (NLP) as the core AI methodologies relevant to lung cancer.

Critical caveat: The authors emphasize that external validation and clinical implementation of lung cancer AI research is limited. Most studies included in this review rely on internal validation only, meaning reported performance metrics are likely overly optimistic and may not generalize to other datasets. The review therefore primarily illustrates how small-scale research has been conducted, with the goal of providing a framework for future expanded research, further validation, and actual clinical integration.

Data sources: The electronic medical record (EMR) is a primary data source for AI in lung cancer. Wang et al. developed a model using EMR data from the Maine Health Information Exchange network, training on 873,598 patients and validating on a prospective cohort of 836,659 patients. Using an extreme gradient boosting (XGBoost) algorithm, the model predicted 1-year lung cancer risk with an AUC of 0.88. Separately, Kehl et al. trained an NLP model on over 300,000 imaging reports from 16,780 patients to predict cancer progression, achieving a concordance index of 0.76 and an AUC of 0.77.
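
Nearly every model in this review is scored by AUC, which equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. As an illustrative sketch (our own, not code from any cited study), that equivalence can be computed directly:

```python
def auc(scores_pos, scores_neg):
    """Probability that a randomly chosen positive case scores higher than
    a randomly chosen negative one (ties count half) -- the Mann-Whitney
    formulation of the area under the ROC curve."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy scores for 3 cancer cases and 3 controls (made-up numbers)
print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))  # 8 of 9 pairs correctly ordered
```

An AUC of 0.88, as in the Wang et al. model, means the model ranks a true future cancer case above a non-case 88% of the time.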

TL;DR: This City of Hope review covers AI across the entire lung cancer management pathway. Key EMR-based models include an XGBoost model (AUC 0.88, trained on 873,598 patients) for lung cancer risk prediction and an NLP model (AUC 0.77, 300,000+ imaging reports) for progression prediction. Most studies lack external validation, so reported metrics are likely optimistic.
Pages 2-3
AI for Lung Cancer Screening: Classifying Nodules on Low-Dose CT

Lung cancer screening with low-dose computed tomography (LDCT) has been shown to reduce lung cancer mortality by 20% in current and high-risk former smokers, as demonstrated by the National Lung Screening Trial (NLST). The U.S. Preventive Services Task Force currently supports annual LDCT screening for adults aged 50 to 80 with a 20 pack-year smoking history who currently smoke or have quit within the past 15 years. However, a major limitation of screening is low specificity: in the NLST, over 90% of identified nodules were not malignant. There are no current guidelines to help radiologists classify small indeterminate nodules as benign or malignant, leading to serial CT scans and biopsies that cause patient anxiety and carry non-trivial morbidity.

Computer-aided detection and diagnosis: The review describes three primary AI approaches for imaging: computer-aided detection and diagnosis (CAD) systems, convolutional neural networks (CNNs), and radiomics. CAD systems are classified into CADe (detecting the presence and location of lesions) and CADx (characterizing lesions, including malignancy identification). Several CAD systems have received FDA approval and can identify nodules as small as 3 mm while distinguishing them from normal pulmonary vascular anatomy. However, most CAD systems do not actually classify nodules as benign or malignant but rather draw attention to potential regions of interest.

CNN performance: CNNs have achieved impressive results in classifying nodules on screening LDCTs. Paul et al. used CNNs on select NLST participants, achieving 89.45% accuracy and an AUC of 0.96. Ardila et al. also analyzed NLST LDCT scans, achieving an AUC of 0.94. Critically, when prior CT imaging was not available, their model outperformed all six radiologists in the comparison, with absolute reductions of 11% in false positives and 5% in false negatives. When prior imaging was available, performance was comparable to radiologists.

Histologic differentiation: AI is not limited to benign-versus-malignant classification. Chen et al. used radiomics to differentiate between non-small-cell lung cancer (NSCLC) and peripherally located small-cell lung cancer (SCLC) with an AUC of 0.93, supporting radiomics as a non-invasive approach for early diagnosis and histologic subtyping. The Lung CT Screening Reporting and Data System (Lung-RADS) provides recommendations for workup, but AI may improve upon these guidelines by better identifying suspicious nodules worthy of intervention.

TL;DR: Over 90% of nodules found during NLST screening were benign. CNNs classified LDCT nodules with AUCs of 0.94 to 0.96, and one model outperformed six radiologists (11% fewer false positives, 5% fewer false negatives). Radiomics distinguished NSCLC from SCLC with an AUC of 0.93. Several CAD systems have FDA approval for nodule detection down to 3 mm.
Pages 3-4
Radiomic Models for Predicting Recurrence, Survival, and Treatment Response

Beyond screening, AI is being used to predict oncologic outcomes including locoregional and distant recurrence, progression-free survival, and overall survival (OS). Current primary tumor staging relies on the American Joint Committee on Cancer (AJCC) system, which largely uses tumor size. While tumor size generally correlates with survival, its measurement is subject to interobserver variability, and additional radiomic information can refine prognostication. The Computer Aided Nodule Assessment and Risk Yield (CANARY) tool incorporates radiomic features with imaging findings to identify a subset of lung adenocarcinoma patients from the NLST who have more aggressive disease.

Combining radiomics with clinical and genomic data: D'Antonoli et al. extracted radiomic features from patients with resected NSCLC to predict recurrence. Using TNM stage alone, the model achieved an AUC of 0.58; using radiomics alone, 0.73; and when combined, the AUC improved to 0.75. Lee et al. used radiomic features to predict OS in stage I NSCLC, where combining radiomic and genomic features yielded a concordance index of 0.70, compared with 0.62 using molecular features alone. Additional studies have applied MRI radiomics to predict survival and mutational status in patients with brain metastases.
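
The concordance index reported by Lee et al. generalizes AUC to survival data: it is the fraction of comparable patient pairs whose predicted risks are ordered the same way as their observed survival times. A minimal sketch of the idea (our own, ignoring refinements such as tied event times):

```python
def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs whose predicted risks are
    ordered the same way as their survival times (ties in risk count half).
    A pair (i, j) is comparable when patient i has an observed event at an
    earlier time than patient j."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Three patients: the earlier the death, the higher the predicted risk
print(concordance_index([2, 4, 6], [1, 1, 1], [0.9, 0.5, 0.1]))  # 1.0
```

A c-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the 0.70 achieved by combining radiomic and genomic features is a modest but real improvement over 0.62 from molecular features alone.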

Longitudinal imaging and treatment monitoring: Buizza et al. demonstrated that PET/CT radiomic features could distinguish early responders from non-responders in patients undergoing definitive chemoradiation. Using a support vector machine (SVM) model, they achieved AUCs as high as 0.98 for sequential and 0.93 for concurrent chemoradiotherapy patients. Xu et al. used CNNs and other DL algorithms on CT imaging obtained before treatment and over the follow-up course to predict mortality risk in locally advanced NSCLC, achieving an AUC as high as 0.74, with performance improving as additional time-point scans were included.

Immunotherapy candidate selection: PD-L1 expression from a biopsy may not represent the heterogeneous tumor microenvironment. AI-based radiomic signatures are being explored as surrogates for responsiveness to PD-L1-directed therapies, with early evidence suggesting certain radiomic features may predict who will respond to immunotherapy. This non-invasive approach could complement tissue-based biomarker analysis.

TL;DR: Combining radiomics with TNM staging improved recurrence prediction AUC from 0.58 (stage alone) to 0.75. Radiomic plus genomic features yielded a concordance index of 0.70 for OS in stage I NSCLC. SVM models on PET/CT data achieved AUCs of 0.93 to 0.98 for early response prediction during chemoradiation. Radiomic signatures are being explored as non-invasive surrogates for PD-L1 expression.
Pages 4-5
Distinguishing Treatment Changes from Progression, and AI in Lung Cancer Pathology

Response assessment: Between 10% and 20% of patients with early-stage NSCLC and approximately 66% of patients with advanced NSCLC experience disease progression or die within five years. Accurate assessment of treatment response is therefore critical. Traditional evaluation relies on RECIST (Response Evaluation Criteria in Solid Tumors), WHO criteria, and increasingly immune-related RECIST (irRECIST) in the immunotherapy era. However, these criteria often do not correlate well with actual treatment response, particularly in patients treated with immunotherapy and radiation, where inflammation, pneumonitis, and post-treatment changes can mimic local progression.

Mattonen et al. used CT texture changes following stereotactic body radiation therapy (SBRT) to distinguish recurrence from radiation-induced lung injury (RILI). In their dataset of 13 lesions with moderate to severe RILI and 11 with recurrence, RECIST achieved only 65.2% accuracy with a 45.5% false-negative rate and a 27.3% false-positive rate. Using radiomics, accuracy improved to 77% with an AUC as high as 0.81. This capability to differentiate treatment-related changes from true progression could prevent unnecessary biopsies, premature therapy changes, and disqualification from clinical trials.

Histologic classification: CNNs are the most frequently applied model for tumor characterization in pathology, given their strength in image analysis. Coudray et al. used CNNs to classify histology samples as non-malignant, adenocarcinoma, or squamous cell carcinoma, achieving an AUC of 0.97. The same study attempted gene mutation prediction, with AUCs ranging from 0.73 to 0.86. For immunohistochemistry subtyping with limited tissue, Koh et al. used decision tree and SVM classifiers, achieving accuracy from 72.2% to 91.7% depending on the marker pattern, using just a three-marker panel on small NSCLC biopsies.

Tumor microenvironment: Wang et al. developed ConvPath, a CNN model that classified cell types in the tumor microenvironment with an overall accuracy of 90.1%. Understanding the tumor microenvironment may help explain tumor progression and metastasis, but manual classification by pathologists is prohibitively labor-intensive, making AI tools essential for expanding this research.

TL;DR: Radiomics improved post-SBRT recurrence vs. RILI classification from 65.2% accuracy (RECIST) to 77% (AUC 0.81). CNNs classified lung cancer histology with an AUC of 0.97 and predicted gene mutations at AUCs of 0.73 to 0.86. ConvPath classified tumor microenvironment cell types at 90.1% accuracy. IHC subtyping with SVM reached 72.2% to 91.7% accuracy on limited tissue.
Pages 5-6
AI-Driven Liquid Biopsy Analysis and Gene Expression Profiling

Liquid biopsies: Unlike tissue biopsies that assess spatial heterogeneity at a single location, liquid biopsies analyze tumor-derived products in the blood or serum and can capture temporal heterogeneity through serial sampling. In early-stage disease, plasma next-generation sequencing (NGS) with large panels can provide high sensitivity and specificity, potentially distinguishing benign from malignant nodules and guiding neoadjuvant and adjuvant therapy decisions.

Zhang et al. used synthetic minority oversampling technique (SMOTE), which addresses class imbalance by generating synthetic examples of the minority class, combined with random forests to identify lung cancer using circulating microRNA (miRNA). The model achieved an AUC as high as 0.99. However, this study used a case-control design with samples not limited to early-stage disease, which likely inflated performance. Despite this limitation, AI and ML will play a key role in liquid biopsy analysis because human interpretation of such multi-modal, high-dimensional data is simply not feasible. Improving the sensitivity and specificity of liquid biopsies through AI would benefit patients by offering a less invasive alternative that catches cancers earlier.
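
SMOTE's core step is simple to state: for each synthetic point, pick a minority-class sample, pick one of its k nearest minority neighbors, and interpolate a random fraction of the way between them. A self-contained sketch of that idea (illustrative only, not the study's implementation):

```python
import random

def smote(minority, k=2, n_synthetic=4, seed=0):
    """Generate synthetic minority-class points by interpolating each
    chosen sample toward one of its k nearest minority-class neighbors
    (the core step of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority neighbors of x (excluding x itself)
        neighbors = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class stays inside its original region of feature space rather than being duplicated verbatim.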

Genetic mutations and gene expression: Most current AI applications in lung cancer have focused on immunohistochemistry, but ML techniques are increasingly being applied to gene expression profile analysis. Adenocarcinomas that present with actionable mutations represent a key area where ML can help identify genomic pathways and potentially discover additional targetable biomarkers. With improvements in genomic sequencing through commercial NGS platforms, the expanding number of data points will enhance ML-driven treatment selection and understanding of tumor biology.

Cook et al. used a novel ML algorithm to sub-classify lung adenocarcinoma and lung squamous cell carcinoma. Using unsupervised learning, which searches for patterns without assigned labels, they identified novel mutations such as PIGX (a known oncogenic driver in breast cancer) as worthy of future investigation in lung cancer. This demonstrates how AI can discover previously unrecognized molecular targets through unbiased pattern recognition in genomic data.

TL;DR: SMOTE plus random forests detected lung cancer from circulating miRNA with an AUC of 0.99 (though likely inflated by case-control design). Unsupervised ML identified novel mutations like PIGX in lung cancer subtypes. AI is essential for liquid biopsy analysis because the multi-modal, high-dimensional data exceeds human interpretive capacity.
Pages 5-7
AI in Drug Discovery, Immunotherapy Prediction, and Radiation Treatment Planning

Clinical decision support: Watson for Oncology (WFO) was compared against a multidisciplinary team for treatment recommendations. Concordance was high for early-stage and metastatic disease (92.4% to 100%) but lower for stage II or III disease (80.8% to 84.6%). While there is room for improvement, such tools could standardize lung cancer treatment across institutions and disciplines.

Drug discovery and repurposing: AI offers two key avenues for drug development: screening previously developed drugs for new oncologic uses and identifying potential drug candidates before investing years of research. Li et al. used transcriptomic and chemical structure data as inputs into a DL algorithm for drug repurposing. The algorithm identified pimozide, an anti-dyskinesia agent previously used for Tourette's disorder, as a strong candidate for NSCLC treatment, with efficacy validated in in vitro experiments against certain NSCLC cell lines.

Immunotherapy and targeted therapy prediction: Charoentong et al. performed a pan-cancer immunogenomic ML analysis to predict response to checkpoint inhibitors. Their "immunophenoscore" outperformed PD-L1 expression for predicting immunotherapy response in certain histologies. For targeted therapy, Kureshi et al. applied SVM and decision tree classifiers to predict tumor response in EGFR-positive NSCLC patients receiving erlotinib or gefitinib, achieving a predictive accuracy of 76% and an AUC of 0.76.

Radiation oncology automation: Radiation treatment requires CT simulation, organ-at-risk (OAR) delineation, target volume definition, plan optimization, evaluation, and quality assurance. Wu et al. reported on AAR-RT, a DL system that automatically contours OARs on CT images for head-and-neck and thoracic malignancies. Zhang et al. developed an automatic planning algorithm for lung intensity-modulated radiation therapy (IMRT) that generated plans with equivalent or improved performance compared with experienced medical dosimetrists in target coverage, OAR sparing, and overall quality. Wall et al. used an SVM model for predicting quality assurance outcomes with a mean absolute error of 3.85%. Together, these steps could reduce the time to optimally plan a case from about a week to a day or less.
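
Mean absolute error, the metric reported for the QA model, is simply the average size of the prediction miss. A quick sketch with made-up QA pass-rate numbers (not data from Wall et al.):

```python
def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)

# Hypothetical QA pass rates (%) vs. model predictions
print(mean_absolute_error([98.0, 95.0, 99.0], [96.0, 97.0, 99.5]))  # 1.5
```

On this scale, the reported 3.85% means the model's predicted QA outcome was off by under four percentage points on average.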

TL;DR: Watson for Oncology achieved 92.4% to 100% concordance with expert teams for early-stage and metastatic lung cancer. DL-based drug repurposing identified pimozide as a candidate NSCLC therapy. The "immunophenoscore" outperformed PD-L1 for checkpoint inhibitor response prediction. Automated radiation planning matched experienced dosimetrists, and QA prediction achieved 3.85% mean absolute error.
Pages 6-7
AI for Surgical Decision-Making, Robotic Surgery, and Bayesian Network Meta-Analyses

Surgical risk stratification: Lobectomy is the standard of care for localized lung cancer, with a mortality rate of 2.3% compared with 6.8% for pneumonectomies. However, not all patients are candidates, and AI can help risk-stratify patients for optimal treatment planning. Santos-Garcia et al. used a neural network to predict postoperative cardio-respiratory complications in NSCLC patients, achieving an AUC of 0.98. Esteva et al. used NNs to estimate postoperative prognosis and major complications from clinical and laboratory variables in 113 training patients, achieving 100% sensitivity and specificity in a test set of 28 patients.

Pulmonary function interpretation: Topalovic et al. compared a decision tree ML framework against 120 pulmonologists, each evaluating 50 pulmonary function test (PFT) cases, against guideline gold standards. Pulmonologists matched guidelines 74.4% of the time (interrater variability 0.67) and yielded the correct underlying diagnosis 45% of the time (interrater variability 0.35). In contrast, the ML model, trained on 1,500 historical cases, matched guidelines 100% of the time and achieved the correct diagnosis 82% of the time. This demonstrates AI's potential for more consistent interpretation of data critical to surgical candidacy assessment.

Robotic surgery and autonomous systems: Robotic-assisted surgical systems improve precision but are not inherently AI, as they serve as extensions of the surgeon. However, Shademan et al. reported on a "smart tissue autonomous robot" (STAR) that uses AI to optimize surgical planning. In anastomosis testing, STAR outperformed both manual laparoscopic surgery and standard robotic-assisted surgery in suture consistency, leakage, number of mistakes, completion time, and lumen reduction. This testing was performed in pigs, and no FDA approvals exist for autonomous surgery, but it demonstrates feasibility for the future.

Bayesian network meta-analyses: Traditional meta-analysis compares two interventions, but lung cancer management often involves more than two options. Bayesian network meta-analyses address this through indirect treatment comparisons via shared comparators, producing treatment rankings, odds ratios, and probability distributions. Applications in lung cancer include evaluating bevacizumab biosimilar safety, determining optimal platinum-based chemotherapy for early-stage resected NSCLC, assessing how smoking status influences targeted therapy response, and selecting first-line treatments based on PD-L1 expression.
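
The indirect-comparison logic underlying network meta-analysis is visible in its simplest (frequentist, Bucher-style) form: if one trial compares A vs. C and another compares B vs. C, the A-vs-B effect on the log odds-ratio scale is the difference of the two direct effects. A toy sketch with made-up odds ratios (the full Bayesian version adds priors and uncertainty propagation):

```python
import math

def indirect_log_or(d_ac, d_bc):
    """Bucher-style indirect comparison: with a shared comparator C, the
    A-vs-B effect on the log odds-ratio scale is the difference between
    the direct A-vs-C and B-vs-C effects."""
    return d_ac - d_bc

# Hypothetical trials: A vs. C gives OR 0.60, B vs. C gives OR 0.80
or_ab = math.exp(indirect_log_or(math.log(0.60), math.log(0.80)))
print(round(or_ab, 2))  # 0.75
```

Chaining such comparisons across a network of shared comparators is what lets these analyses rank three or more treatments that were never tested head to head.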

TL;DR: NNs predicted postoperative cardio-respiratory risk with an AUC of 0.98. An ML model matched PFT guidelines 100% of the time vs. 74.4% for 120 pulmonologists, and achieved 82% correct diagnoses vs. 45%. The STAR autonomous robot outperformed manual and robotic surgery in preclinical testing. Bayesian network meta-analyses are expanding treatment comparison capabilities across multiple lung cancer interventions.
Pages 7-8
Key Barriers to Clinical Adoption of AI in Lung Cancer

Data acquisition and sample size: AI relies heavily on suitable quantities of training, testing, and validation data. Most outcomes-based studies include relatively small numbers of patients (tens to hundreds), which are somewhat heterogeneous in demographics, genomics, and imaging features. Sample sizes in the thousands may be required for many applications. Models built on small, homogeneous datasets may be inaccurate, poorly generalizable, and not reproducible in clinical practice. Furthermore, many EMR variables are recorded as free text, which cannot be directly extracted for analysis.

Reproducibility and validation: The studies reviewed almost exclusively lack external validation or prospective evaluation, which may explain their excellent and potentially overly optimistic performance metrics. Methodology and reproducibility vary across institutions and are somewhat at the discretion of the researcher. Reporting standards such as those published for radiomics research are an important step, but much of AI-based lung cancer research remains retrospective. Few studies have actually applied AI-based interventions to patient care and compared outcomes with the gold standard.

Data sharing and study design: Datasets are often stored in institutional repositories, which limits reproducibility and validation across centers. Centralized repositories are required for multimodality integration, which is essential for combining diverse data sources into the best possible models. Study designs must include appropriate model testing and validation through hold-out datasets, external datasets, or at minimum statistical resampling. Without these safeguards, models overfit training data and introduce bias.
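
The hold-out principle the authors call for is mechanical but essential: fit on one slice of the data and report metrics only on the slice the model never saw. A minimal sketch (illustrative, not tied to any cited study):

```python
import random

def holdout_split(records, test_frac=0.2, seed=42):
    """Shuffle once, then carve off a hold-out test set so performance is
    measured only on records the model never saw during training."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

train, test = holdout_split(list(range(1000)))
print(len(train), len(test))  # 800 200
```

Even this internal split only guards against overfitting to individual records; generalization to other institutions still requires a genuinely external dataset.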

Ethical and interpretability concerns: Questions remain about how to balance the risks and benefits of clinical AI implementation, including patient confidentiality and autonomy, consent for AI-guided care, and liability when models produce incorrect recommendations. Additionally, widely used DL models and radiomic features are not nearly as intuitive as standard Cox proportional hazards models. Frameworks in "explainable AI" (XAI), such as LIME (local interpretable model-agnostic explanations) and SHAP (SHapley Additive exPlanations), provide estimates of how features drive predictions, but adoption of these tools is still early.
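
For the special case of a purely linear model, SHAP attributions have a closed form: each feature contributes its coefficient times its deviation from a background average, and the contributions sum exactly to the gap between this prediction and the average prediction. A sketch of that special case (real SHAP tooling handles arbitrary models; these numbers are made up):

```python
def shap_linear(coefs, x, background_mean):
    """Exact SHAP values for a linear model f(x) = sum(c_i * x_i):
    each feature's attribution is c_i * (x_i - mean_i), measured against
    an average 'background' patient. Attributions sum to f(x) - f(mean)."""
    return [c * (xi - mi) for c, xi, mi in zip(coefs, x, background_mean)]

# Two-feature toy model: attributions sum to f(x) - f(mean)
phi = shap_linear([2.0, -1.0], [3.0, 4.0], [1.0, 2.0])
print(phi)  # [4.0, -2.0]
```

This additivity property is what makes such explanations auditable: a clinician can see exactly which inputs pushed a prediction up or down relative to a typical patient.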

TL;DR: Most studies use tens to hundreds of patients and lack external validation, inflating performance metrics. Data silos at individual institutions limit reproducibility. Ethical questions around consent, liability, and patient autonomy remain unresolved. XAI frameworks like LIME and SHAP help address interpretability but are not yet widely adopted. Reporting standards for radiomics exist but are not universally followed.
Pages 8-9
Building Toward Validated, Clinically Integrated AI for Lung Cancer

The authors outline several concrete steps needed before AI can be broadly incorporated into the lung cancer clinic. The most important is improving sample sizes and data availability through centralized repositories. Initiatives like the Cancer Imaging Archive and The Cancer Genome Atlas provide large, centralized data sources that address multiple challenges simultaneously: sample sizes increase, training and testing datasets can be appropriately sized, reproducibility improves, and collaboration across institutions becomes feasible. Obstacles to this effort include data sharing difficulties, challenges in pooling diverse data types, and the significant effort required to curate and harmonize datasets.

Algorithmic and computational improvements: Computational power continues to increase, and as AI research in lung cancer grows, performance metrics and efficiency will improve, stimulating real-world clinical applicability. However, the authors caution against overstating AI's potential. AI is not a "silver bullet," and the challenges of translating research into the lung cancer clinic are substantial. It is also not, and should not be, an absolutely autonomous system. The goal is to develop systems informed by all stakeholders, including patients, physicians, and administrators, to ensure models are usable, applicable, and valid.

The path forward: Available treatment options for lung cancer, and the precision with which they can be selected, have improved dramatically. But these increasingly tailored options require data to inform decisions and the ability to make sense of large volumes of information. The overarching field of AI, including ML, NNs, DL, NLP, XAI, and other methodologies, offers a promising avenue for improving all aspects of lung cancer management with data-driven approaches. Radiomics allows additional value to be derived from existing diagnostic imaging. ML algorithms help optimize treatment selection. With large databases and suitable platforms, AI research will continue to grow and become more reproducible, accurate, and applicable.

The review concludes by noting rising interest in AI across the oncology community, including among young trainees. This growing engagement, combined with expanding computational resources and data infrastructure, positions AI-based interventions to play a key role in the future of lung cancer management. The critical next step is moving from promising retrospective studies to prospective clinical validation and actual workflow integration.

TL;DR: Centralized data repositories (Cancer Imaging Archive, The Cancer Genome Atlas) are essential for scaling AI research beyond small, single-institution studies. AI should not be treated as autonomous or infallible but as a stakeholder-informed decision support tool. The field must transition from retrospective hypothesis-generating studies to prospective clinical validation and real-world workflow integration.
Citation: Ladbury C, Amini A, Govindarajan A, et al. Integration of artificial intelligence in lung cancer: Rise of the machine. 2023. Open access: PMC9975283. DOI: 10.1016/j.xcrm.2023.100933. License: CC BY-NC-ND.