Application of Radiomics and Artificial Intelligence for Lung Cancer Precision Medicine

Plain-English Explanations
Pages 1-2
Radiomics Turns Standard Medical Images into Quantitative Biomarkers for Lung Cancer

This 2021 review by Tunali, Gillies, and Schabath from H. Lee Moffitt Cancer Center examines how radiomics and artificial intelligence (AI) are being applied across the lung cancer care continuum. Radiomics is defined as the process of converting standard-of-care medical images, stored in DICOM format, into high-dimensional quantitative data that can be merged with other data sources such as pathology, hematology, genomics, and proteomics. The authors argue that because medical imaging is already integral to lung cancer detection, diagnosis, treatment planning, and monitoring, radiomics has a uniquely advantageous position to extract additional clinical value from images that are already being acquired.

Clinical context: In routine radiology practice, only a handful of quantitative metrics are used to describe pulmonary lesions: CT-based largest diameter following Fleischner Society or Lung-RADS guidelines, RECIST-based tumor diameter, SUV-derived metrics from PET, and percent enhancement on MRI. Radiomics expands this dramatically by extracting hundreds to thousands of features that capture biological and pathophysiological information invisible to the human eye. These features have shown utility as noninvasive biomarkers for lung cancer risk prediction, diagnostics, prognosis, treatment response monitoring, and tumor biology characterization.

Scope of the review: The paper covers four major application domains: early detection and screening, prognostication, treatment response prediction, and radiogenomics (linking imaging features to genomic phenotypes). It closes with a discussion of limitations and recommendations. The authors emphasize both conventional radiomics (feature extraction from segmented regions of interest) and deep learning approaches that can analyze entire images without explicit segmentation. Throughout the review, they highlight specific AUC values, hazard ratios, and concordance indices from key studies, providing a data-rich survey of the field as of 2021.

TL;DR: This Moffitt Cancer Center review covers how radiomics converts standard lung cancer imaging (CT, PET, MRI) into quantitative biomarkers, spanning screening, prognosis, treatment response, and radiogenomics, with detailed performance metrics from dozens of studies.
Pages 2-5
The Five-Step Radiomics Pipeline: From Image Acquisition to Model Validation

The conventional radiomics pipeline consists of five fundamental steps. Step 1, image acquisition: Standard-of-care medical images are used, leveraging data already stored on every radiology PACS server. However, image-acquisition protocols vary widely within and across practices. Preprocessing techniques such as resampling all voxels to a specific size (Shafiq-Ul-Hassan et al. 2018) and applying relative discretization (fixed number of bins) or absolute discretization (fixed bin size) have been shown to substantially impact the reproducibility of PET-, CT-, and MRI-derived features. Normalization of features by total ROI voxels or single voxel volume helps eliminate dependence on volume and scanner variability.
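
To make the two discretization schemes concrete, here is a minimal Python sketch under stated assumptions. The function names, the toy Hounsfield-unit values, the bin count, the bin width, and the origin are all illustrative; none of them come from the review or from a specific radiomics package.

```python
import math


def discretize_fixed_bin_number(values, n_bins):
    """Relative discretization: split the ROI's own intensity range into n_bins bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1 for _ in values]
    width = (hi - lo) / n_bins
    # Bin indices run from 1 to n_bins; the maximum value is clamped into the last bin.
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]


def discretize_fixed_bin_size(values, bin_width, origin):
    """Absolute discretization: bins of a fixed width counted from a fixed origin."""
    return [int(math.floor((v - origin) / bin_width)) + 1 for v in values]


# Hypothetical ROI intensities in Hounsfield units (illustrative only).
roi_hu = [-50.0, -10.0, 0.0, 25.0, 40.0, 100.0]

fbn = discretize_fixed_bin_number(roi_hu, n_bins=4)
fbs = discretize_fixed_bin_size(roi_hu, bin_width=25.0, origin=-50.0)

print(fbn)  # [1, 2, 2, 3, 3, 4]
print(fbs)  # [1, 2, 3, 4, 4, 7]
```

With a fixed number of bins, the bin edges move whenever the ROI's minimum or maximum changes. With a fixed bin size, the edges stay put but the number of occupied bins varies. This is one reason the choice can affect feature reproducibility.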

Step 2, ROI selection and segmentation: Regions of interest may include index nodules, tumors, metastatic lesions, or whole organs in three-dimensional space. This gives radiomics an advantage over tissue-based biomarkers, which only capture a portion of a tumor from biopsy. Manual segmentation is laborious and shows poor interreader reproducibility, with Dice coefficients often below 80% (Alilou et al. 2017). Semiautomated approaches (e.g., single-click initialization followed by computer-derived delineation) and fully automated methods are faster and more repeatable (Kalpathy-Cramer et al. 2016b; Tunali et al. 2019c), though they may still require manual verification on difficult cases.
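
The Dice coefficient mentioned above measures the overlap between two segmentations as twice the shared volume divided by the sum of the two volumes. A minimal sketch follows, assuming each mask is represented as a set of voxel coordinates. The two "reader" masks are invented toy data, not drawn from any study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two segmentations given as collections of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree trivially
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical 10 x 10 voxel contours from two readers, offset by 2 voxels along x.
reader_1 = {(x, y, 0) for x in range(0, 10) for y in range(0, 10)}
reader_2 = {(x, y, 0) for x in range(2, 12) for y in range(0, 10)}

score = dice_coefficient(reader_1, reader_2)
print(score)  # 0.8
```

In this toy case a 2-voxel shift on a 10-voxel-wide contour already brings the Dice score down to 0.8, which illustrates how sensitive the metric is for small lesions.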

Step 3, feature extraction: Radiomic features are classified into first-order (shape, size, histogram-based metrics like mean, entropy, skewness, kurtosis), second-order "texture" features (statistical interrelationships between voxels, including GLCM-based measures), and higher-order features (fractal analyses, Minkowski functionals, wavelets, Laplacian transforms of Gaussian bandpass filters). The Image Biomarker Standardization Initiative (IBSI) has worked to standardize nomenclature, definitions, benchmark datasets, and benchmark values for reproducible feature extraction (Zwanenburg et al. 2018).
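
As an illustration of the first two feature classes, the sketch below computes a few histogram-based first-order features and one GLCM-based texture feature (contrast) in plain Python. It is a simplified teaching example: the function names and toy inputs are assumptions, only a single horizontal offset is used for the GLCM, and the definitions have not been checked against the IBSI benchmark values.

```python
import math
from collections import Counter


def first_order_features(intensities):
    """Histogram-based features from a flat list of discretized ROI intensities."""
    n = len(intensities)
    mean = sum(intensities) / n
    m2 = sum((v - mean) ** 2 for v in intensities) / n  # variance
    m3 = sum((v - mean) ** 3 for v in intensities) / n
    m4 = sum((v - mean) ** 4 for v in intensities) / n
    counts = Counter(intensities)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean,
        "entropy": entropy,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }


def glcm_horizontal(image):
    """Symmetric, normalized gray-level co-occurrence matrix for horizontal neighbors."""
    pairs = Counter()
    for row in image:
        for left, right in zip(row, row[1:]):
            pairs[(left, right)] += 1
            pairs[(right, left)] += 1  # count both directions to make it symmetric
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}


def glcm_contrast(glcm):
    """Contrast: co-occurrence probability weighted by squared gray-level difference."""
    return sum(p * (i - j) ** 2 for (i, j), p in glcm.items())


# Toy inputs (illustrative only).
features = first_order_features([1, 1, 2, 2, 3, 3, 4, 4])
contrast = glcm_contrast(glcm_horizontal([[1, 1, 2], [2, 3, 3]]))

print(features["entropy"])  # 2.0
print(contrast)             # 0.5
```

First-order features ignore where voxels sit relative to each other, whereas the GLCM captures how often pairs of gray levels occur side by side. That difference is what separates histogram features from "texture" features.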

Steps 4 and 5, model building and validation: Analytical methods range from conventional biostatistics (Cox regression, logistic regression) to machine learning (random forests, LASSO, SVM, artificial neural networks). The choice depends on sample size, study endpoint (dichotomous vs. time-dependent), and metric of interest (P-value-driven vs. AUC). Models can integrate orthogonal data including clinical covariates, driver mutations, IHC proteomic data, and circulating biomarkers. Principal component analysis or clustering can reduce dimensionality to avoid overfitting. External validation on independent datasets with different acquisition protocols and patient populations is the gold standard for demonstrating generalizability.

TL;DR: The radiomics pipeline involves 5 steps: image acquisition (with voxel resampling and discretization for robustness), ROI segmentation (manual Dice often below 80%, favoring semiautomated methods), feature extraction (first/second/higher-order, standardized by IBSI), model training (LASSO, random forest, Cox regression, etc.), and independent external validation.
Pages 5-7
Deep Learning, Transfer Learning, and GANs: AI Approaches for Lung Cancer Imaging

The authors distinguish two modes of AI application in medical image analysis. In "conventional radiomics," features extracted from segmented ROIs are fed into machine learning algorithms (random forests, SVMs, logistic regression, LASSO) to build classifiers. In deep learning (DL) approaches, entire images or image series are input directly into convolutional neural networks (CNNs) that learn features automatically without requiring explicit ROI segmentation. DL methods need only a seed point or bounding box rather than a full manual delineation. Lung cancer is one of the most prominently researched cancer types with AI, owing to its medical importance, the abundance of CT and PET/CT images, and the high-contrast, high-resolution nature of CT imaging.

Data augmentation strategies: Large datasets are critical for DL success, as small datasets are prone to overfitting and poor generalizability. The authors describe several augmentation approaches. Transfer learning involves pretraining a neural network on a different task to learn general features (edges, textures) and then fine-tuning it for the lung cancer task. Public repositories like The Cancer Imaging Archive (TCIA) and the National Lung Screening Trial (NLST) provide annotated images for pretraining. Traditional augmentation techniques include warping, rotating, and inverting images. More advanced approaches use generative adversarial networks (GANs) to create synthetic images, for example converting contrast-enhanced to non-enhanced CT, or MRI to CT (Sandfort et al. 2019).
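
The traditional augmentation techniques can be sketched in a few lines. The example below assumes a 2D image patch stored as a list of rows, and it reads "inverting" as mirroring the image, which is an interpretation rather than something the review specifies. The function names and the toy patch are illustrative.

```python
def rotate_90(image):
    """Rotate a 2D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2D image left to right."""
    return [row[::-1] for row in image]


def augment(image):
    """Return the original patch plus its rotated and mirrored variants (8 in total)."""
    variants = [image]
    current = image
    for _ in range(3):
        current = rotate_90(current)
        variants.append(current)
    variants += [flip_horizontal(v) for v in variants]
    return variants


# Toy 2 x 2 patch (illustrative only).
patch = [[1, 2], [3, 4]]
augmented = augment(patch)

print(rotate_90(patch))  # [[3, 1], [4, 2]]
print(len(augmented))    # 8
```

Each variant shows the same nodule in a different orientation, so the label is unchanged while the network sees more varied inputs.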

Distributed learning: When augmentation is insufficient, particularly for rare diseases, the authors suggest centralized databases or distributed learning platforms where models (code) are shared across institutions instead of patient data (Lambin et al. 2017). This federated approach addresses both data scarcity and privacy constraints simultaneously.

Key limitations of DL: The "black box" nature of deep learning remains a concern, though Chartrand et al. (2017) argue that a highly accurate opaque system is preferable to an inaccurate transparent one. Another limitation is that most DL methods are optimized for binary classification rather than time-dependent endpoints like survival outcomes, which are central to many oncology applications.

TL;DR: AI in lung cancer imaging uses either conventional radiomics (hand-crafted features plus ML classifiers) or deep learning (CNNs on raw images). Data augmentation through transfer learning, GANs, and distributed/federated learning addresses sample size constraints. Key DL limitations include the black-box problem and poor handling of time-dependent survival endpoints.
Pages 7-10
Radiomics for Lung Cancer Screening: Distinguishing Malignant from Benign Nodules

Lung cancer is the most commonly diagnosed cancer worldwide and the leading cause of cancer-related deaths. Localized lung cancer has a 56% 5-year overall survival (OS), compared to only 5% for distant metastatic disease (Siegel et al. 2019), making early detection critical. The NLST demonstrated a 20% relative reduction in lung cancer mortality with LDCT screening compared to chest radiography in 53,454 current and former smokers aged 55 to 74. Additional trials, including the Dutch-Belgian NELSON trial, the Italian MILD trial, and the German LUSI trial, have all validated the mortality benefit of LDCT screening. However, screening produces many indeterminate pulmonary nodules and can lead to overdiagnosis of indolent neoplasms.

NLST-based radiomic studies: Hawkins et al. (2016) used baseline LDCT scans to predict which indeterminate nodules (4-12 mm) would become incident lung cancers, achieving an AUC of 0.81 with a 23-feature ML model and outperforming volume alone (AUC = 0.72). Peikert et al. (2018) used LASSO to distinguish malignant from benign screen-detected nodules with just 8 non-redundant radiomic features, reaching an AUC of 0.939. Huang et al. (2018) performed a matched case-control study to reduce false positive rates. Cherezov et al. (2018) improved malignancy prediction from 74.7% to 81.0% accuracy by implementing nodule size-specific models and using the Synthetic Minority Oversampling Technique (SMOTE) for class imbalance.
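
SMOTE balances classes by creating synthetic minority-class samples on the line segment between a real minority sample and one of its nearest minority neighbors. The sketch below is a simplified stdlib-only version for illustration. It is not the implementation used by Cherezov et al., and the function name, parameters, and toy feature vectors are assumptions.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Simplified SMOTE: interpolate between minority samples and their nearest neighbors.

    Assumes `minority` holds at least two distinct feature tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen sample (squared Euclidean distance).
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical two-feature vectors for the minority (malignant) class.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote(minority, n_synthetic=6, k=2, seed=42)

print(len(new_points))  # 6
```

Synthetic samples should be generated from the training folds only. Oversampling before the train/validation split would leak information into the validation set.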

Additional high-performing models: Chae et al. (2014) used texture features on 86 part-solid ground-glass opacities to differentiate preinvasive lesions from invasive adenocarcinomas using an artificial neural network (ANN), achieving an AUC of 0.981. Liu et al. (2017) extracted semantic features from NLST baseline nodules and predicted lung cancer diagnosis 1-2 years later with an AUC of 0.80. Ardila et al. (2019) applied a DL network to NLST data, achieving an AUROC of 0.944 (94.4%) for lung cancer risk prediction with external validation. Dhara et al. (2016) classified malignant versus benign nodules from 891 cases using an SVM, reaching an AUC of 0.951.
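
Because AUC (also written AUROC) is the headline metric in most of these studies, it helps to see what it measures: the probability that a randomly chosen malignant case receives a higher model score than a randomly chosen benign case. The sketch below computes it by direct pairwise comparison. The scores and labels are toy values, not data from any cited study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy model outputs: 1 = malignant, 0 = benign (illustrative only).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

value = auc(scores, labels)
print(round(value, 3))  # 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. Values reported as percentages are the same quantity on a 0-100 scale.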

Addressing overdiagnosis: Maldonado et al. (2013) developed the CANARY algorithm to categorize pulmonary nodules as aggressive or indolent, achieving a validation sensitivity of 98.7% and specificity of 63.6%. Lu et al. (2019) used features from both the tumor and the "difference region" (part-solid area) to discriminate aggressive versus indolent nodules with an AUC of 0.846. Morales et al. (2019) stratified NLST cancer patients into low, intermediate, and high risk groups with an AUC of 0.878 using just two radiomic features, one of which was associated with FOXF2 expression, a gene linked to poor prognosis.
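
Sensitivity and specificity, as reported for CANARY, come straight from the confusion matrix, and the false negative rate is the complement of sensitivity. The sketch below shows the arithmetic. The counts are invented for illustration and do not reproduce any study's data.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and false negative rate from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # aggressive nodules correctly flagged
        "specificity": tn / (tn + fp),          # indolent nodules correctly cleared
        "false_negative_rate": fn / (tp + fn),  # equals 1 - sensitivity
    }


# Hypothetical counts (illustrative only).
metrics = screening_metrics(tp=45, fn=5, tn=30, fp=20)

print(metrics["sensitivity"])  # 0.9
print(metrics["specificity"])  # 0.6
```

A classifier tuned for very high sensitivity, as in the overdiagnosis setting above, typically pays for it with lower specificity, so the two numbers should be read together.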

TL;DR: Radiomic models on LDCT screening data achieve AUCs of 0.81 to 0.944 for distinguishing malignant from benign nodules. Ardila et al. reached an AUROC of 0.944 (94.4%) with deep learning. The CANARY algorithm identified aggressive nodules with 98.7% sensitivity. These tools address a major screening limitation: the high rate of indeterminate nodules and overdiagnosis.
Pages 10-13
Predicting Survival and Recurrence: Radiomic Signatures Beyond TNM Staging

Although pathologic staging remains the most important prognostic factor for lung cancer survival, there is marked variability in outcomes among patients with the same stage, suggesting other factors drive progression and recurrence. Radiomic features have shown potential to complement staging and improve prognostication. Aerts et al. (2014) analyzed NSCLC and head-and-neck patients, validating a CT radiomic signature with a concordance index (CI) of 0.65, which outperformed both TNM staging and tumor volume. Critically, they found that their most informative features correlated with cell-cycling gene-expression pathways, providing a biological basis for the imaging signal.
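
The concordance index generalizes AUC to censored survival data: among pairs of patients whose ordering in time is known, it is the fraction in which the patient with the higher predicted risk had the earlier event. Below is a minimal sketch of Harrell's C-index, assuming `events` uses 1 for an observed event and 0 for censoring. The function name and the four-patient toy cohort are illustrative.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly by predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy cohort: follow-up in months, event indicator, model risk score (illustrative only).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.6, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(c_index)  # 0.8
```

As with AUC, 0.5 indicates no better than chance, which puts a value such as 0.65 in context: better than random ordering, but far from perfect discrimination.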

Novel radiomic features for overall survival: Grove et al. (2015) developed two CT features, convexity (HR = 0.31) and entropy ratio (HR = 2.36), that were significantly associated with OS in primary lung adenocarcinoma across two independent cohorts. Tunali et al. (2017) assessed the same cohorts and introduced novel features from radial gradient (RG) and radial deviation (RD) maps that also predicted OS (HR = 0.40, P = 0.014). Win et al. (2013) showed that heterogeneity on both CT and PET components of PET/CT were significant predictors of survival. Chae et al. (2014) and She et al. (2018) found CT radiomic signatures that differentiated indolent from invasive lung adenocarcinoma with AUROCs of 0.98 and 0.95, respectively.

Metastasis and recurrence prediction: Coroller et al. (2015) built a combined model of CT radiomics and clinical predictors to predict distant metastasis (CI = 0.61), while Wu et al. (2016a) used fluorine-18 PET/CT radiomic features for the same endpoint, achieving a CI of 0.71. Huang et al. (2016) identified radiomic signatures correlated with disease-free survival (HR = 1.77, CI = 0.691). Several studies by Huynh et al. (2017), Li et al. (2017), and Oikonomou et al. (2018) investigated prognostic performance of CT radiomics for distant metastasis and locoregional recurrence after stereotactic body radiation therapy (SBRT).

TL;DR: A CT radiomic signature (CI = 0.65) outperformed both TNM staging and tumor volume for lung cancer prognosis. Key features like convexity (HR = 0.31), entropy ratio (HR = 2.36), and radial gradient maps (HR = 0.40) predict overall survival. Metastasis prediction models reach CI of 0.61 to 0.71, and adenocarcinoma aggressiveness classifiers achieve AUROCs up to 0.98.
Pages 11-16
Predicting Immunotherapy, TKI, and Chemoradiation Response with Radiomics

Immunotherapy response: Checkpoint blockade immunotherapy yields durable responses in some patients, but a substantial subset does not respond, and some experience lethal hyperprogressive disease (HPD). Trebeschi et al. (2019) developed models predicting NSCLC immunotherapy outcomes at the lesion level (AUROC = 0.83) and patient level (AUROC = 0.76). Sun et al. (2018) created a radiomic signature of 8 features to assess CD8 T-cell tumor infiltration, discriminating inflamed from immune-desert tumors with an AUC of 0.76. Tunali et al. (2019a) predicted rapid disease progression and HPD among immunotherapy-treated NSCLC patients with AUCs of 0.81 to 0.85 using a combined radiomics-clinical model. Tunali et al. (2019c) followed up with a validated clinical-radiomic risk model that identified a very high-risk group with dramatically poor survival (HR for OS = 5.35 compared to the low-risk group). Mu et al. (2019a) developed a PET/CT radiomic signature for predicting durable clinical benefit from immunotherapy with a validation AUC of 0.81.

Tyrosine kinase inhibitor (TKI) response: Only a subset of patients benefit from EGFR TKIs like Erlotinib and Gefitinib. Cook et al. (2015) used FDG-PET texture features to stratify Erlotinib-treated patients into high versus low OS groups (26.6 months vs. 13.1 months, P = 0.006). Ravanelli et al. (2018) identified CT texture features predicting 6-month progression (AUROC = 0.80) and 1-year progression (AUROC = 0.76). Park et al. (2018) found that high GLCM entropy on pretreatment FDG-PET/CT was associated with worse survival among EGFR TKI-treated patients (HR = 4.86 after adjusting for clinical parameters).

Chemotherapy and radiation therapy: Coroller et al. (2017) used CT radiomics to predict pathological complete response (pCR) after neoadjuvant chemoradiation (AUC = 0.75), significantly outperforming total tumor and lymph node volume alone (AUC = 0.58). Yu et al. (2018) validated a CT radiomic model predicting metastasis among NSCLC patients treated with surgery or SABR (HR = 1.27). Mattonen et al. (2016) developed an ML-based model to detect local recurrence after SABR with an AUC of 0.85, and a false negative rate of just 23% compared to 99% for expert physicians. Khorrami et al. (2019) showed that peritumoral CT radiomic features predicted pemetrexed-based chemotherapy response with an AUC of 0.77. Fave et al. (2017) used delta radiomics (changes in radiomic features over time) and found feature alterations after radiation therapy were associated with tumor response (C-index = 0.558).

TL;DR: Radiomics predicts immunotherapy response (AUC = 0.76-0.85), identifies HPD risk (HR = 5.35 for high-risk group), and stratifies TKI-treated patients by survival (26.6 vs. 13.1 months). For chemoradiation, radiomics predicts pCR (AUC = 0.75 vs. 0.58 for volume alone) and detects post-SABR recurrence (AUC = 0.85) far better than expert physicians (23% vs. 99% false negative rate).
Pages 14-18
Linking Imaging Features to Driver Mutations: Radiogenomics for Noninvasive Molecular Profiling

Radiogenomics studies the relationship between imaging features and genomic phenotypes such as gene expression and driver mutations. This approach is particularly valuable for patients with unresectable lung cancer, those unable to undergo biopsy, or cases where molecular testing is indeterminate. Because radiomic features can be extracted in real time and longitudinally, they could detect phenotypic transformations and acquired resistance to therapies earlier than tissue-based approaches. Importantly, a single needle biopsy only samples a small subregion of a heterogeneous tumor and may produce misleading results about the overall mutational landscape.

EGFR and KRAS mutation prediction: Rios Velazquez et al. (2017) developed a clinical-radiomics signature to differentiate EGFR from KRAS mutations, the most common somatic mutations in lung adenocarcinoma, with an AUC of 0.70 using 21 features in a cohort of 353 training and 352 validation patients. Gevaert et al. (2017) used 5 semantic features and achieved higher accuracy for EGFR mutation prediction specifically (AUC = 0.87), though their model could not accurately predict KRAS status. Liu et al. (2016) predicted EGFR mutation status (AUC = 0.709) in a surgically resected Asian cohort of 186 patients.

Rarer mutations and fusions: Weiss et al. (2014) identified CT texture features discriminating KRAS-mutant tumors from pan-wild-type tumors with 89.6% accuracy. Yamamoto et al. (2014) combined clinical covariates and CT features to characterize ALK-rearranged NSCLC. Yoon et al. (2015) used clinical, CT, and PET radiomics to predict ALK/ROS1/RET fusion-positive lung adenocarcinoma with a sensitivity of 0.73 and specificity of 0.70 in a cohort of 539 patients. Zhou et al. (2018) combined semantic CT features with next-generation RNA sequencing data, validating 10 metagenes that were significantly associated with semantic imaging features through functional gene-enrichment analysis.

Tumor histology: Wu et al. (2016b) identified CT texture features associated with NSCLC tumor histologic subtypes (AUC = 0.72), demonstrating that radiomics can characterize not only molecular mutations but also fundamental tissue-level differences between tumor subtypes. This supports radiomics as a complement to traditional pathology when tissue is unavailable or limited.

TL;DR: Radiogenomics predicts EGFR mutations with AUCs of 0.70 to 0.87, KRAS status with 89.6% accuracy, and ALK/ROS1/RET fusions with 73% sensitivity/70% specificity. These noninvasive approaches are especially valuable when biopsy is infeasible, molecular tests are indeterminate, or longitudinal monitoring of acquired resistance is needed.
Pages 17-19
Reproducibility Barriers, Study Design Flaws, and the Path to Clinical Translation

Image acquisition heterogeneity: Standard-of-care acquisition parameters vary widely, including pixel spacing, slice thickness, reconstruction kernel, kVp, PET washout periods, contrast agent administration, MRI echo time, and repetition time. Intra- and interscanner variabilities cause radiomic feature distributions to shift. The authors recommend either standardizing acquisition protocols or using features that are less sensitive to these parameters. The IBSI provides standardized algorithms, consensus definitions, and benchmarks for radiomic feature calculation to address part of this interoperability problem.

Segmentation reproducibility: Manual segmentations are time-consuming and lead to intraobserver variation. Even with semiautomated algorithms, some features are not reproducible when acquired within minutes using the same parameters (Balagurunathan et al. 2014b) or the same segmentation algorithms (Kalpathy-Cramer et al. 2016a). The authors recommend using test-retest datasets such as RIDER and stability testing with multiple segmentation runs (e.g., Moist run) to identify and select only reproducible features.

Statistical pitfalls: With potentially unlimited numbers of radiomic features (due to tunable hyperparameters, filter types, and feature categories), many studies analyze large feature sets without accounting for multiple testing errors, leading to selection bias and false positives. Chalkidou et al. (2015) recommend a minimum of 10 to 15 observations per predictor variable to reduce false discovery rates. The authors advocate for Bonferroni-Holm or Benjamini-Hochberg corrections for multiple comparisons. Single-institution studies should use repeated cross-validation, and the optimal approach is external validation on independent cohorts. A key open question is whether validated models represent "pan-signatures" applicable across patient populations or are specific to particular treatment contexts.
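
The two corrections and the observations-per-predictor rule can be sketched as follows. This is a minimal stdlib-only illustration: the function names, the significance level of 0.05, and the five toy p-values are assumptions, and a real analysis would more likely rely on an established statistics package.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down procedure (controls the family-wise error rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return rejected


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def max_predictors(n_observations, per_predictor=10):
    """Largest number of predictors allowed under an observations-per-predictor rule."""
    return n_observations // per_predictor


# Toy p-values for five candidate features (illustrative only).
p_values = [0.001, 0.012, 0.025, 0.035, 0.20]

print(holm_bonferroni(p_values))     # [True, True, False, False, False]
print(benjamini_hochberg(p_values))  # [True, True, True, True, False]
print(max_predictors(150, 15))       # 10
```

Holm is the stricter of the two because it limits the chance of any false positive, whereas Benjamini-Hochberg tolerates a controlled fraction of false discoveries. In exploratory radiomics with hundreds of features, that difference can change which features survive screening.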

Biology and clinical translation: Linking radiomics to biology is critical for moving beyond statistical associations. Although several studies have connected radiomic features to gene-expression patterns (Aerts et al. 2014; Grossmann et al. 2017; Morales et al. 2019), the biological underpinnings of most radiomic signatures remain unknown due to lack of datasets containing both imaging and genomic information. The Radiomics Quality Score (RQS), developed by Lambin et al. (2017), evaluates studies on internal consistency, reproducibility, and clinical applicability. Despite the promise demonstrated across screening, prognosis, and treatment response, no radiomic model had yet impacted clinical practice as of 2021. The authors conclude that standardization of imaging and features, combined with emerging deep learning technologies and prospective clinical trials, will be needed to bridge this translational gap.

TL;DR: Key barriers include acquisition heterogeneity across scanners, poor segmentation reproducibility (Dice often below 80%), and statistical overfitting from testing hundreds of features without multiple-comparison corrections. The Radiomics Quality Score (RQS) assesses study rigor. As of 2021, no radiomic model had entered clinical practice; prospective trials, IBSI standardization, and biological validation of signatures are needed to bridge this gap.
Citation: Tunali I, Gillies RJ, Schabath MB. Application of Radiomics and Artificial Intelligence for Lung Cancer Precision Medicine. 2021. Available at: PMC8288444. DOI: 10.1101/cshperspect.a039537. License: Open Access.