Lung cancer accounts for 18% of all cancer-related deaths globally, making it the single deadliest cancer worldwide. Individuals diagnosed with lung cancer have only a 10-20% probability of surviving five years after diagnosis, largely because the disease is often caught at an advanced stage when symptoms finally become apparent. Computed tomography (CT) imaging has become the preferred screening and diagnostic modality because it provides highly detailed cross-sectional images of the lungs capable of revealing small nodules that other modalities miss. However, the sheer volume of CT data, combined with the visual complexity of distinguishing benign from malignant nodules, places an enormous burden on radiologists.
The role of deep learning: Traditional machine learning methods have struggled with the variability across medical images from different patients and scanners. Deep learning (DL) techniques, which use neural networks with multiple hidden layers, can automatically learn complex hierarchical features from raw CT data. The authors categorize DL models into four groups: supervised models (CNN, LSTM, RNN, GRU), unsupervised models (Auto-Encoders, Restricted Boltzmann Machines), semi-supervised models (GANs, RNN variants), and reinforcement learning models. Among these, convolutional neural networks (CNNs) have become the dominant architecture for medical image analysis due to their translation invariance and hierarchical feature extraction capabilities.
Scope of this review: This paper provides a comprehensive review of deep learning methods applied to lung cancer screening and diagnosis using CT images. It covers both classification (determining whether a nodule is benign or malignant) and segmentation (delineating the precise boundaries of nodules and lung regions). The review spans work from 2018 to 2023, summarizing over 40 referenced models, comparing their performance on benchmark datasets like LIDC-IDRI, LUNA16, and NLST, and identifying key research gaps. The authors are affiliated with Universiti Kebangsaan Malaysia and Universiti Teknologi MARA, and the research was funded under grant DPK-2023-005.
The review describes the complete pipeline from raw CT acquisition to cancer diagnosis. CT imaging works by measuring X-ray attenuation as beams pass through body tissues, producing cross-sectional images that can be reconstructed into 3D volumes. The advantages of CT include its non-invasive nature, rapid processing time, and ability to reveal internal abnormalities invisible to conventional imaging. However, CT has significant drawbacks: ionizing radiation exposure that can itself increase cancer risk, and a tendency to produce false-positive results that lead to unnecessary follow-up procedures.
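The attenuation measurements described above are conventionally expressed in Hounsfield units (HU) before any analysis. As a minimal sketch (the rescale slope/intercept and window values below are typical illustrative choices, not values from the review), converting raw scanner values to HU and applying a lung window might look like:

```python
import numpy as np

# Illustrative conversion of raw CT values to Hounsfield units (HU), then a
# lung display window. Slope/intercept are hypothetical examples of what a
# DICOM header typically supplies; they are not from the surveyed studies.
def to_hounsfield(raw, slope=1.0, intercept=-1024.0):
    return raw * slope + intercept

def window(hu, center=-600.0, width=1500.0):
    """Clip HU values to a lung window and scale to [0, 1] for display."""
    lo, hi = center - width / 2, center + width / 2
    return np.clip((hu - lo) / (hi - lo), 0.0, 1.0)

raw = np.array([[0, 1024, 2048]])   # synthetic raw values
hu = to_hounsfield(raw)             # air ~ -1024 HU, water ~ 0 HU
img = window(hu)
```

This kind of normalization is one of the pre-processing steps that must be handled consistently across scanners for the models discussed later.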
2D vs. 3D approaches: Deep learning algorithms for lung CT analysis fall into two main categories. 2D approaches (using standard CNNs, RNNs, and hybrid models) analyze individual CT slices to detect nodules and anomalies. 3D approaches use volumetric CNNs with sliding cubes instead of sliding windows to process entire CT volumes, providing more comprehensive analysis of the lung. Many researchers have adopted hybrid approaches that combine both 2D and 3D CNN architectures to analyze individual slices alongside full volumes, aiming to capture both fine-grained slice-level detail and volumetric spatial relationships.
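The "sliding cube" idea behind the 3D approaches can be sketched in a few lines. This is an illustrative example only: the cube size and stride below are arbitrary choices, not parameters from any surveyed model, and a real pipeline would pass each sub-volume to a volumetric CNN.

```python
import numpy as np

# Minimal sketch of sliding-cube extraction: instead of moving a 2D window
# over one slice, a cube is moved through the whole CT volume and each
# sub-volume would be scored by a 3D classifier.
def sliding_cubes(volume, cube=32, stride=16):
    d, h, w = volume.shape
    for z in range(0, d - cube + 1, stride):
        for y in range(0, h - cube + 1, stride):
            for x in range(0, w - cube + 1, stride):
                yield (z, y, x), volume[z:z + cube, y:y + cube, x:x + cube]

vol = np.zeros((64, 64, 64), dtype=np.float32)  # synthetic CT volume
patches = list(sliding_cubes(vol))
# 3 start positions per axis (0, 16, 32) -> 27 cubes in total
```

The 2D analogue simply iterates the same way over one slice with square windows; hybrid systems run both and fuse the predictions.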
Transfer learning acceleration: To address the challenge of limited labeled medical data, many algorithms employ transfer learning, pretraining models on massive general image datasets (such as ImageNet) before fine-tuning them for lung cancer-specific tasks. Pre-processing steps are also critical, with Gaussian and Median filtering commonly used to reduce noise, followed by candidate detection methods such as region proposal networks and sliding window approaches to identify areas of concern within each image.
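The median filtering mentioned above can be sketched directly: it replaces each pixel with the median of its neighborhood, which removes isolated noise spikes while preserving edges. A minimal plain-NumPy version (a library routine such as `scipy.ndimage.median_filter` would normally be used instead):

```python
import numpy as np

# A minimal 3x3 median filter, sketching the noise-reduction pre-processing
# step. Edge padding replicates border pixels so output size matches input.
def median_filter_3x3(img):
    padded = np.pad(img, 1, mode="edge")
    # Stack the nine shifted views of the image, then take per-pixel medians.
    windows = np.stack([padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0)

noisy = np.array([[0., 0., 0.],
                  [0., 100., 0.],   # isolated spike ("salt" noise)
                  [0., 0., 0.]])
clean = median_filter_3x3(noisy)    # the spike is suppressed
```

Gaussian filtering works analogously but averages with distance-dependent weights, trading some edge sharpness for smoother output.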
The annotation bottleneck: Training deep learning models requires large, high-quality, accurately labeled datasets, but obtaining expert annotations for CT scans is difficult and time-consuming, especially for complex segmentation tasks. The scarcity of labeled data constrains model training and evaluation, while inconsistent annotation criteria across datasets introduce biases. Despite these hurdles, deep learning has demonstrated the ability to perform real-time image analysis and improve the consistency of CT interpretation over manual assessment.
The review catalogs a wide range of CNN-based detection models with their reported performance metrics. A 3D CNN with three modules achieved 84.4% sensitivity for nodule recognition and classification, outperforming manual assessment. Nasser and Naser (2019) reached 96.67% accuracy using an artificial neural network (ANN). Cifci's SegChaNet model, combining a deep learning instantaneously trained neural network (DITNN) with an improved profuse clustering technique (IPCT), achieved 98.42% accuracy for lung cancer diagnosis. A double convolutional deep neural network (CDNN) achieved accuracies of 90.9% and 87.2% for nodule recognition and classification, respectively.
Transfer learning and multi-group approaches: An Inception-v3 transfer learning model attained a sensitivity rate of 95.41% for lung image classification, demonstrating the value of pretraining on large general datasets. Jiang et al. (2018) proposed a multi-group patch-based learning system that detected lung cancer with 80.06% sensitivity at 4.7 false positives per scan, rising to 94% sensitivity at 15.1 false positives per scan. The DL-CAD system by Li et al. (2018) achieved 86.2% accuracy for detecting and characterizing lung nodules smaller than 3 mm using the LIDC-IDRI and NLST datasets.
False-positive reduction: A deep 3D residual CNN with a spatial pooling and cropping (SPC) layer, using a 27-layer network on the LUNA16 dataset, achieved 98.3% sensitivity specifically for reducing false positives in nodule detection. In contrast, Teramoto et al. (2017) trained a DCNN on only 76 cancer cases and achieved just 71% classification accuracy, illustrating how small datasets severely limit model performance. A 3D CNN tested on LUNA16 achieved 94.4% sensitivity for nodule detection using multiple convolutional, max-pooling, fully connected, and softmax layers.
State-of-the-art results (2022-2023): Recent models have pushed accuracy even higher. Vani et al. (2023) compared six deep learning models (CNN, CNN GD, Inception V3, ResNet-50, VGG-16, VGG-19) and found CNN with Gradient Descent outperformed all others at 97.86% accuracy, 96.79% sensitivity, and 97.40% specificity. Shalini et al. (2023) combined 3D CNN with RNN for 95% accuracy in classifying cancerous nodules. Notably, Abunajm et al. (2023) introduced a CNN-based model that achieved 99.45% accuracy for early lung cancer prediction while successfully reducing false positives, using the IQ-OTH/NCCD lung cancer dataset from Kaggle.
The review identifies 10 distinct datasets used across the surveyed studies, each with different characteristics that affect model training and evaluation. LIDC-IDRI (Lung Image Database Consortium and Image Database Resource Initiative) is the most widely used, providing annotated CT images for both nodule detection and classification tasks. LUNA16 is derived from LIDC-IDRI and serves as a standardized benchmark challenge. NLST (National Lung Screening Trial) provides low-dose CT data and is commonly used for evaluating early detection models. ImageNet serves as the pretraining backbone for transfer learning approaches.
Additional datasets: The Tianchi AI dataset has been used in several Chinese studies for detection system development. The Cancer Imaging Archive (TCIA) provides publicly accessible cancer imaging data. Other datasets include immunotherapy response datasets, PD-L1 expression datasets, and various private institutional collections. The authors note that the diversity in dataset size, imaging modalities, annotation quality, and lung cancer case representation creates both opportunities for model development and significant challenges for generalizability.
Imaging modality comparison: The review provides a structured comparison of five imaging methods. CT offers high resolution and sensitivity for early-stage malignancies but carries high radiation dose and cost. X-ray is quick and affordable but has limited sensitivity and specificity, often missing early-stage cancers. Ultrasound is non-invasive and radiation-free but is operator-dependent and limited in scanning lung parenchyma. MRI provides excellent soft tissue contrast with low radiation but suffers from long scan times and high cost. PET-CT combines anatomical and functional information with high sensitivity for cancer detection but produces false positives from inflammation and requires fasting before scans.
Dataset heterogeneity problem: A key challenge highlighted by the review is that studies use different datasets, different performance metrics, and different validation approaches, making direct model comparisons difficult. Some studies report accuracy, others sensitivity, others AUC. Some use public benchmarks while others rely on proprietary institutional data. This inconsistency prevents meta-analytical pooling of results and complicates the identification of truly superior approaches for clinical deployment.
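The comparability problem is easy to make concrete: the three most commonly reported metrics answer different questions, and under the class imbalance typical of screening they can diverge sharply. The counts below are invented purely for illustration.

```python
# Illustrative metric definitions. With screening-style class imbalance
# (few cancers, many negatives), the same model can look strong or weak
# depending on which single figure a paper chooses to report.
def metrics(tp, fp, tn, fn):
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # a.k.a. recall
        "specificity": tn / (tn + fp),
    }

# Hypothetical confusion counts: 30 cancers among 1,000 scans.
m = metrics(tp=18, fp=50, tn=920, fn=12)
# accuracy ~0.94 looks impressive, yet sensitivity is only 0.60:
# 4 in 10 cancers are missed.
```

This is why a study reporting only accuracy cannot be meaningfully ranked against one reporting sensitivity or AUC.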
The review traces the evolution of lung cancer detection techniques, highlighting how deep learning is transforming a landscape that previously relied on imprecise methods. Sputum cytology, which examines coughed-up samples for malignant cells, has a sensitivity of only 20-30% for early lung cancer and is insufficient for detecting small adenocarcinomas under 2 cm in diameter. White light bronchoscopy (WLB) is the standard endoscopic method for obtaining histological diagnoses but has limited ability to identify pre-malignant lesions. Tissue biopsy remains the gold standard for confirming malignancy but is invasive, costly, and prone to errors requiring repeated procedures.
The LDCT screening revolution and its problems: Low-dose CT (LDCT) screening has been a major advance, with the NELSON trial demonstrating 85% sensitivity and 99% specificity. However, in the National Lung Screening Trial (NLST), roughly 96% of positive screens proved to be false positives, and over 40% of participants had at least one positive screen. This staggering false-positive rate leads to unnecessary invasive procedures and significant patient anxiety, creating a clear need for AI-assisted tools that can reduce false positives while maintaining high sensitivity.
DL-based chest radiograph analysis: CNN-based models applied to chest radiographs have reported sensitivities in the range of 0.51-0.84 with mean false-positive indications per image (mFPI) of 0.02-0.34. These CAD models have improved radiologists' ability to detect nodules compared to screening without AI assistance. However, distinguishing between benign and malignant nodules remains challenging because normal anatomical structures can mimic nodules, and even experienced radiologists make errors in these diagnostically ambiguous cases.
Detection vs. segmentation paradigms: The two primary deep learning approaches for lesion identification are detection (classifying a region as a single label) and segmentation (classifying individual pixels). Segmentation provides more precise information, down to each pixel's label, and can improve the likelihood of successful diagnosis. Pixel-level classification enables monitoring of changes in lesion size and shape over time, provides information on both the area and the long/short diameters of lesions, and supports more effective treatment response evaluation.
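The extra information that pixel-level labels provide can be sketched numerically: from a binary mask one can derive lesion area and a long diameter, which a single region label cannot supply. The mask below is a tiny synthetic example, not data from any surveyed study.

```python
import numpy as np

# From a segmentation mask, derive measurements that a detection label
# (one class per region) does not provide: pixel area and long diameter.
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 2:6] = True            # a synthetic 3x4-pixel "lesion"

area = int(mask.sum())           # pixel count -> lesion area
ys, xs = np.nonzero(mask)
pts = np.stack([ys, xs], axis=1).astype(float)
# Long diameter: maximum pairwise distance between lesion pixels.
diffs = pts[:, None, :] - pts[None, :, :]
long_diam = float(np.sqrt((diffs ** 2).sum(-1)).max())
```

Tracking these quantities across serial scans is what enables the treatment-response monitoring the review describes.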
The review covers the progression of segmentation techniques from simple threshold-based methods to sophisticated deep learning architectures. Early lung segmentation used straightforward numerical approaches, gray-level thresholding, and shape-based methods to separate lung tissue from surrounding structures. Brown et al. developed an automated knowledge-based segmentation system using anatomical knowledge (estimated volume, shape, relative position, X-ray attenuation) to extract useful data from CT images. Hu et al. created a fully automatic 3D lung segmentation method with only 0.8-pixel root mean square difference between computer and human analysis.
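The gray-level thresholding that these early methods relied on exploits the fact that lung parenchyma is mostly air and therefore far lower in HU than surrounding chest tissue. A minimal sketch (the cutoff value is a common rule of thumb, not taken from a specific surveyed paper):

```python
import numpy as np

# Gray-level thresholding, the simplest of the early segmentation methods:
# voxels well below the cutoff are treated as lung/air candidates. Real
# systems then refine this mask with shape and position criteria.
def threshold_lungs(hu_slice, cutoff=-320.0):
    return hu_slice < cutoff

slice_hu = np.array([[-1000., -800.,  40.],    # air, lung, soft tissue
                     [ -900.,  -50., 300.]])   # lung, fluid-like, bone-ish
lung_mask = threshold_lungs(slice_hu)
```

The knowledge-based and shape-based systems described above essentially layer anatomical constraints on top of an initial mask like this one.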
Classical segmentation results: A fully automated approach using slice-based pixel-value thresholds with size, circularity, and location criteria achieved 94.0% segmentation accuracy on 2,969 thick-slice images and 97.6% on 1,161 thin-slice images across 101 CT cases. Level-set approaches have been applied to both joint segmentation-registration tasks and lung nodule delineation. A parameter-free segmentation technique targeting juxtapleural nodules achieved a 92.6% re-inclusion rate on 403 juxtapleural nodules from the LIDC dataset. A global optimal hybrid geometric active contour method reached an average F-measure of 99.22% on 40 CT scans.
Deep learning segmentation models: U-Net architectures have become the backbone of modern lung segmentation. A U-Net with contracting path for low-level features and expanding path for high-level information achieved a dice coefficient of 0.9502. A Residual U-Net with false-positive reduction approach improved segmentation by incorporating residual units for better feature extraction. Mask R-CNN combined with supervised and unsupervised techniques achieved 97.68% segmentation accuracy with an average runtime of only 11.2 seconds. Setio et al. developed a multi-view convolutional network achieving 85.4% detection sensitivity at 4 false positives per scan on LIDC-IDRI.
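The dice coefficient cited for the U-Net above is a standard overlap score: twice the intersection of predicted and reference masks, divided by their combined size. A minimal sketch on tiny synthetic masks:

```python
import numpy as np

# Dice coefficient for binary segmentation masks: 1.0 means perfect overlap,
# 0.0 means none. The epsilon guards against division by zero on empty masks.
def dice(pred, target, eps=1e-8):
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0], [0, 1, 0]])   # predicted mask (synthetic)
b = np.array([[1, 1, 0], [0, 0, 0]])   # reference mask (synthetic)
score = dice(a, b)                     # 2*2 / (3 + 2) = 0.8
```

Reported dice values such as 0.9502 therefore mean near-complete agreement with the expert annotation at the pixel level.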
Advanced segmentation strategies: Roy et al. proposed a synergistic combination of deep learning and shape-driven level sets, using coarse segmentation maps from deep fully convolutional networks followed by fine segmentation through level-set shape-driven evolution. For 3D approaches, a fully convolutional network (FCN) built from a 3D CNN enabled rapid score map generation for entire volumes in a single forward pass, proving effective for quickly producing candidate regions of interest.
The classification section of the review examines how deep learning models determine whether detected nodules are benign or malignant, and in some cases, classify cancer subtypes or staging. The majority of previous research focused on binary classification (benign vs. malignant), but more sophisticated multi-class approaches have emerged. Liu et al. proposed a Multi-view CNN (MV-CNN) for both binary and ternary classifications of lung cancer, demonstrating that multi-view strategies consistently outperformed single-view approaches across both classification types.
Genetic algorithm-enhanced classification: Da Silva and da Silva combined deep learning with genetic algorithms for nodule classification, achieving 94.66% sensitivity, 95.14% specificity, 94.78% accuracy, and an AUC of 0.949 on the LIDC-IDRI database. Dey et al. developed a binary classifier using a four-pathway CNN architecture incorporating a basic 3D CNN, a novel multi-output network, a 3D DenseNet, and an upgraded 3D DenseNet with multi-outputs. Validated on LIDC-IDRI, this approach outperformed the majority of existing methods.
FDG-PET and CT fusion: A CNN-based classifier using two neural networks for extraction and classification of cancer features from FDG-PET and CT images aimed to differentiate between T1-T2 and T3-T4 staging classes. This approach yielded accuracy of 90%, recall of 47%, specificity of 67%, and AUC of 0.68. The relatively low recall highlights the difficulty of staging classification compared to simple malignancy detection. The performance comparison table in the review shows accuracy ranging from 95% (deep neural network lung segmentation) to 99.52% (deep learning prediction framework), with most models clustering between 94% and 98%.
Identified classification challenges: The review identifies three key issues with current classification algorithms: (a) binary lung classification algorithms typically do not meet the diagnostic standards required by radiologists and oncologists; (b) ROI patch-based methods are time-consuming and require manual expert involvement; and (c) standard image processing algorithms frequently fail to segment lung nodules accurately, which directly undermines downstream classification accuracy. These findings suggest that integrated end-to-end systems, rather than modular pipelines, may be needed for clinical-grade performance.
The review identifies a comprehensive set of limitations affecting the entire deep learning pipeline for CT-based lung cancer diagnosis. Data variability: CT scans vary greatly in resolution, slice thickness, contrast, and noise levels due to differences in imaging equipment and techniques across institutions. Pre-processing procedures must handle these disparities to produce consistent results, but failure to do so leads to degraded model performance and poor generalizability when deployed at new sites with different scanners.
Artifact contamination: CT images frequently contain motion artifacts, beam hardening artifacts, and metal artifacts that compromise image quality. These artifacts generate inconsistencies and distortions that make accurate feature extraction challenging, requiring robust artifact identification and repair algorithms. However, most reviewed models were trained on relatively clean datasets and may not perform well on artifact-heavy clinical images encountered in routine practice.
Label scarcity and annotation inconsistency: Deep learning models require substantial labeled data for training, but obtaining expert annotations for CT scans is difficult, expensive, and time-consuming, particularly for complex segmentation tasks. The review notes that different annotation techniques and criteria across datasets generate differences and biases in the training process. This problem is compounded by the fact that many studies relied on private datasets whose annotations cannot be independently verified or compared.
Computational cost and population bias: CNN-based models are computationally expensive, requiring significant computing power for both training and inference. Pre-processing steps like resizing, normalization, and augmentation must balance preserving useful information against computational demands. Additionally, most studies focused on specific populations or datasets that may not represent the diversity of patients and imaging practices worldwide. Models trained predominantly on data from one demographic or imaging protocol may fail when applied to different populations, raising concerns about equitable AI deployment in lung cancer screening.
The review concludes with a detailed roadmap for future research spanning seven specific directions. Multi-modal data integration: The authors stress the need for lung image data from additional imaging modalities beyond CT, including MRI and ultrasound, as well as the disclosure of private datasets to enable comparison and research collaboration. Combining deep features from lung scan images with patient medical history and genetic reports could yield more precise diagnoses through an all-encompassing approach rather than relying on imaging alone.
Segmentation of large solid nodules: The authors specifically identify the segmentation of large solid nodules as a difficult, under-researched task requiring more investigation. They also recommend developing a lung cancer detection model capable of distinguishing between early benign nodules and small malignant lesions, which would dramatically improve early identification and treatment. This fine-grained differentiation at early stages remains one of the hardest unsolved problems in the field.
Image quality enhancement: The review suggests using various pre-processing techniques and filters to improve image quality, including edge-preserving methods and harmony search algorithms to enhance grayscale image quality. Better pre-processing could improve downstream classification and segmentation accuracy without requiring changes to the core deep learning architecture. Standardized pre-processing pipelines that can handle CT image variability across scanners and institutions are identified as an urgent need.
Cloud computing and novel architectures: Inspired by successful proposals for remote lung cancer detection, the authors suggest investigating cloud computing technology for machine learning-based remote diagnosis, which would enable processing and analysis of large volumes of medical data. They also recommend exploring cat swarm-optimized deep belief networks for feature extraction from lung medical images, as this approach may offer improved performance in feature extraction and classification tasks compared to standard CNN pipelines. The overarching message is that the field needs to move from isolated model development toward integrated, validated, multi-modal systems that can function reliably across diverse clinical settings.