Establishment and application of an artificial intelligence diagnosis system for pancreatic cancer


Plain-English Explanations
Page 1
Why Pancreatic Cancer Needs Faster, More Objective Diagnosis

Pancreatic cancer is one of the most lethal malignancies of the digestive system. It progresses rapidly, metastasizes early, and carries an extremely poor prognosis. Surgery remains the primary curative strategy, but many patients are already at an advanced stage by the time they receive a diagnosis because the disease lacks specific clinical symptoms and reliable serological markers. Early detection and accurate pre-operative staging are therefore critical to improving cure rates.

The role of CT imaging: Contrast-enhanced helical CT has become the standard imaging modality for pancreatic cancer because it avoids the overlapping structures seen in other imaging techniques. Dual-phase or triple-phase scanning with intravenous contrast allows specialists to evaluate tumor location, lymph node involvement, vascular invasion, and distant metastasis. However, conventional CT interpretation requires specialists to manually compare sequential image slices, a process that is both tedious and heavily dependent on the individual physician's experience and skill level.

The case for AI automation: The authors argue that an automated imaging processing system could reduce subjective variability, speed up diagnosis, and produce consistent results across different clinical settings. Deep learning has already demonstrated accuracy exceeding that of experienced physicians for lung, skin, prostate, breast, and esophageal cancer image recognition. This study extends those capabilities to pancreatic cancer using a Faster Region-based Convolutional Neural Network (Faster R-CNN).

The study was conducted at the Affiliated Hospital of Qingdao University in collaboration with Beihang University's State Key Laboratory of Virtual Reality Technology and Systems. It enrolled 338 patients with pathologically confirmed pancreatic ductal adenocarcinoma between January 2010 and January 2017, generating a database of 6,084 contrast-enhanced CT images.

TL;DR: Pancreatic cancer is often diagnosed too late for surgery. This study built a Faster R-CNN AI system trained on 6,084 CT images from 338 patients at Qingdao University Hospital to automate and accelerate pancreatic cancer detection from contrast-enhanced CT scans.
Pages 2-3
Patient Selection, CT Protocol, and Database Construction

All enrolled patients had pathologically confirmed pancreatic ductal adenocarcinoma and underwent 64-slice contrast-enhanced helical CT before surgery. The inclusion criteria required signed informed consent, a pre-operative CT examination, post-operative pathological confirmation with TNM staging, and detailed clinical and pathological records. Patients were excluded if they had malignancies elsewhere in the body, lacked pathological confirmation of pancreatic cancer, were allergic to contrast agents, or had cardiac, hepatic, or renal insufficiency.

CT acquisition parameters: Scans were performed on a 64-slice Toshiba CT scanner with 200-mA tube current, 120-kV tube voltage, and 0.625-mm slice thickness. Three-phase dynamic contrast-enhanced scanning was performed from the diaphragm to the duodenum. Eighty milliliters of non-ionic ioversol was injected as contrast agent, with arterial-phase images acquired at 35 seconds and venous-phase images at 65 seconds post-injection. Images were reconstructed at 2-mm spacing in transverse, coronal, and sagittal planes using Vitrea software.

Database composition: From 338 patients, a total of 6,084 images were collected (averaging 15 to 20 images per patient). These were split into a training set of 4,385 images from 238 patients and a verification set of 1,699 images from 100 patients. Senior radiologists with more than 5 years of experience labeled tumor locations in each training image. The AI system learned to distinguish cancer images from normal pancreatic tissue, chronic pancreatitis, and benign tumors by comparing labeled and unlabeled regions.

Patient demographics: Among the 338 patients, 213 were male and 125 were female (ratio 1.7:1). Tumors were located in the pancreatic head in 222 cases and the body or tail in 116 cases. Low, moderate, and high differentiation grades were found in 175, 105, and 58 cases respectively. TNM staging showed 104 cases at stage I-II, 186 at stage III, and 48 at stage IV. There were no statistically significant differences between the training and verification groups for sex, age, tumor location, differentiation grade, or TNM stage (all P > 0.05).

TL;DR: 338 patients, 6,084 CT images split into 4,385 training and 1,699 verification images. 64-slice CT with three-phase contrast enhancement. No significant demographic differences between training and verification groups. Senior radiologists labeled all training data.
Pages 3-4
Faster R-CNN: A Two-Stage Object Detection Framework

The authors chose Faster R-CNN, a two-stage object detection method, over one-stage approaches. One-stage methods (such as YOLO or SSD) directly divide images into grids and predict bounding boxes through regression. Two-stage methods first generate candidate regions and then classify and refine them. Although two-stage methods require more training time, they produce more accurate detection and classification results because they balance positive and negative sample proportions during the candidate screening step.

Three core components: The Faster R-CNN architecture used in this study consists of three parts. First, a feature extraction network based on VGG16 (pre-trained on ImageNet, containing 13 convolutional layers and 3 fully connected layers) generates convolutional feature maps from the CT images. Second, a Region Proposal Network (RPN) slides a 3x3 window across the feature map, mapping each position to a 256-dimensional feature vector that feeds into two sibling fully connected layers, one for bounding box regression (coordinates) and one for classification (object vs. not-object probability scores).

Anchor mechanism: At each sliding window position, nine anchors are generated using three scales (128x128, 256x256, 512x512) and three aspect ratios (0.5, 1, 2). Positive labels are assigned to anchors with the highest intersection-over-union (IoU) overlap with a ground truth box, or any anchor with IoU greater than 0.7. Anchors with IoU less than 0.3 for all ground truth boxes receive negative labels. Non-maximum suppression then merges neighboring regions to reduce redundant proposals before final classification and regression.
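The anchor scheme above is easy to sketch in a few lines. This is an illustrative reconstruction, not the authors' code; `make_anchors`, `iou`, and `label_anchor` are hypothetical names, and the additional rule that the single best-overlapping anchor per ground-truth box is also positive is noted but omitted for brevity:

```python
def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate the 9 anchors (3 scales x 3 aspect ratios) centered at one
    sliding-window position (cx, cy), as (x1, y1, x2, y2) boxes.
    Each anchor keeps the area of its scale while height/width varies."""
    anchors = []
    for s in scales:
        area = float(s * s)
        for r in ratios:
            w = (area / r) ** 0.5  # chosen so that w * (r * w) == area
            h = r * w
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def label_anchor(anchor, gt_boxes, pos=0.7, neg=0.3):
    """Label one anchor by the thresholds above: positive if IoU > 0.7 with
    some ground-truth box, negative if IoU < 0.3 for all of them, otherwise
    ignored. (The paper additionally marks the highest-IoU anchor for each
    ground-truth box positive; that rule is omitted here.)"""
    best = max(iou(anchor, gt) for gt in gt_boxes)
    if best > pos:
        return 1
    if best < neg:
        return 0
    return -1  # neither; excluded from the RPN loss
```

Each window position thus contributes nine candidate boxes, which non-maximum suppression later prunes to a manageable proposal set.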

ROI pooling and output: The convolutional feature map shared by the RPN and the feature extraction network passes through an ROI pooling layer, producing fixed-length feature vectors. These vectors feed into two final fully connected layers: one determines whether each region contains a tumor and outputs a probability score, and the other performs fine regression on the bounding box coordinates to improve localization accuracy.
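To make the "fixed-length vector" idea concrete, here is a toy pure-Python version of ROI max pooling; it illustrates the binning concept only and is not the paper's implementation:

```python
def roi_max_pool(region, out_size=2):
    """Max-pool a variable-sized 2-D region (a list of rows) into a fixed
    out_size x out_size grid, returned as a flat fixed-length vector.
    Real ROI pooling does this per channel on the shared feature map
    (typically with a 7 x 7 grid); the binning logic is the same."""
    h, w = len(region), len(region[0])
    vec = []
    for i in range(out_size):
        # Integer bin edges that tile the region's rows and columns.
        y_lo = (i * h) // out_size
        y_hi = max(y_lo + 1, ((i + 1) * h) // out_size)
        for j in range(out_size):
            x_lo = (j * w) // out_size
            x_hi = max(x_lo + 1, ((j + 1) * w) // out_size)
            vec.append(max(region[y][x]
                           for y in range(y_lo, y_hi)
                           for x in range(x_lo, x_hi)))
    return vec


# A 10 x 10 region and a 3 x 2 region both collapse to length-4 vectors.
big = [[r * 10 + c for c in range(10)] for r in range(10)]
small = [[1, 2], [3, 4], [5, 6]]
```

Both `roi_max_pool(big)` and `roi_max_pool(small)` return four values, which is exactly what lets the downstream fully connected layers have a fixed input size regardless of proposal size.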

TL;DR: Faster R-CNN uses VGG16 for feature extraction, a Region Proposal Network with 9 anchors per position (3 scales, 3 aspect ratios), and ROI pooling for final classification. Positive anchors require IoU > 0.7 with ground truth. The two-stage design prioritizes accuracy over speed during training.
Pages 4-5
Four-Step Iterative Training and Hyperparameter Configuration

The Faster R-CNN underwent a four-step alternating training procedure. In step 1, sequential contrast-enhanced CT images (arterial, venous, and delayed phases) with labeled pancreatic cancer regions were fed into the network, generating convolutional feature maps. The RPN parameters were adjusted according to these maps, and lymph node metastasis information was labeled to complete one round of RPN training and create ROI feature vectors.

Steps 2 through 4: In step 2, the proposals generated by the RPN were fed into the classification and regression layers initialized with ImageNet pre-trained weights, with no parameter sharing between the two networks at this point. In step 3, a new RPN was initialized using the classification and regression parameters from step 2, but the shared convolutional layers had their learning rates set to zero so only the RPN-unique layers were updated. In step 4, the shared convolutional layers remained fixed while the proposals from step 3 were used to fine-tune the classification and regression layers.
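The alternation can be summarized by tracking which parameter groups receive gradient updates at each step. The group names below are illustrative labels, not identifiers from the paper's code:

```python
# Trainable parameter groups in each of the four alternating steps.
STEPS = {
    1: {"shared_conv", "rpn"},      # train the RPN; conv layers update too
    2: {"shared_conv", "cls_reg"},  # train detector head on RPN proposals
    3: {"rpn"},                     # re-train RPN; shared conv frozen (lr = 0)
    4: {"cls_reg"},                 # fine-tune detector head; conv still frozen
}


def learning_rate(group, step, base_lr=1e-4):
    """Base learning rate for a parameter group at a given step; frozen
    groups get 0, matching 'learning rates set to zero' above."""
    return base_lr if group in STEPS[step] else 0.0
```

After steps 3 and 4 the RPN and the detector head share one set of convolutional features, which is what allows a single forward pass at inference time.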

Training hyperparameters: The first step comprised 80,000 RPN training iterations, with a learning rate of 0.0001 for the first 60,000 iterations and 0.00001 for the remaining 20,000. The second step comprised 40,000 classification/regression training iterations, with a learning rate of 0.0001 for the first 30,000 and 0.00001 for the remaining 10,000. This entire two-step process was then repeated. A momentum of 0.9 and a weight decay of 0.0005 were used throughout. The ROI pooling layer weights and the classification/regression layer weights were initialized from zero-mean Gaussian distributions (standard deviation of 100).
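The learning-rate schedule is piecewise constant within each step. A small sketch of one full round, using hypothetical helper names:

```python
def step_lr(iteration, drop_at, high=1e-4, low=1e-5):
    """Learning rate within one training step: `high` until `drop_at`
    iterations, then `low` (the 0.0001 -> 0.00001 drop described above)."""
    return high if iteration < drop_at else low


def round_schedule():
    """Per-iteration learning rates for one round: 80,000 RPN iterations
    (dropping at 60,000) followed by 40,000 classification/regression
    iterations (dropping at 30,000)."""
    rpn = [step_lr(i, 60_000) for i in range(80_000)]
    cls_reg = [step_lr(i, 30_000) for i in range(40_000)]
    return rpn + cls_reg
```

Two such rounds give 240,000 iterations in total, consistent with where the paper's loss curve converges.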

Training was optimized via stochastic gradient descent (SGD) with end-to-end backpropagation. The loss function converged after approximately 240,000 training iterations, as shown by the loss curve in the paper. Transfer learning from VGG16 pre-trained on ImageNet allowed effective feature extraction despite the relatively small dataset of pancreatic CT images.

TL;DR: Four-step alternating training with 80,000 RPN iterations and 40,000 classification/regression iterations per round, repeated twice. Learning rates dropped from 0.0001 to 0.00001 within each step. SGD with momentum 0.9 and weight decay 0.0005. The network converged after about 240,000 iterations. Transfer learning from ImageNet-pretrained VGG16 enabled training on a modest dataset.
Pages 5-6
Precision-Recall Performance and Internal Testing

To evaluate the training effect, the authors input 1,699 sequential CT images randomly selected from the training database into the trained detection model. They recorded precision and recall rates for the nodule class during training and plotted a precision-recall (PR) curve. The area under the PR curve, which equals the average precision (AP), was 0.7664. Because this experiment involved only a single class (pancreatic nodule), the mean average precision (mAP) was also 0.7664, indicating a good training effect.

Detection results on training data: When the Faster R-CNN's detections were compared against the expert-labeled ground truth in the training database, 230 of the tested images contained detected nodule regions with probability scores above 0.7. Of these, 210 (91.3%) showed a bounding-box overlap greater than 0.7 with the ground truth labels created by the imaging specialists, indicating strong spatial agreement between the AI's detected regions and the manually annotated tumor locations.

The mAP metric captures both precision and recall in a single number. A value of 0.7664 means the model correctly identifies and localizes pancreatic tumors with reasonable accuracy across different confidence thresholds. While not perfect, this training-phase result established a solid foundation for clinical verification on unseen patient data.
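Since AP is simply the area under the sampled PR curve, it can be computed with the trapezoidal rule. The PR points below are made-up toy values for illustration, not the paper's data:

```python
def trapezoid_area(xs, ys):
    """Area under a sampled curve by the trapezoidal rule (xs ascending).
    Applied to (recall, precision) points this gives average precision;
    applied to (FPR, TPR) points it gives the ROC AUC."""
    return sum((x1 - x0) * (y1 + y0) / 2.0
               for (x0, y0), (x1, y1)
               in zip(zip(xs, ys), zip(xs[1:], ys[1:])))


# Toy PR curve: precision falls as recall rises.
recall = [0.0, 0.4, 0.7, 0.9]
precision = [1.0, 0.9, 0.75, 0.6]
ap = trapezoid_area(recall, precision)  # 0.7625 for these toy points
```

With a single "pancreatic nodule" class, this per-class AP is also the mAP, which is why the paper reports one number for both.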

TL;DR: Training evaluation yielded a mean average precision (mAP) of 0.7664 on the PR curve. Of images with detection probability above 0.7, 91.3% (210/230) had bounding box overlap greater than 0.7 with expert-labeled ground truth.
Pages 6-7
AUC of 0.9632 on 100 Unseen Pancreatic Cancer Patients

For clinical validation, sequential contrast-enhanced CT images from 100 pancreatic cancer patients (1,699 images total) were fed into the trained Faster R-CNN model. The final ground truth diagnosis for each case was established by three imaging specialists who rigorously analyzed the CT images alongside pathological evidence. In cases of disagreement, consensus was reached through discussion.

ROC analysis: The AI system's detection results were classified into true positive (TP), false positive (FP), true negative (TN), and false negative (FN) categories. At varying probability thresholds, the true positive rate (TPR) and false positive rate (FPR) were calculated to construct a receiver operating characteristic (ROC) curve. The area under the ROC curve (AUC), calculated using the trapezoidal rule, was 0.9632. An AUC above 0.9 is considered high accuracy, confirming the strong diagnostic capability of the trained system.
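The ROC construction described here (sweep the probability threshold, tabulate TPR and FPR, integrate by the trapezoidal rule) can be sketched directly. This is a generic illustration with toy scores, not the study's data:

```python
def roc_auc(scores, labels):
    """AUC via the trapezoidal rule: sweep thresholds over the detection
    probability scores, collect (FPR, TPR) points, and integrate.
    `labels` are 1 for true cancer regions, 0 otherwise."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    points.append((1.0, 1.0))
    return sum((x1 - x0) * (y1 + y0) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Perfectly separated scores give an AUC of 1.0 and fully reversed scores give 0.0; the study's 0.9632 sits near the top of that range.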

Speed advantage: The Faster R-CNN processed each CT image in approximately 0.2 seconds. With an average of 15 images per patient, the total AI-assisted diagnosis time was roughly 3 seconds per patient. By comparison, an imaging specialist required approximately 8 minutes per patient. This represents a roughly 160-fold speed improvement, making the system highly practical for clinical workflows where rapid turnaround is essential.

The clinical verification confirmed that the system maintained its discriminative performance on completely unseen data, with the high AUC demonstrating that the Faster R-CNN can reliably distinguish pancreatic cancer regions from normal tissue across a range of cancer stages (early, middle, and late-stage disease were all represented in the database).

TL;DR: Clinical validation on 100 patients (1,699 images) produced an AUC of 0.9632 on the ROC curve. AI diagnosis took about 3 seconds per patient vs. 8 minutes for a specialist, a 160-fold speed improvement.
Pages 7-8
Single-Center, Retrospective Design and Missing Control Groups

Single-center retrospective design: All data came from one institution (Affiliated Hospital of Qingdao University), and the study was retrospective in nature. This limits the generalizability of the findings because the CT acquisition protocols, patient demographics, and scanner hardware may differ substantially at other centers. Models trained on data from a single scanner vendor (Toshiba) may not perform equivalently on images from GE, Siemens, or Philips equipment.

No non-cancer controls: The verification group included only patients with confirmed pancreatic cancer. No patients with benign pancreatic lesions (such as chronic pancreatitis, serous cystadenomas, or intraductal papillary mucinous neoplasms) or healthy individuals were included. This is a significant limitation because the real clinical challenge is distinguishing cancer from benign conditions, not simply detecting cancer in patients already known to have it. The reported AUC of 0.9632 may therefore overestimate real-world diagnostic performance.

Intended as an assistive tool: The authors explicitly state that the AI system is designed to aid radiologists, not replace them. While the system's speed advantage is impressive, the absence of head-to-head comparisons between the AI and individual radiologists on the same test set makes it difficult to quantify the incremental clinical benefit. The study also did not assess how radiologist performance changes when assisted by the AI output.

Additionally, the study did not evaluate the system's ability to perform staging, only detection. Given that accurate TNM staging is critical for surgical planning in pancreatic cancer, a detection-only system addresses only part of the clinical workflow.

TL;DR: Key limitations include single-center retrospective design, no benign or healthy controls in the test set, no direct AI-vs-radiologist comparison on the same data, and no staging capability. Results may overestimate real-world accuracy.
Pages 8-9
Prospective Multicenter Validation and Expanded Training Data

Multicenter prospective studies: The authors plan to conduct a prospective study based on multicenter clinical data to further validate the clinical application of Faster R-CNN for pancreatic cancer diagnosis. Moving from a single-center retrospective design to a multicenter prospective trial would address the most critical limitation of the current work and provide stronger evidence for clinical adoption.

Expanded inclusion criteria: Future work will reorganize the training and testing groups to include not only patients with pancreatic cancer but also patients with benign pancreatic lesions and healthy controls. This is essential for training the system to handle the full spectrum of pancreatic pathology seen in real clinical practice, where the differential diagnosis between cancer and conditions like chronic pancreatitis or autoimmune pancreatitis can be extremely challenging.

Broader implications: The study demonstrates that Faster R-CNN, originally developed for general-purpose object detection in computer vision, can be effectively adapted to medical image analysis through transfer learning. The VGG16 backbone pre-trained on ImageNet provided a strong initialization that allowed training on a relatively modest dataset of approximately 4,000 CT images. This transfer learning approach could be replicated for other abdominal malignancies where contrast-enhanced CT is the primary diagnostic modality.

The registered clinical trial (ChiCTR1800017542) suggests ongoing institutional commitment to advancing this research. If future iterations include 3D image recognition (the current system only processes 2D horizontal slices), staging prediction, and integration with clinical biomarkers like CA19-9, the platform could evolve into a comprehensive diagnostic decision support system for pancreatic cancer.

TL;DR: Next steps include multicenter prospective validation, adding benign and healthy controls to training data, potential 3D image recognition, and integration with biomarkers like CA19-9. The transfer learning approach (VGG16 on ImageNet) could be applied to other abdominal cancers.
Citation: Liu SL, Li S, Guo YT, et al. Establishment and application of an artificial intelligence diagnosis system for pancreatic cancer. Open access, 2019. Available at PMC6940082. DOI: 10.1097/CM9.0000000000000544. License: CC BY-NC-ND.