Application of artificial intelligence in the diagnosis and treatment of hepatocellular carcinoma

Plain-English Explanations
Pages 1-2
Why AI Matters for Hepatocellular Carcinoma

This 2020 review from the World Journal of Gastroenterology examines how artificial intelligence, specifically machine learning (ML) and deep learning (DL), is being applied across the full clinical pathway of hepatocellular carcinoma (HCC). HCC is the most common primary liver cancer, and the American Cancer Society estimated 42,810 new cases of liver and intrahepatic bile duct cancer in the United States for 2020, with 30,160 deaths. The sheer volume of imaging, clinical, and histological data generated across HCC diagnosis and treatment creates a natural use case for AI-driven analysis.

Key AI concepts: The authors distinguish between ML and DL. ML is a branch of AI that learns from a pre-labeled dataset, building algorithms to recognize patterns and produce predictive models. Techniques such as support vector machines (SVM), artificial neural networks (ANNs), and classification and regression trees all fall under this umbrella. DL is a more advanced subset of ML that uses multi-layered neural network architectures, most notably convolutional neural networks (CNNs), which have proven especially effective for analyzing radiological images.

HCC has a unique clinical characteristic that makes it especially amenable to AI: it is one of the few solid tumors that can be diagnosed radiologically, without histological confirmation, when imaging findings show hyperenhancement at the arterial phase and washout at portal or late phases in a cirrhotic patient. This means that image analysis, precisely the domain where CNNs excel, sits at the center of HCC diagnosis. The review covers AI applications in ultrasound, CT, MRI, PET, histopathology, treatment response prediction, and survival estimation.

The authors note that most existing studies are retrospective and suffer from database bias, and that prospective, multicenter validation is needed before AI tools can be integrated into routine clinical practice. They also raise cost-effectiveness, regulatory approval, and ethical considerations as barriers to real-world deployment.

TL;DR: This review covers AI (ML and DL) across HCC diagnosis, treatment, and prognosis. The American Cancer Society estimated 42,810 new US cases of liver and intrahepatic bile duct cancer for 2020, of which HCC is the most common type. Because HCC can be diagnosed radiologically without biopsy, AI-driven image analysis is particularly relevant. Most studies remain retrospective and require prospective validation.
Pages 2-3
AI-Enhanced Ultrasound for Liver Disease Classification and Lesion Detection

Abdominal ultrasound is the frontline screening tool for HCC, recommended by clinical practice guidelines for regular surveillance of patients with hepatic cirrhosis. However, ultrasound interpretation is operator-dependent and subject to significant interobserver variability. Several groups have applied AI to improve diagnostic yield from ultrasound imaging.

Liver disease staging: Bharti et al. proposed an ANN model to classify four stages of liver disease from ultrasound images: normal liver, chronic liver disease, cirrhosis, and HCC. The model achieved a classification accuracy of 96.6%. Liu et al. took a different approach, designing an algorithm focused on the morphology of the liver capsule to detect cirrhosis, even in early stages before conventional findings (nodular liver outline, enlarged porta, splenomegaly) become obvious. Their model achieved an area under the curve (AUC) of 0.968 for cirrhosis detection.

Lesion characterization: Schmauch et al. designed a DL system to detect and classify space-occupying liver lesions as benign or malignant. After supervised training on a database of 367 images paired with radiological reports, the algorithm detected lesions with a mean area under the ROC curve (AUC) of 0.93 and characterized them with an AUC of 0.916. This system, if validated, could substantially augment the diagnostic capability of standard B-mode ultrasound.
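The AUC values quoted for these ultrasound models can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below illustrates that reading with a simple pairwise count. The function name `roc_auc` and the toy labels and scores are assumptions made for illustration; they are not data or code from any of the studies.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = malignant lesion, 0 = benign lesion.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(round(auc, 3))  # 11 of 12 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values such as 0.93 or 0.968 are considered strong and 0.70 only modest.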

Contrast-enhanced ultrasound (C-US): Guo et al. demonstrated that DL applied to liver lesion behavior observed during three C-US phases (arterial, portal, and late) improved the accuracy, sensitivity, and specificity of lesion characterization beyond conventional visual assessment. This multi-phase analysis approach mirrors how radiologists evaluate dynamic imaging, but with the potential for more consistent and quantitative results.

TL;DR: AI models for ultrasound achieved 96.6% accuracy in staging liver disease (Bharti et al.), AUC of 0.968 for early cirrhosis detection via liver capsule morphology (Liu et al.), and AUC of 0.93/0.916 for lesion detection/characterization (Schmauch et al., 367 images). DL also improved multi-phase contrast-enhanced ultrasound analysis.
Pages 3-4
AI for Contrast-Enhanced CT and MRI Liver Imaging

When ultrasound identifies a suspicious liver lesion, dynamic contrast-enhanced CT or MRI is the next step for precise characterization. Liver nodules that show classic HCC features (arterial hyperenhancement, portal/late-phase washout) in a cirrhotic patient can be diagnosed without biopsy. But many nodules exhibit indeterminate behavior, requiring either biopsy or close follow-up. AI aims to reduce this diagnostic ambiguity.

Indeterminate nodules on CT: Mokrane et al. retrospectively analyzed 178 cirrhotic patients with liver nodules that the Liver Reporting and Data System (LI-RADS) criteria could not definitively classify, necessitating biopsy. Of those biopsied, 77% proved malignant. Using DL to classify nodules as HCC or non-HCC achieved an AUC of 0.70. While modest, this suggests a role for AI in triaging indeterminate lesions. Yasaka et al. trained an ANN on over 55,000 image sets to classify liver masses on contrast-enhanced CT into five categories: classic HCC, other malignancies (cholangiocarcinoma, hepatocholangiocarcinoma, metastasis), indeterminate/dysplastic nodules, hemangiomas, and cysts. The system achieved high accuracy, particularly for distinguishing malignant from benign lesions.

Tumor recurrence and segmentation: Vivanti et al. described an automated detection method for tumor recurrence on follow-up CT, based on initial tumor appearance, CT behavior, and tumor load quantification, achieving an accuracy of 86% for identifying true recurrences. Li et al. proposed a CNN for liver tumor segmentation on CT images, achieving 82.67% +/- 1.43% accuracy, outperforming traditional segmentation techniques and supporting more precise treatment planning.

MRI applications: Hamm et al. developed and validated a DL system based on CNN for classifying MRI liver lesions, reporting 92% accuracy, 92% sensitivity, 98% specificity, and an average computation time of just 5.6 milliseconds. Jansen et al. built an automated classification system incorporating MRI sequences and patient risk factors, cataloguing lesions as adenoma, cyst, hemangioma, HCC, or metastasis. Their sensitivity/specificity values were: adenoma 0.80/0.78, cyst 0.93/0.93, hemangioma 0.84/0.82, HCC 0.73/0.56, and metastasis 0.62/0.77. Zhang et al. also reported promising results training a CNN on MRI in 20 patients for liver tissue classification.
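Sensitivity and specificity, as reported for these MRI classifiers, are computed from different parts of the confusion matrix, which is why a model can score well on one and poorly on the other. The sketch below shows the arithmetic. The counts are hypothetical, chosen only so that the output matches the 0.73/0.56 pattern quoted for HCC; they are not the actual counts from Jansen et al.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true disease cases that were detected
    specificity = tn / (tn + fp)  # share of non-disease cases correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 100 lesions that are HCC and 100 that are not.
sens, spec, acc = diagnostic_metrics(tp=73, fn=27, tn=56, fp=44)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.3f}")
```

With these assumed counts, a specificity of 0.56 means 44 of the 100 non-HCC lesions would be wrongly flagged as HCC, which helps explain why that class stands out as the weakest in the list above.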

TL;DR: For CT, DL classified indeterminate nodules at AUC 0.70 (178 patients), an ANN trained on 55,000+ image sets categorized liver masses into 5 types, and automated recurrence detection reached 86% accuracy. For MRI, a CNN classifier achieved 92% accuracy, 92% sensitivity, and 98% specificity with 5.6 ms computation time.
Pages 4-5
AI Applications in PET Imaging and Histological Classification

PET imaging: Preis et al. evaluated the yield of 18F-FDG PET/CT (fluorine-18 fluorodeoxyglucose positron emission tomography/computed tomography) using a neural network to analyze liver uptake of 18F combined with patient demographics and laboratory data. The model achieved high sensitivity and specificity for detecting liver malignancy that was not identified visually by radiologists. While this study primarily targeted metastatic liver disease, where 18F-FDG PET/CT has greater clinical utility than for primary HCC, it demonstrated that AI could serve as a complementary tool for radiologists interpreting PET scans.

Histopathological classification: The histological differentiation of liver tumors is critical for treatment planning and prognosis, but can be challenging even for expert pathologists. Kiani et al. prospectively evaluated whether a DL assistant improved pathologists' ability to distinguish HCC from cholangiocarcinoma. The study assessed 11 pathologists and found that the AI tool did not change their mean diagnostic accuracy. This is a notable negative result, suggesting that simply overlaying AI predictions onto existing pathologist workflows does not automatically improve performance.

By contrast, Liao et al. demonstrated that a deep CNN trained on histopathological images could perform automated diagnosis of HCC, distinguishing healthy tissue from tumor tissue and identifying certain biological predictors from the images. This approach focuses on full automation rather than decision support, which may represent a more effective paradigm for AI in digital pathology. The divergent outcomes between the Kiani and Liao studies highlight an important distinction: AI as a standalone diagnostic engine versus AI as an overlay on human judgment, with each approach suited to different clinical scenarios.

TL;DR: A neural network on 18F-FDG PET/CT detected liver malignancies missed visually by radiologists. In histopathology, a DL assistant for 11 pathologists showed no improvement in HCC vs. cholangiocarcinoma differentiation (Kiani et al.), while a fully automated deep CNN successfully distinguished tumor from healthy tissue (Liao et al.).
Pages 5-7
Predicting Recurrence and Survival After Surgical Resection

Early tumor recurrence after surgical resection of HCC is associated with poor prognosis, making preoperative risk stratification essential. AI-based models have been developed to predict two key outcomes: the presence of vascular microinvasion (VMI) before surgery, and post-resection survival. VMI is an independent predictive factor for recurrence, but standard radiological techniques cannot directly diagnose it preoperatively.

VMI prediction with radiomics: Multiple groups have built radiomic signatures to predict VMI status. Xu et al. achieved an AUC of 0.90 for VMI prediction using contrast-enhanced CT radiomics in 495 patients. Ma et al. reported an AUC of 0.73 (157 patients) using a similar CT-based approach. Zhou et al. analyzed contrast-enhanced MRI in 46 patients, achieving an AUC of 0.918, sensitivity of 92%, and specificity of 66%. Dong et al. took a different route, using grayscale ultrasound-based radiomic algorithms to predict VMI in 322 patients, achieving an AUC of 0.73 with a sensitivity of 91.9%. This ultrasound-based approach is notable because it avoids the ionizing radiation of CT and is less costly than CT- or MRI-based methods.

Recurrence prediction: Ji et al. created predictive models for recurrence after surgical resection using radiomic analysis of contrast-enhanced CT images from 470 patients across multiple institutions, achieving a C-index of 0.633 to 0.699. When clinical data was incorporated alongside imaging features, the model supported personalized risk stratification for individual HCC management.

Survival prediction: Saillard et al. built a predictive model of survival after resection using DL on digitized histological slides from 194 patients, attaining a C-index of 0.78. Schoenberg et al. conducted a prospective study of 180 patients and built a predictive model from 26 routine preoperative clinical variables, also obtaining a predictive value of 0.78. These two studies reach a similar performance level through very different data inputs (histology vs. clinical variables). Two studies are too few to establish a ceiling, but the agreement suggests that neither data source alone pushes performance much beyond 0.78 with current approaches.
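The C-index (concordance index) used in these recurrence and survival studies measures how often a model ranks pairs of patients in the right order: the patient with the earlier observed event should have the higher predicted risk. The sketch below is a minimal version of Harrell's C-index for right-censored data. The function name and the five-patient toy cohort are assumptions made for illustration, not data from Saillard et al. or Ji et al.

```python
def concordance_index(times, events, risks):
    """Minimal Harrell's C-index.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : model output, where higher means worse predicted outcome
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of five patients (times in months).
times = [5, 10, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.6, 0.5, 0.1]

c_index = concordance_index(times, events, risks)
print(c_index)  # 6 of 8 comparable pairs are ordered correctly
```

As with AUC, 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a C-index of 0.63 to 0.70 indicates modest discrimination and 0.78 a clearly better but still imperfect one.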

TL;DR: VMI prediction reached AUC 0.90 (CT radiomics, 495 patients) and AUC 0.918 (MRI, 46 patients). Post-resection recurrence models achieved C-index 0.633-0.699 (470 patients). Survival prediction reached 0.78 in two independent studies, one using DL on histology (C-index, 194 patients) and one using clinical variables (180 patients, prospective).
Pages 7-8
AI for Predicting Response to Chemoembolization and Radiofrequency Ablation

Transcatheter arterial chemoembolization (TACE) is the standard treatment for intermediate-stage (BCLC stage B) HCC. Selecting patients who will actually benefit from TACE is critical for avoiding unnecessary procedures and their associated side effects. AI models have been developed to predict TACE response using various imaging modalities and, in some cases, genomic data.

CT-based prediction: Morshid et al. built a fully automated ML algorithm combining quantitative CT image features with pretreatment clinical data, achieving a prediction accuracy of 74.2% when BCLC stage and image features were used together, compared to lower accuracy from BCLC staging alone. Peng et al. validated a residual CNN to predict TACE response using CT images from 789 patients across three hospitals, achieving an accuracy of 84.3% and an AUC of 0.97 for predicting complete response. This is one of the largest cohorts and one of the strongest results among the studies reviewed.

Contrast-enhanced ultrasound and MRI: Liu et al. constructed a DL radiomics-based model using quantitative analysis of C-US cine recordings from 130 patients, achieving an AUC of 0.93 (95% CI: 0.80-0.98) for predicting TACE response. Abajian et al. studied 36 patients who underwent MRI before TACE, developing a predictive model with 78% accuracy, 62.5% sensitivity, and 82% specificity. Additionally, Mahringer-Kunz et al. built an ANN using the parameters from three conventional prediction scores (ART, ABCR, and SNACOR) to predict one-year survival after TACE in 282 patients, achieving an AUC of 0.77, 78% sensitivity, and 81% specificity, outperforming the individual conventional scores.

Genomic approaches and RFA: Ziv et al. explored genetic mutation analysis using SVM to predict tumor response after TACE, though this was a small retrospective study of only 17 patients with a prediction accuracy of 70%. For radiofrequency ablation (RFA), Liang et al. built a predictive model of HCC recurrence based on SVM in 83 patients, achieving an AUC of 0.69, sensitivity of 67%, and specificity of 86%. Notably, this was one of the few prospective studies in the entire review.

TL;DR: TACE response prediction reached AUC 0.97 and 84.3% accuracy (CNN, 789 patients across 3 hospitals), and AUC 0.93 using C-US radiomics (130 patients). An ANN outperformed conventional scores for one-year survival after TACE (AUC 0.77, 282 patients). For RFA, an SVM-based model in a prospective study of 83 patients achieved AUC 0.69 for recurrence prediction.
Pages 8-9
Predicting Overall Survival Through Epigenetic Analysis

Beyond treatment-specific outcomes, AI has been applied to predict overall survival in HCC patients independent of any particular therapy. Dong et al. leveraged emerging evidence on the relationship between abnormalities in DNA methylation and HCC to build a survival prediction model. Using SVM to analyze DNA methylation data from 377 HCC samples, they constructed three risk categories to predict overall survival and achieved a mean 10-fold cross-validation score of 0.95.

This result is notable for two reasons. First, the performance metric (0.95 cross-validation score) is numerically much higher than that of the other prognostic models in the review, whose C-indices ranged from about 0.63 to 0.78, although a cross-validation score and a C-index are different metrics and are not directly comparable. Second, the approach moves beyond imaging-based features into molecular and epigenetic data, suggesting that genomic-level information may carry stronger prognostic signals for HCC than radiological or clinical variables alone.

However, the cross-validation score should be interpreted cautiously. A 10-fold cross-validation within a single dataset does not provide the same level of confidence as external validation on an independent cohort. Over-fitting is a real concern, particularly with high-dimensional methylation data where the number of features can vastly exceed the number of samples. No external validation was reported, so this result requires independent confirmation before it can be considered clinically actionable.
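The mechanics of 10-fold cross-validation make the limitation concrete: every held-out fold is drawn from the same 377 samples that supply the training data, so no fold ever represents a different hospital or population. The sketch below only shows how the folds are formed. The helper `k_fold_indices` is a hypothetical name, the contiguous split is a simplification (real pipelines usually shuffle first), and no model from Dong et al. is reproduced here.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        folds.append(list(range(start, start + size)))
        start += size
    return folds


N_SAMPLES = 377  # sample count quoted in the text above
folds = k_fold_indices(N_SAMPLES, 10)
fold_sizes = [len(f) for f in folds]

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(N_SAMPLES) if i not in held_out]
    # A model would be fit on train_idx and scored on test_idx here;
    # the reported score is the mean over the 10 held-out folds.
    assert len(train_idx) + len(test_idx) == N_SAMPLES

print(fold_sizes)
```

Each model is therefore tested on roughly 38 samples from the same cohort it was trained on. If feature selection on the methylation data were done before the split, information from the test folds would leak into training, which is one common way such scores become optimistic.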

TL;DR: SVM analysis of DNA methylation data from 377 HCC samples produced three risk categories for overall survival prediction, with a 10-fold cross-validation score of 0.95. This is numerically higher than the other prognostic models in the review (C-index 0.63-0.78), but the metrics are not directly comparable, external validation is lacking, and high-dimensional methylation data carries over-fitting risk.
Pages 9-11
Critical Gaps, Study Design Limitations, and the Path Forward

Retrospective bias: Nearly all studies reviewed were retrospective in design. Only a few were prospective, among them Liang et al. (RFA recurrence prediction, 83 patients) and Schoenberg et al. (survival after resection, 180 patients). Retrospective studies carry inherent selection bias, and the databases used may not represent the diversity of real-world patient populations. The authors specifically call out the risk that biased datasets can affect the accuracy and interpretability of AI models, limiting their acceptance in clinical practice.

Small sample sizes and single-center design: Many studies relied on small cohorts, sometimes as few as 17 to 46 patients. Single-center data compounds the generalizability problem, as imaging protocols, patient demographics, and disease prevalence can differ substantially between institutions. The strongest study in the review (Peng et al., TACE prediction with CNN) used 789 patients across three hospitals, which illustrates the scale and multi-center design needed for credible AI model development.

Unresolved clinical questions: The review identifies several areas where AI could have high impact but remains understudied. These include the characterization of indeterminate hepatic lesions (where the best DL model achieved only AUC 0.70), the differential diagnosis between HCC and cholangiocarcinoma (where AI assistance did not improve pathologist accuracy), and the analysis of HCC behavior in cirrhotic versus non-cirrhotic patients. The differentiation of primary liver tumors from metastatic lesions, and prediction of response to percutaneous therapies, also represent open challenges.

Toward clinical integration: The authors emphasize that larger comparative studies are needed, specifically trials that measure the performance of medical professionals with AI support against professionals without it. Beyond accuracy, cost-effectiveness analysis, regulatory pathway development, and ethical frameworks for AI-assisted decision-making must be addressed. Health-care professionals also need formal training to understand both the strengths and limitations of AI before it is incorporated into daily liver cancer management.

The conclusion is balanced: AI represents one of the most relevant advances in medicine, with clear utility for processing and analyzing the enormous volume of HCC-related data. But AI is here to support human intelligence, not replace it, and medical protocols must remain rigorously transparent. The gap between promising retrospective results and validated clinical tools remains substantial.

TL;DR: Nearly all studies were retrospective, with sample sizes as small as 17-46 patients. Only a few of the reviewed studies were prospective. Key unresolved areas include indeterminate lesion classification (best AUC: 0.70), HCC vs. cholangiocarcinoma differentiation, and cirrhotic vs. non-cirrhotic HCC behavior. Prospective, multicenter validation and clinician training are essential next steps.
Citation: Jiménez Pérez M, Grande RG. Open Access, 2020. Available at: PMC7545389. DOI: 10.3748/wjg.v26.i37.5617. License: CC BY-NC.