AI Clinical Applications in HCC Management

Plain-English Explanations

Overview

Pages 1-3

Why This Review Matters: HCC as a Growing Global Threat

Hepatocellular carcinoma (HCC) is the most common primary liver cancer, accounting for approximately 80% of all hepatic malignancies. It ranks as the sixth most common cancer globally by incidence and the third leading cause of cancer-related death. Between 1% and 8% of patients with liver cirrhosis develop HCC annually, and the disease is fueled by a complex web of risk factors that vary by geographic region: hepatitis B virus (HBV) dominates in most of Asia, South America, and Africa; hepatitis C virus (HCV) is the leading cause in Western Europe, North America, and Japan; and alcohol intake is the primary driver in Central and Eastern Europe.

A major shift in HCC epidemiology is now underway. Metabolic dysfunction-associated steatotic liver disease (MASLD), formerly known as NAFLD, is rapidly becoming the most common cause of chronic liver damage, cirrhosis, and HCC in Western countries. This is alarming because obesity and diabetes are independent risk factors for HCC, and MASLD-related HCC can occur even in patients without advanced fibrosis or cirrhosis, meaning these patients escape traditional surveillance programs entirely. With the long-term success of HBV vaccination and effective antiviral therapies for HCV, this metabolic etiology is projected to become even more dominant.

Despite therapeutic advances including surgical resection, liver transplantation, locoregional therapies such as transarterial chemoembolization (TACE), and systemic agents like sorafenib and immunotherapy, the prognosis for advanced-stage HCC remains poor. The low survival rate, particularly in advanced disease, combined with high recurrence rates, has created a plateau in management progress. This review by Romeo et al. from the University of Campania Luigi Vanvitelli examines whether artificial intelligence (AI), through machine learning (ML) and deep learning (DL) approaches, can break through this plateau by improving screening, diagnosis, treatment prediction, and biomarker discovery across all phases of HCC management.

TL;DR: HCC accounts for 80% of liver cancers and ranks third in cancer mortality. MASLD is emerging as a dominant cause, producing HCC even without cirrhosis. This review evaluates AI/ML/DL applications across all HCC management phases, from risk stratification to treatment response prediction, to address the current plateau in outcomes for advanced-stage disease.

Risk Stratification

Pages 4-6

AI-Powered HCC Risk Stratification Across Liver Disease Etiologies

Current HCC surveillance guidelines from the European Association for the Study of the Liver (EASL) recommend liver ultrasound every six months for high-risk patients. However, fewer than half of eligible European patients actually receive this surveillance, and the standard approach fails to detect early HCC in a considerable portion of patients. The review argues that risk-based surveillance, tailored to individual HCC risk, is superior to the one-size-fits-all approach, and that AI is uniquely suited to power this personalization.

For HCV-related cirrhosis: Ioannou et al. studied 48,151 patients with HCV-related cirrhosis from the Veterans Health Administration. Deep learning recurrent neural network (RNN) models, trained on raw longitudinal electronic health record (EHR) data, outperformed conventional logistic regression models for identifying patients at high risk of developing HCC. Separately, Minami et al. developed the "SMART" model using a random survival forest (RSF) algorithm on 1,742 HCV patients who had achieved sustained virological response (SVR), incorporating seven clinical parameters: age, platelet count, alpha-fetoprotein (AFP), gamma-glutamyl transferase (GGT), body mass index (BMI), albumin, and aspartate aminotransferase (AST). This model demonstrated good predictive ability for post-SVR HCC risk.

For chronic HBV infection: Kim HY, Lampertico P, et al. developed the PLAN-B model using a gradient-boosting machine (GBM) algorithm on a derivation cohort of 6,051 patients receiving antiviral therapy (entecavir or tenofovir), with two external validation cohorts. The model incorporated ten baseline parameters including cirrhosis status, age, platelet count, antiviral type, sex, ALT, HBV-DNA, albumin, bilirubin, and HBeAg status, and showed significant superiority over previous models in both validation cohorts.

For MASLD: Sarkar et al. evaluated five different ML algorithms for predicting HCC in MASLD patients: decision tree (DT), gradient boosted (GB), naive Bayes (NB), probabilistic neural network (PNN), and random forest (RF). The GB-ML model achieved 92.06% accuracy, AUC of 0.97, F1 score of 0.84, 98.34% specificity, and 74.41% sensitivity using EMR data. The strongest clinical predictors were FIB-4 score, alkaline phosphatase (ALP), total cholesterol, bilirubin, and hypertension.
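
The accuracy, sensitivity, specificity, and F1 figures quoted above all derive from one confusion matrix. The sketch below shows how they relate. The counts are hypothetical, chosen only so the outputs land near the reported range; they are not data from Sarkar et al., and the function name is an assumption made for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true HCC cases that were flagged
    specificity = tn / (tn + fp)   # share of non-HCC cases correctly cleared
    precision = tp / (tp + fp)     # share of flagged cases that were true HCC
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not from the study).
metrics = classification_metrics(tp=75, fp=5, tn=295, fn=25)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts, one in four true cases is missed while overall accuracy stays above 90%, because non-cases dominate the sample. This is why sensitivity deserves separate attention when a model is proposed for surveillance.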

TL;DR: In the cited studies, AI models outperformed traditional approaches across the major HCC etiologies. RNNs on 48,151 HCV-cirrhosis patients beat logistic regression; the PLAN-B GBM model on 6,051 HBV patients surpassed prior models; and a gradient-boosted ML model for MASLD achieved 92.06% accuracy with AUC 0.97 for predicting HCC development, with FIB-4 among the strongest predictors.

Ultrasound Diagnosis

Pages 7-8

AI-Enhanced Ultrasound and CEUS for HCC Detection

Ultrasound is the first-line imaging modality for identifying liver lesions in clinical practice. AI has the potential to assist less experienced radiologists by improving their diagnostic performance and reducing dependence on more expensive cross-sectional imaging. The review highlights several key models that have demonstrated strong results in this domain.

DCNN-US model: Yang et al. developed and externally validated a deep convolutional neural network (DCNN) using a large, multicenter database of ultrasound (US) images from 13 hospital systems. The resulting DCNN-US model achieved an AUROC of 0.92 for differentiating benign from malignant liver lesions. Its diagnostic sensitivity and specificity were superior to those of radiologists with 15 years of experience, and its accuracy (76.0%) was comparable to that of clinical radiologists, approaching contrast-enhanced CT accuracy (84.7%) and only slightly below MRI (87.9%).

YOLOv5 for focal liver lesions: A retrospective study from Thailand incorporated 26,288 US images from 5,444 patients to train the YOLOv5 deep learning model for detection and classification of seven different types of focal liver lesions (FLLs), including HCC and regenerative nodules. The model achieved an overall FLL detection rate of 84.8% (95% CI: 83.3-86.4), with HCC detection at 82.3% (95% CI: 77.1-87.5). Notably, specificities and negative predictive values for regenerative nodules reached 100% and 99.9%, respectively.
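
Detection rates such as those above are proportions, and the bracketed ranges are 95% confidence intervals around them. A minimal sketch of one common way to compute such an interval (the normal-approximation, or Wald, interval) follows. The counts are hypothetical, and the interval method the study itself used is not stated here, so this should be read as an illustration rather than a reconstruction.

```python
import math


def wald_ci(successes, total, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the approximation can overshoot near the edges.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts: 848 lesions detected out of 1,000 (not study data).
rate, lower, upper = wald_ci(successes=848, total=1000)
print(f"detection rate {rate:.1%} (95% CI: {lower:.1%}-{upper:.1%})")
```

The interval narrows as the sample grows, which is why the overall detection rate above carries a tighter interval than the HCC-specific rate, which rests on fewer lesions.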

DCCA-MKL for contrast-enhanced ultrasound: Because contrast-enhanced ultrasound (CEUS) shows superior diagnostic performance compared with standard B-mode ultrasound, Guo LH et al. proposed the DCCA-MKL framework, a two-stage multiple-view learning system combining deep canonical correlation analysis and multiple kernel learning. Using only three typical CEUS images from the arterial, portal venous, and late phases, this model effectively discriminated between benign and malignant liver lesions, demonstrating the feasibility of AI-assisted CEUS-based diagnosis.

TL;DR: The DCNN-US model, built on data from 13 hospital systems, achieved AUROC 0.92, matching or exceeding radiologists with 15 years of experience. YOLOv5 trained on 26,288 images from 5,444 patients detected HCC at 82.3% with 100% specificity for regenerative nodules. The DCCA-MKL framework demonstrated effective CEUS-based discrimination using just three phase-specific images.

CT and MRI Diagnosis

Pages 8-9

AI Models for CT and MRI-Based HCC Diagnosis

CT is routinely used for accurate HCC diagnosis, especially when CEUS results are inconclusive, while MRI provides the highest diagnostic accuracy. AI-based radiomics and deep learning are enhancing both modalities by extracting quantitative features invisible to the human eye and automating classification of indeterminate liver lesions.

CT radiomics signature: Mokrane et al. developed a radiomics signature based on 13,920 CT imaging features extracted from 178 cirrhotic patients. Using ML techniques for training, calibration, and validation, this signature achieved an AUC of 0.740 (95% CI: 0.610-0.801) for distinguishing HCC from non-HCC lesions by quantifying changes between arterial and portal venous phases. While the AUC is moderate, this approach demonstrates the potential of high-dimensional radiomics in challenging diagnostic scenarios involving indeterminate nodules in cirrhotic livers.

Spatio-Temporal 3D Convolution Network (ST3DCN): Ho Yu et al. developed four DL models for diagnosing HCC on CT, identifying the ST3DCN as the best performer with an AUC of 0.919 (95% CI: 0.903-0.935) and NPV of 0.966 (95% CI: 0.954-0.979). This model specifically targets the considerable proportion of indeterminate observations encountered in routine practice, outperforming standard-of-care radiological interpretation.
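
A negative predictive value (NPV) such as the one quoted above depends not only on the model but also on how common HCC is in the population being scanned. The sketch below applies Bayes' rule to show that dependence. The sensitivity, specificity, and prevalence values are assumptions for illustration and are not taken from the ST3DCN study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical operating point evaluated at three assumed prevalences.
for prevalence in (0.05, 0.20, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

The same model yields a lower NPV when disease is more common, so an NPV reported in one cohort does not automatically transfer to a clinic with a different case mix.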

CNN-based MRI classification: Hamm et al. created a convolutional neural network (CNN)-based deep learning system achieving 92% accuracy, 92% sensitivity, and 98% specificity for classifying liver lesions on MRI. Building on this, Zhen et al. trained CNNs on data from 1,210 patients with liver tumors and validated on an external cohort of 201 patients. By combining unenhanced MRI images with clinical data, the model achieved a remarkable AUC of 0.985 (95% CI: 0.960-1.000) for classifying HCC, with sensitivity and specificity comparable to three experienced radiologists.

TL;DR: CT-based AI ranges from radiomics signatures (AUC 0.740 on 178 patients) to the ST3DCN model (AUC 0.919, NPV 0.966). On MRI, a CNN-based system achieved 92% accuracy, 92% sensitivity, and 98% specificity, and combining unenhanced MRI with clinical data reached AUC 0.985 in a model trained on 1,210 patients and externally validated on 201, with performance comparable to experienced radiologists.

Histopathology

Pages 9-11

AI Applications in Histopathology and Somatic Mutation Prediction

When imaging is inconclusive, histopathology is essential for confirming HCC diagnosis, assessing cellular differentiation, vascular invasion, and metastatic potential. AI is being applied to digitized tissue slides to automate classification, reduce pathologist bias, and extract molecular-level information directly from standard staining.

VGG-16 and Inception V3 architectures: Lin et al. fused multiphoton microscopy with a DL algorithm using a pre-trained VGG-16 framework, achieving greater than 90% accuracy for classifying HCC differentiation grades. Chen et al. then trained the Inception V3 CNN on hematoxylin- and eosin-stained slides from the Genomic Data Commons, reporting 96.0% accuracy for benign vs. malignant classification and 89.6% accuracy for tumor differentiation grading (good, moderate, poor), approaching the performance of a pathologist with 5 years of experience. Crucially, this model was further trained to predict the ten most commonly mutated genes in HCC, successfully identifying four (CTNNB1, FMN2, TP53, and ZFHX4) as predictable from histopathology images, with AUCs ranging from 0.71 to 0.89.

DCNN superiority and diagnostic assistance: Liao et al. demonstrated that a DCNN model surpassed Inception V3 for automatic HCC diagnosis from histological slides while simultaneously predicting somatic mutations using features from the TCGA database. In a separate study, Kiani et al. evaluated AI-assisted diagnosis with 11 pathologists of varying expertise on hematoxylin- and eosin-stained whole-slide images (WSI), distinguishing HCC from cholangiocarcinoma. The model achieved 0.885 accuracy on a validation set of 26 WSI and 0.842 on an independent test set of 80 WSI. Correct AI predictions improved pathologist accuracy (OR = 4.289), but incorrect predictions also reduced it (OR = 0.253), regardless of pathologist experience level.
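
An odds ratio (OR) above 1 means the odds of a correct diagnosis rose; an OR below 1 means they fell. The sketch below computes a plain odds ratio from a 2x2 table to make that reading concrete. The counts are hypothetical, and the published ORs most likely come from a regression model rather than a raw table, so this is an aid to interpretation only.

```python
def odds_ratio(correct_with, incorrect_with, correct_without, incorrect_without):
    """Odds of a correct diagnosis with assistance divided by odds without it."""
    odds_with = correct_with / incorrect_with
    odds_without = correct_without / incorrect_without
    return odds_with / odds_without


# Hypothetical counts (not from Kiani et al.).
# Assistance that was right: 90 correct vs. 10 incorrect reads, against 70 vs. 30 unaided.
helpful = odds_ratio(90, 10, 70, 30)
# Assistance that was wrong: 40 correct vs. 60 incorrect reads, against the same baseline.
harmful = odds_ratio(40, 60, 70, 30)
print(f"OR when the model was right: {helpful:.2f}")
print(f"OR when the model was wrong: {harmful:.2f}")
```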

These findings reveal both the promise and the risk of AI-assisted pathology. While AI can substantially improve diagnostic accuracy and uncover hidden molecular features from routine staining, clinicians and pathologists must remain vigilant about the potential for incorrect AI predictions to mislead decision-making, a phenomenon sometimes called "automation bias."

TL;DR: VGG-16 exceeds 90% accuracy for HCC grading; Inception V3 reaches 96.0% for malignancy classification and predicts somatic mutations (CTNNB1, TP53, FMN2, ZFHX4) with AUCs 0.71-0.89. AI-assisted pathology with 11 pathologists showed correct predictions boosted accuracy (OR 4.289) but incorrect ones reduced it (OR 0.253), highlighting automation bias risk.

Liquid Biopsy

Pages 10-11

AI-Driven Transcriptomics and Fusion Gene Blood Tests for HCC

Beyond imaging and tissue-based pathology, molecular biology represents a frontier where AI can transform HCC management. Blood-based tests using fusion gene transcripts could enable non-invasive screening, diagnosis, and treatment monitoring without the need for liver biopsy.

Fusion gene discovery: Yu et al. identified a panel of fusion genes created during DNA rearrangements in hepatocyte neoplastic transformation. Among eight fusion genes, MAN2A1-FER, TRMT11-GRIK2, and CCNH-C5orf30 were the most frequent in HCC samples, and their transcripts were detectable as circulating cell-free RNA in serum. The distributions of these RNA fragments closely matched those found in primary HCC tumor tissue, confirming their potential as liquid biopsy biomarkers.

ML-based blood classification: Researchers at the University of Pittsburgh analyzed fusion transcripts in 61 HCC patients and 75 patients with non-cancer liver diseases. An ML algorithm classified the data and determined that four fusion transcripts (MAN2A1-FER, CCNH-C5orf30, SLC45A2-AMACR, and PTEN-NOLC1) at specific thresholds were associated with high cancer probability, correctly predicting disease in approximately 83% of cases. A refined second system, combining two fusion transcripts with serum AFP levels, achieved predictive capacity approaching 95%.

From a translational perspective, these tests could serve as screening tools for early HCC diagnosis in high-risk patients, help determine whether patients with liver nodules of unknown nature should undergo biopsy, and evaluate treatment effectiveness by monitoring post-treatment fusion transcript levels. While further validation studies are needed, the simplicity and non-invasive nature of these approaches make them compelling candidates for clinical adoption.

TL;DR: ML-based analysis of fusion gene transcripts in blood correctly predicted HCC in approximately 83% of cases using four fusion transcripts in 61 HCC vs. 75 non-cancer patients. Adding AFP to two fusion transcripts pushed predictive capacity to nearly 95%. These non-invasive blood tests could support screening, biopsy decisions, and treatment monitoring.

Treatment Prediction

Pages 11-13

AI Models for Predicting TACE Response and Treatment Outcomes

The Barcelona Clinic Liver Cancer (BCLC) staging system guides HCC treatment across five stages (0, A, B, C, D). Early-stage patients (0/A) can undergo potentially curative surgical resection or liver transplantation, with reported 5-year survival rates of 60-80%. However, patients in stages B and C face dramatically worse prognoses and are typically treated with locoregional TACE (stage B) or systemic therapies (stage C). AI models are being developed to predict which patients will respond to these treatments, avoiding unnecessary adverse effects.

ResNet50 for TACE response prediction: In a multicenter study, Peng et al. used 789 CT images from three hospitals to build a predictive model using transfer learning with a residual convolutional neural network (ResNet50). The model predicted TACE response in BCLC-B patients with very high discrimination: AUCs of 0.97 for complete response, 0.96 for partial response, 0.95 for stable disease, and 0.96 for progressive disease, validated across two independent cohorts.
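
An AUC reported per response category is usually a one-vs-rest figure: the probability that a randomly chosen patient in that category receives a higher score than a randomly chosen patient outside it. The sketch below computes AUC directly from that definition. The scores are invented for illustration and the function name is an assumption; nothing here reproduces the ResNet50 model itself.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model scores for the "complete response" category.
complete_responders = [0.9, 0.8, 0.4]   # patients who truly had complete response
all_other_patients = [0.7, 0.3, 0.2, 0.1]
auc = auc_from_scores(complete_responders, all_other_patients)
print(f"one-vs-rest AUC: {auc:.3f}")
```

Repeating the calculation with each response category in turn as the positive class gives one AUC per category, which is the form in which the results above are reported.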

Atezolizumab-bevacizumab response signature: Zeng Q et al. developed an AI model that estimates the atezolizumab-bevacizumab response signature (ABRS) expression directly from histological slides. The model predictions were associated with progression-free survival (PFS), proposing a novel, inexpensive biomarker for immunotherapy-targeted treatment. By combining AI heatmaps with spatial transcriptomics, this approach provides insight into the biological mechanisms driving treatment responses.

These models represent a shift from empirical treatment selection to data-driven prediction. Rather than applying standard protocols uniformly, AI enables clinicians to match individual patients with the therapies most likely to succeed, sparing non-responders from ineffective treatments and their associated toxicities.

TL;DR: A ResNet50 model trained on 789 CT images from 3 hospitals predicted TACE response in BCLC-B patients with AUCs of 0.95-0.97 across all response categories. An AI model estimating the atezolizumab-bevacizumab response signature from histological slides serves as a biomarker for immunotherapy progression-free survival.

Recurrence Prediction

Pages 12-14

AI for Predicting HCC Recurrence After Surgery and Transplantation

HCC recurrence after treatment is a major clinical challenge. "Early" recurrence (within 2 years) is caused by occult intrahepatic spread and carries a significantly worse prognosis than "late" recurrence (after 2 years), which represents de novo HCC. AI models are being developed to stratify patients by recurrence risk, guiding surveillance intensity and adjuvant therapy decisions.

Radiomics with ML: Ji GW et al. studied 470 patients who underwent curative resection for solitary HCC across 3 departments. Using an ML framework on contrast-enhanced CT, they identified a three-feature radiomics signature achieving a C-index of 0.733-0.801 for recurrence prediction, outperforming rival models and widely used staging systems. For post-transplant recurrence, Nam JY et al. developed MoRAL-AI using a deep neural network (DNN) on 563 patients from three Korean transplant centers, with age, tumor size, and serum AFP as the most heavily weighted factors. MoRAL-AI showed significantly better discrimination than the Milan criteria in external validation.

DL on histological slides: Saillard et al. developed two DL algorithms: SCHMOWDER (using attention mechanisms on pathologist-annotated tumor areas) and CHOWDER (working without human expertise). They achieved c-indices of 0.78 and 0.75, respectively, for survival prediction, outperforming composite clinical scores and identifying vascular spaces, macro-trabecular patterns, and lack of immune infiltration as predictors of poor survival. Yamashita et al. built HCCSurvNet, achieving concordance indices of 0.724 and 0.683 on internal and external test cohorts for recurrence-free survival, exceeding the TNM classification system.
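
The C-index (concordance index) values quoted for these recurrence and survival models measure how often the model ranks two patients in the right order: the patient who relapsed or died sooner should carry the higher predicted risk. A minimal sketch of Harrell's C-index follows, using invented follow-up data. Production analyses would normally rely on a tested survival library, and tied event times are simply skipped here.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable patient pairs ranked correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if patient i had an observed event
            # strictly before patient j's last follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical cohort: months to recurrence or censoring, event flag, predicted risk.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]          # 0 marks a censored patient (no recurrence observed)
risk_scores = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk_scores)
print(f"C-index: {c_index:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which places the reported values of roughly 0.68 to 0.80 in the moderate-to-good range.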

SVM and MobileNetV2 models: Saito et al. applied a support vector machine (SVM) to digital pathologic images of 158 HCC patients meeting Milan criteria, distinguishing three recurrence groups (within 1 year, 1-2 years, and no recurrence within 4 years) with 89.9% accuracy. A separate study trained a MobileNetV2-based classifier on 1,118 HCC patients from four independent cohorts, using a U-Net to capture nuclear architecture. The MobileNetV2_HCC_class model was a strong predictor of recurrence-free survival after both liver transplantation and resection, identifying cytological atypia and nuclear hyperchromasia as key histologic features in high-recurrence areas.

TL;DR: Radiomics ML on 470 patients achieved C-index 0.733-0.801 for recurrence prediction; MoRAL-AI DNN on 563 transplant patients outperformed Milan criteria. DL algorithms SCHMOWDER/CHOWDER reached c-indices 0.75-0.78 for survival. SVM classified recurrence timing at 89.9% accuracy in 158 patients, and MobileNetV2 on 1,118 patients predicted recurrence-free survival across four cohorts.

Challenges and Future

Pages 14-17

Limitations, Explainable AI (XAI), and the Road to Clinical Implementation

Despite promising results, the review concludes that AI is not yet ready to fully update diagnostic and predictive models in routine HCC management. Most studies are retrospective with relatively small sample sizes, lacking the prospective validation needed for clinical adoption. The training populations lack ethnic and socioeconomic diversity, and the heterogeneity of algorithms and software makes external validation both crucial and difficult. The dependency on training dataset size and diversity means AI-based risk scores cannot yet serve as standalone tools for HCC prediction.

Explainable AI (XAI) is emerging as a critical frontier. Lacalamita et al. developed a supervised learning framework using hierarchical community detection to identify 20 gene communities distinguishing healthy from cancerous samples with high accuracy, then applied XAI to assess individual gene contributions. Another study used robust AI models validated on extensive gene expression datasets, identifying biomarkers TOP3B, SSBP3, and COX7A2L as clinically relevant HCC prognostic markers through explainable metrics. A third study combined automated ML (AutoML) with XAI, using the TPOT tool and TreeSHAP analysis to differentiate HCC from liver cirrhosis in HCV patients by identifying key metabolites including L-valine, glycine, and DL-isoleucine.
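
TreeSHAP is an efficient way of computing Shapley values for tree-based models. A Shapley value assigns each input feature its average contribution to a prediction across all orders in which features could be added. The sketch below computes exact Shapley values by brute-force enumeration on a toy function. This is a swapped-in, simpler technique rather than TreeSHAP itself; the toy model, the feature values, and the convention of replacing absent features with a baseline are all assumptions for illustration.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    n = len(instance)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)          # start with every feature "absent"
        previous_output = model(current)
        for feature in order:
            current[feature] = instance[feature]   # switch this feature on
            output = model(current)
            contributions[feature] += output - previous_output
            previous_output = output
    return [total / len(orderings) for total in contributions]


def toy_model(x):
    """Hypothetical risk score: one additive feature plus an interaction term."""
    return 2.0 * x[0] + x[1] * x[2]


values = shapley_values(toy_model, instance=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(values)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that makes Shapley-based explanations attractive for auditing which metabolites or genes drive a given prediction. Enumeration grows factorially with the number of features, which is why tree-specific algorithms are used in practice.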

The review highlights that trustworthy AI must address seven dimensions codified in the Assessment List for Trustworthy Artificial Intelligence (ALTAI): human agency and oversight; technical robustness and safety; privacy and data governance; transparency; diversity, non-discrimination, and fairness; environmental and societal well-being; and accountability. Transfer learning (TL) is identified as a promising technique for reusing pre-trained models across tasks, potentially reducing the need for massive labeled datasets.

The authors emphasize that rather than replacing clinicians, AI requires highly trained personnel to properly input data and interpret outputs. The future demands prospective clinical trials integrating AI algorithms, shared global databases with imaging, clinical variables, and biological samples from diverse geographic areas, and collaborative-learning strategies. The integration of XAI with traditional AI is essential for building the transparent, interpretable models that the multidisciplinary HCC management team (oncologists, gastroenterologists, surgeons, and radiologists) can trust and effectively deploy in clinical practice.

TL;DR: AI is not yet ready for routine HCC clinical use due to retrospective study designs, limited diversity, and lack of prospective validation. XAI approaches such as TreeSHAP are advancing interpretability, identifying biomarkers like TOP3B and metabolites such as L-valine, while the ALTAI framework sets out seven requirements for trustworthy AI. The path forward requires global database sharing, prospective trials, transfer learning, and integration of XAI to build clinician trust.