AI Prognostic Score in HCC - EliminateCancer.ai

Plain-English Explanations

Overview & Background

Pages 1-2

Why Predicting HCC Survival Remains a Major Clinical Challenge

Primary liver cancer was the sixth most common cancer and the third leading cause of cancer-related deaths worldwide in 2020, with roughly 906,000 new cases and 830,000 deaths. Hepatocellular carcinoma (HCC) accounts for 75%-85% of all primary liver cancers, and its 5-year net survival sits in the 10%-19% range in most regions globally. Deaths from liver cancer are projected to reach approximately 1,679,630 by 2040, an 85.4% increase over 2020 figures. Only about 10% of newly diagnosed HCC patients are recommended for surgical resection, and even curative therapies (resection, transplantation, or ablation) yield 5-year overall survival (OS) rates of only 60%-70%. For the majority of patients who receive palliative or locoregional treatments like transarterial chemoembolization (TACE), 5-year OS drops below 30%.

Existing staging systems: Several prognostic scoring systems are currently used for HCC, including BCLC (Barcelona Clinic Liver Cancer), TNM (tumor node metastasis), Okuda grade, CLIP (Cancer of the Liver Italian Program), CUPI (Chinese University Prognostic Index), JIS (Japan Integrated Staging), and ALBI (albumin-bilirubin) grade. These systems rely primarily on tumor burden, liver function, and performance status. However, they do not account for the interaction between tumors and the host immune response, which is increasingly recognized as a key determinant of outcomes.

The immune angle: Prior research has shown that high densities of CD3 and CD8 T cells in tumor tissue can predict improved disease-free survival and overall survival in colorectal cancer and HCC patients after hepatectomy (CD3 odds ratio = 5.8, CD8 odds ratio = 3.9 for recurrence prediction). However, results on tumor-infiltrating immune cells and prognosis in HCC remain inconsistent, and most studies have focused on patients after surgery rather than the much larger population of unresectable patients. This study set out to fill that gap by incorporating circulating T cell counts into an AI-based prognostic model.

TL;DR: HCC accounts for 75%-85% of liver cancers with 5-year survival of only 10%-19% globally. Only 10% of patients qualify for curative surgery. Existing staging systems (BCLC, TNM, CLIP, ALBI, and others) ignore immune factors. This study aims to integrate T cell immunity into an AI prognostic model for all HCC patients, not just surgical candidates.

Study Design & Cohort

Pages 2-3

3,427 Patients Enrolled in a Single-Center Retrospective Cohort

The authors enrolled 3,427 patients with first-diagnosed primary liver cancer who were hospitalized at Beijing Ditan Hospital, Capital Medical University, between January 2008 and June 2017. Patients were included if they were diagnosed with primary liver cancer (with or without chronic liver diseases) and were aged 18-75 years. Exclusion criteria removed 757 patients: 213 with cholangiocarcinoma, 96 with metastatic liver cancer, 67 with other tumor types, 201 lost to follow-up, and 180 with incomplete clinical data. The remaining 2,670 patients were randomly split into a training set (n = 1,861) and a validation set (n = 809).
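The cohort flow above is simple arithmetic, and it can be checked in a few lines. The sketch below uses only the counts quoted in the text. The shuffling procedure and the seed are assumptions for illustration, since the summary reports only the resulting set sizes, not how the random split was performed.

```python
import random

# Counts reported in the text above.
screened = 3427
exclusions = {
    "cholangiocarcinoma": 213,
    "metastatic liver cancer": 96,
    "other tumor types": 67,
    "lost to follow-up": 201,
    "incomplete clinical data": 180,
}
n_train, n_valid = 1861, 809

excluded = sum(exclusions.values())
eligible = screened - excluded

# Hypothetical split: the seed and the shuffle are assumptions, not the
# authors' documented procedure.
rng = random.Random(0)
patient_ids = list(range(eligible))
rng.shuffle(patient_ids)
train_ids = patient_ids[:n_train]
valid_ids = patient_ids[n_train:]

print(excluded, eligible)                   # 757 2670
print(len(train_ids), len(valid_ids))       # 1861 809
print(round(100 * n_train / eligible, 1))   # 69.7
```

The training set therefore holds roughly 70% of eligible patients, a common ratio for a holdout design.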

Patient demographics: Among the 2,670 patients, 84.2% (n = 2,249) were infected with hepatitis B virus (HBV), 9.1% (n = 242) with hepatitis C virus (HCV), and 10.6% (n = 282) had alcoholic liver disease. The mean age was 55.67 years, and 77.9% were male. Across the full cohort, 71.9% (n = 1,920) received antiviral therapy, 46.8% (n = 1,250) achieved HBV DNA suppression below 500 IU/mL, and 54.9% (n = 1,466) achieved HBeAg seroconversion; these three percentages use all 2,670 patients as the denominator, not only the HBV-positive subgroup. There were no statistically significant differences in demographic characteristics, laboratory indicators, or tumor features between the training and validation groups.

Clinical and laboratory data collected: The study recorded an extensive set of variables including gender, age, family history of HCC, smoking and alcohol history, liver cirrhosis status, comorbidities (diabetes, hypertension, hyperlipidemia, coronary artery disease), and HCC etiology. Laboratory tests included routine blood counts, liver function panels, serum alpha-fetoprotein (AFP), C-reactive protein, creatinine, prothrombin activity, and INR. Critically, peripheral blood T cell subsets (total T cells, CD4 T cells, CD8 T cells) were measured using a MULTITEST CD45-PerCP/CD3-FITC/CD4-APC/CD8-PE TruCount four-color flow cytometry kit before treatment initiation. Tumor characteristics included number, maximum size, vascular invasion, and metastasis from imaging at enrollment.

TL;DR: From 3,427 initially screened patients, 2,670 met criteria and were split into training (n = 1,861) and validation (n = 809) sets. The cohort was 84.2% HBV-positive, mean age 55.67 years, and 77.9% male. Peripheral T cell subsets were measured by flow cytometry before treatment, alongside comprehensive clinical and laboratory variables.

Methodology

Pages 3-4

Cox Regression Screening Followed by Artificial Neural Network Construction

The analytical pipeline combined traditional survival statistics with machine learning. First, Cox univariate and multivariate analyses (forward selection, maximum likelihood ratio) were used to identify independent risk factors for death. These factors were then used as inputs for an artificial neural network (ANN) model built in Python. The ANN architecture consisted of 14 clinical and biochemical input neurons and two output neurons corresponding to clinical outcomes (alive or dead). After several rounds of debugging and testing, the authors settled on a multilayer perceptron (MLP) with three hidden layers to optimize performance.
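To make the architecture concrete, here is a minimal sketch of a forward pass through a multilayer perceptron with 14 inputs, three hidden layers, and two outputs. Only those three numbers come from the text. The hidden-layer widths, the ReLU and softmax activations, and the random untrained weights are all assumptions for illustration; this is not the authors' model.

```python
import math
import random

N_INPUTS = 14               # from the text: 14 clinical and biochemical inputs
HIDDEN_SIZES = [32, 16, 8]  # hypothetical widths; the text gives only the depth (3)
N_OUTPUTS = 2               # from the text: alive or dead


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Affine transform: one weighted sum plus bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def build_mlp(seed=0):
    rng = random.Random(seed)
    sizes = [N_INPUTS] + HIDDEN_SIZES + [N_OUTPUTS]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict_proba(layers, x):
    """ReLU on the three hidden layers, softmax on the two output neurons."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = relu(dense(x, weights, biases))
    return softmax(dense(x, *output))


layers = build_mlp()
rng = random.Random(1)
patient = [rng.gauss(0.0, 1.0) for _ in range(N_INPUTS)]  # fake standardized features
p_alive, p_dead = predict_proba(layers, patient)
print(round(p_alive, 3), round(p_dead, 3))
```

With untrained weights the two probabilities carry no clinical meaning; the point is only the shape of the network, in which the two outputs always sum to 1.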

Model evaluation metrics: The ANN model was evaluated using multiple complementary approaches. Discrimination was assessed via the concordance index (C-index) and time-dependent area under the receiver operating characteristic curve (AUC) for 1-year, 3-year, and 5-year OS. Calibration was tested using the Hosmer-Lemeshow test and calibration curves comparing predicted versus observed survival probabilities. Clinical utility was assessed through decision curve analysis (DCA), which measures net clinical benefit at various threshold probabilities. All tests used a significance threshold of p < 0.05.
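For readers unfamiliar with the concordance index, the sketch below implements Harrell's C on a toy data set. It is a plain-Python illustration of the general definition, not the code used in the study (the text says the authors worked with R packages), and the five patients are invented.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i has the shorter follow-up
    time and an observed event (events[i] == 1). The pair is concordant
    when patient i also has the higher risk score. Ties in risk count
    as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: months of follow-up, death indicator, model risk score.
times = [5, 12, 20, 30, 41]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.3, 0.1]

print(concordance_index(times, events, risk))  # 0.875
```

Here 7 of the 8 comparable pairs are ordered correctly. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the C-indices reported in this summary should be read.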

Comparison models: The ANN model was benchmarked against seven established staging and prognostic systems: BCLC, TNM, Okuda, CLIP, CUPI, JIS, and ALBI. The authors also compared the ANN to a traditional Cox regression model using the same input variables. Survival and model-evaluation analyses were performed in R version 3.3.2, using the rms, survival, survminer, rmda, pROC, ggplot2, and timeROC packages. Group comparisons used SPSS version 21.0, with t tests or Mann-Whitney U tests for quantitative data and Fisher's exact or chi-squared tests for qualitative data.

TL;DR: Cox regression identified 14 independent prognostic factors, which became inputs for a multilayer perceptron ANN with three hidden layers. The model was evaluated by C-index, time-dependent AUC, calibration curves, and decision curve analysis, then compared against seven established HCC staging systems (BCLC, TNM, Okuda, CLIP, CUPI, JIS, ALBI) and a Cox regression model.

Survival Analysis & Key Findings

Pages 4-6

Median OS of 29.3 Months and the Prognostic Power of T Cell Counts

The median follow-up time was 72.6 months (95% CI: 70.3-76.0). The median overall survival was 29.3 months (95% CI: 27.3-31.9), and the median progression-free survival was 12.0 months (95% CI: 11.3-13.0). Cumulative OS rates at 1, 3, 5, 7, and 10 years were 66.9%, 45.7%, 34.9%, 28.8%, and 22.6%, respectively. Cumulative PFS at 1, 3, 5, and 7 years was 50%, 23.3%, 13.1%, and 8.7%. For patients who underwent resection or minimally invasive treatment, disease-free survival at 1, 3, and 5 years was 73.7%, 46.9%, and 36.7%. There was no significant difference in survival between the training and validation sets (28.4 months vs. 30.3 months, P = 0.683).
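Median survival figures like these are usually read off a Kaplan-Meier curve, and the sketch below assumes that is how they were obtained here. It shows how censored patients (those still alive at last contact) are handled and how the median is found. The eight patients are invented and the code is a plain-Python illustration, not the study's analysis.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Invented toy data: months of follow-up and death indicator (0 = censored).
times = [6, 10, 14, 20, 25, 31, 40, 52]
events = [1, 1, 0, 1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
print(median_survival(curve))  # 25
```

In this toy example the estimate falls to 0.6 at month 20 and to 0.45 at month 25, so the median is 25 months even though three patients were censored.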

Independent risk factors (Cox multivariate): The analysis identified 14 independent risk factors for OS: age at diagnosis, alcohol abuse, tumor size of 5 cm or greater, two or more tumors, portal vein tumor thrombus (PVTT), Child-Pugh stage C, white blood cell count, total bilirubin, lactate dehydrogenase, gamma-glutamyl transferase, alkaline phosphatase, creatinine, AFP of 400 ng/mL or greater, and C-reactive protein. Antiviral therapy, albumin, total T cell count, and CD8 T cell count were identified as independent protective factors.

T cell prognostic impact: Using a cutoff of 907 cells/μL for T cells (determined by maximum Youden index), patients with T cells above this threshold had a median survival more than five times longer than those below it (90 months vs. 17.6 months). High T cell counts significantly reduced the risk of death (HR = 0.4, 95% CI: 0.35-0.45) and of progression (HR = 0.51, 95% CI: 0.48-0.57, P < 0.0001). Similar results were observed using a CD8 T cell cutoff of 300 cells/μL. Importantly, the survival benefit of higher T cell and CD8 T cell counts held across different etiologies and treatment subgroups, and was especially pronounced in patients who underwent resection (HR < 0.35, P < 0.001).
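The Youden index mentioned above is J = sensitivity + specificity - 1, and the chosen cutoff is the value that maximizes J. The sketch below illustrates the idea on invented T cell counts. It assumes that a count below the cutoff is treated as predicting death, which matches the protective direction reported in the text; the data and function name are hypothetical.

```python
def youden_cutoff(values, died):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A value below the cutoff is treated as a positive (death) prediction.
    """
    n_died = sum(died)
    n_alive = len(died) - n_died
    best_j, best_cut = -1.0, None
    for c in sorted(set(values)):
        # True positives: died and flagged as low count.
        tp = sum(1 for v, d in zip(values, died) if d and v < c)
        # True negatives: survived and not flagged.
        tn = sum(1 for v, d in zip(values, died) if not d and v >= c)
        j = tp / n_died + tn / n_alive - 1.0
        if j > best_j:
            best_j, best_cut = j, c
    return best_cut, best_j


# Invented toy data: T cell counts and death indicator for ten patients.
counts = [350, 520, 610, 780, 860, 950, 1100, 1240, 1400, 1650]
died = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

cutoff, j = youden_cutoff(counts, died)
print(cutoff, round(j, 2))  # 860 0.8
```

At a cutoff of 860 the toy data give a sensitivity of 0.8 and a specificity of 1.0. The real cutoff in the study was derived in the same spirit from the full cohort.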

TL;DR: Median OS was 29.3 months across 2,670 patients. Patients with T cells above 907 cells/μL survived a median of 90 months versus 17.6 months for those below (HR = 0.4). CD8 T cells above 300 cells/μL showed similar protective effects. Fourteen independent risk factors and four protective factors were identified by Cox multivariate analysis.

ANN Model Performance

Pages 6-8

AUC Values Above 0.83 Across All Time Points, Outperforming Seven Staging Systems

In the training set, the ANN model achieved AUC values of 0.838 (95% CI: 0.819-0.857), 0.833 (95% CI: 0.815-0.851), and 0.843 (95% CI: 0.825-0.861) for predicting 1-year, 3-year, and 5-year OS, respectively, with a C-index of 0.769 (95% CI: 0.757-0.782). By comparison, the traditional Cox regression model using the same variables achieved AUC values of only 0.736, 0.701, and 0.685, with a C-index of 0.712, all significantly lower (P < 0.05). In the validation set, the ANN performed comparably: AUC values were 0.871, 0.831, and 0.848 for 1-year, 3-year, and 5-year OS (higher than in training at 1 and 5 years, marginally lower at 3 years), with a C-index of 0.773.

Head-to-head with established systems: The ANN model outperformed all seven conventional staging systems across every metric. In the training set, the next-best performer was CLIP (C-index: 0.707, 1-year AUC: 0.788), followed by CUPI (C-index: 0.701, 1-year AUC: 0.764). TNM had a C-index of just 0.633 and 1-year AUC of 0.674. ALBI performed worst with a C-index of 0.604. The differences were statistically significant at P < 0.0001 for all comparisons. Time-dependent ROC curves confirmed that the ANN model maintained higher AUC values than all comparators at every survival time point, in both training and validation cohorts.

Calibration and clinical utility: Calibration curves demonstrated that the ANN model's predicted 1-year, 3-year, and 5-year OS probabilities closely matched the actual observed probabilities in both cohorts. Decision curve analysis showed that the ANN model provided significant net clinical benefits over all seven staging systems across a wide range of threshold probabilities, confirming that it translates from statistical discrimination into practical clinical usefulness.
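Decision curve analysis rests on a single quantity, net benefit, computed at each threshold probability p_t as TP/n - (FP/n) * p_t / (1 - p_t). The sketch below computes it for a model and for the "treat everyone as high risk" reference strategy. The predicted probabilities and outcomes are invented, and the code illustrates the standard formula rather than the rmda-based analysis described in the text.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and not y)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: flag every patient regardless of the model."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Invented toy data: predicted death probability and observed death (1/0).
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 1, 0, 1, 0, 0]

print(net_benefit(probs, outcomes, 0.5))        # 0.25
print(net_benefit_treat_all(outcomes, 0.5))     # 0.0
```

A decision curve plots these values over a range of thresholds; a model is clinically useful where its curve lies above both the treat-all line and the treat-none line (net benefit of zero).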

Subgroup consistency: The authors further tested the model across subgroups defined by age, sex, etiology, AFP level, Child-Pugh grade, era of diagnosis, and treatment type. In all subgroups, the ANN model's AUC and C-index for 1-year, 3-year, and 5-year survival remained higher than those of the seven conventional systems. The model also performed well for disease-free survival prediction across treatment subgroups (resection, minimally invasive, and palliative).

TL;DR: The ANN model achieved AUC values of 0.833-0.843 (training) and 0.831-0.871 (validation) for 1- to 5-year OS, with C-indices of 0.769 and 0.773. It significantly outperformed BCLC, TNM, Okuda, CUPI, CLIP, JIS, and ALBI (all P < 0.0001). Calibration curves confirmed good fit, and DCA demonstrated clear net clinical benefit.

Risk Stratification

Pages 8-9

Three-Tier Risk Groups with Hazard Ratios Up to 8.65 for the High-Risk Stratum

Using the 40th and 70th percentiles of the ANN model score, all patients were divided into three risk strata: low risk (stratum 1), medium risk (stratum 2), and high risk (stratum 3). In the training set, compared with low-risk patients, the hazard ratios for overall survival were 3.01 (95% CI: 2.59-3.50, P < 0.0001) for the medium-risk group and 8.11 (95% CI: 7.0-9.4, P < 0.0001) for the high-risk group. For progression-free survival, the corresponding HRs were 2.15 (95% CI: 1.90-2.45) and 4.98 (95% CI: 4.38-5.66), both P < 0.0001.
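The percentile-based grouping can be sketched in a few lines. The example below assumes that a higher ANN score means higher risk and that scores exactly on a cut point fall into the lower stratum; neither detail is stated in the text. The scores are invented, and `statistics.quantiles` is used here only as one convenient way to obtain the 40th and 70th percentiles.

```python
import statistics
from collections import Counter

# Invented toy scores for 20 patients: 0.05, 0.10, ..., 1.00.
scores = [i / 20 for i in range(1, 21)]

# quantiles(..., n=10) returns the 10th, 20th, ..., 90th percentile cut points.
deciles = statistics.quantiles(scores, n=10)
p40, p70 = deciles[3], deciles[6]


def risk_stratum(score, low_cut, high_cut):
    """1 = low risk, 2 = medium risk, 3 = high risk (assumed direction)."""
    if score <= low_cut:
        return 1
    if score <= high_cut:
        return 2
    return 3


strata = [risk_stratum(s, p40, p70) for s in scores]
counts = Counter(strata)
print(counts[1], counts[2], counts[3])  # 8 6 6
```

By construction the strata hold roughly 40%, 30%, and 30% of patients, so the hazard ratios that follow compare groups of fixed relative size rather than groups defined by a clinical threshold.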

Validation cohort confirmation: In the validation set, the pattern held, with slightly larger point estimates. The HR for OS was 3.12 (95% CI: 2.50-3.89, P < 0.0001) for the medium-risk group and 8.65 (95% CI: 6.93-10.79, P < 0.0001) for the high-risk group. For PFS, the HR values were 2.28 (95% CI: 1.87-2.77) and 5.58 (95% CI: 4.59-6.80), respectively. These results indicate that the ANN score separates patients by mortality risk consistently in both cohorts.

Subgroup stratification: Kaplan-Meier survival curves based on the ANN risk strata were drawn for subgroups defined by etiology, liver function (Child-Pugh stage), enrollment period, and treatment type. The model successfully discriminated between risk groups in nearly all subgroups. The only exception was Child-Pugh C patients, where there was no significant difference between the medium-risk and low-risk groups (log-rank P = 0.06), likely because these patients already have severely compromised liver function that dominates their prognosis regardless of other factors. The model also showed strong performance in stratifying disease-free survival and early recurrence risk across treatment subgroups.

TL;DR: Three risk strata separated patients effectively: high-risk patients had an 8.11-fold (training) and 8.65-fold (validation) higher death risk than low-risk patients. PFS hazard ratios reached 4.98 (training) and 5.58 (validation) for the high-risk group. Stratification worked across nearly all clinical subgroups except Child-Pugh C.

Immunological Context

Pages 9-10

Why Circulating T Cells Matter for HCC Prognosis

The study provides strong evidence that circulating T cell and CD8 T cell counts are independent protective factors for HCC survival. This finding builds on a growing body of evidence linking the immune microenvironment to cancer outcomes. High densities of tumor-infiltrating lymphocytes (TILs) have been associated with improved prognosis in breast, colorectal, lung, and other cancers. For HCC specifically, prior studies demonstrated that high intratumoral CD3 and CD8 T cell density significantly reduces recurrence after hepatectomy. However, those findings were limited to surgical patients. This study extends the prognostic value of T cell immunity to the broader HCC population, including those with unresectable disease.

Tumor immune editing: The biological rationale involves the concept of tumor immune editing. In a healthy immune response, T cells recognize and eliminate tumor cells. However, the tumor microenvironment can suppress immune function through multiple mechanisms: reducing antigenicity and immunogenicity, secreting inhibitory molecules like TGF-beta and interleukin-10, and increasing the proportion of suppressor cells including regulatory T cells (Tregs) and myeloid-derived suppressor cells. T cell exhaustion is a particularly important phenomenon, where T cells with high expression of inhibitory receptors such as PD-1, TIGIT, and TIM-3 progressively lose their proliferative and cytotoxic capacity.

Clinical implications: The authors' previous work showed that high PD-1 and TIGIT expression on T cell surfaces in HCC patients was associated with disease progression, which may explain why reduced circulating T cell counts correlate with poor outcomes. The finding that patients with T cells above 907 cells/μL survived a median of 90 months versus 17.6 months for those below this cutoff (a more than five-fold difference) underscores the clinical significance of peripheral immune status. This is especially relevant because circulating T cell counts can be measured from a simple blood draw, making them far more accessible than intratumoral immune cell assessments that require tissue biopsies.

TL;DR: Circulating T cell and CD8 T cell counts are independent protective factors for HCC survival. Patients with T cells above 907 cells/μL had a median survival about five times longer (90 vs. 17.6 months). The biological mechanism involves T cell exhaustion and immune editing in the tumor microenvironment. Unlike intratumoral immune markers, circulating T cells require only a blood draw.

Limitations & Future Directions

Pages 10-11

Single-Center Design and HBV-Dominated Cohort Limit Generalizability

Overfitting risk: ANNs with a large number of parameters are inherently prone to overfitting, meaning the model might perform well on the data it was trained on but fail with new patient populations. The authors argue that the large sample size (n = 2,670) and fine-tuning of hyperparameters reduce this risk. The model did perform well in the holdout validation set and across multiple subgroups, which provides some reassurance. However, the validation set comes from the same institution, which limits how strongly these results can be interpreted.

Single-center, single-etiology dominance: This is the most significant limitation. The study was conducted entirely at Beijing Ditan Hospital, and 84.2% of patients were HBV-positive. This reflects the epidemiology of HCC in China, where HBV is the dominant cause, but it means the model has not been tested in populations where HCV, alcohol-related liver disease, or nonalcoholic steatohepatitis (NASH) are the primary drivers. These different etiologies are associated with distinct biological behaviors and treatment responses. External validation in multi-center, multi-etiology cohorts is essential before the model can be adopted broadly.

Retrospective design: Like most machine learning studies in oncology, this was a retrospective analysis. Prospective validation would be needed to confirm the model's real-world clinical utility. The study also does not explore how the model might integrate with or improve upon immunotherapy-era treatment decisions, which is increasingly relevant given the adoption of checkpoint inhibitors (anti-PD-1/PD-L1) for HCC since 2020.

Future directions: The authors conclude that the ANN model, integrating tumor characteristics, liver function, inflammatory markers, and immune indices, represents a convenient, accurate, and noninvasive prognostic tool. Regular surveillance based on these model indicators could help clinicians make better treatment decisions and prolong patient survival. External validation across diverse populations, incorporation of newer biomarkers (such as circulating tumor DNA or immune checkpoint expression), and prospective clinical trials are logical next steps for translating this model into routine clinical practice.

TL;DR: Key limitations include single-center design, 84.2% HBV-dominant cohort, retrospective analysis, and no external validation. The model needs testing in HCV, alcohol, and NASH-driven HCC populations. Prospective trials and integration with immunotherapy-era biomarkers are needed before clinical adoption.

A Novel Prognostic Score Based on Artificial Intelligence in Hepatocellular Carcinoma: A Literature Review
