AEPH
Industry Science and Engineering, Vol. 3 No. 2 (ISE 2026)
Prediction and Validation of Liver Disease Based on Machine Learning Models
DOI: https://doi.org/10.62381/I265202
Author(s)
Guoyin Li, Guanghu Zhu, Zexian Lu, Zhen Wang*
Affiliation(s)
Guilin University of Electronic Technology, Guilin, Guangxi, China
*Corresponding Author
Abstract
This study constructs and validates machine learning models for liver disease prediction, selects the best-performing model, and interprets it using SHapley Additive exPlanations (SHAP) and the Feature Permutation Method (FPM). The "Liver Disease Patient Dataset 30K train data" published on Kaggle was used; 16,308 samples were included, of which 11,669 (71.55%) were patients with liver disease. Derived features were constructed from the original features, and four algorithms were used to build prediction models, which were evaluated with 5-fold cross-validation. Model performance was assessed using the area under the receiver operating characteristic curve (AUC). Performance radar charts and decision curve analysis were used to compare the models, and feature permutation importance plots and SHAP visualization plots were used to interpret them. On the held-out folds of the 5-fold cross-validation, the gradient boosting decision tree (GBDT) delivered the best overall performance, with a mean AUC of 0.9995 (95% CI: 0.9990–1.0000). The SHAP heatmap and the feature permutation importance plot both showed that ALT/ALP and ALP had the greatest impact on liver disease identification. Of the four models constructed and validated, GBDT performed best and can offer a dependable basis for the early clinical screening of patients with liver disease.
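The evaluation workflow summarized in the abstract (a derived ALT/ALP feature, 5-fold cross-validation scored by AUC, a 95% CI across folds, and feature permutation importance) can be illustrated with a minimal standard-library Python sketch. This is not the authors' code. The data are synthetic, the linear scorer is a lightweight stand-in for GBDT, ALT/ALP is assumed to be a simple ratio, the t-based interval is an assumption because the abstract does not say how the CI was computed, and every function name is hypothetical.

```python
import math
import random
import statistics

def add_ratio(row):
    """Append a derived ALT/ALP ratio to an [ALT, ALP] row (assumed feature; guards ALP == 0)."""
    alt, alp = row
    return [alt, alp, alt / alp if alp else 0.0]

def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def fit_scorer(X, y):
    """Fit a simple linear scorer: per-feature class-mean gap over variance (stand-in, NOT GBDT)."""
    weights = []
    for col in zip(*X):
        pos = [v for v, t in zip(col, y) if t == 1]
        neg = [v for v, t in zip(col, y) if t == 0]
        var = statistics.pvariance(col) or 1.0
        weights.append((statistics.fmean(pos) - statistics.fmean(neg)) / var)
    return lambda row: sum(w * v for w, v in zip(weights, row))

def cross_validated_auc(X, y, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and return one AUC per fold."""
    aucs = []
    for test_idx in stratified_folds(y, k, random.Random(seed)):
        held = set(test_idx)
        train = [i for i in range(len(y)) if i not in held]
        scorer = fit_scorer([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([y[i] for i in test_idx], [scorer(X[i]) for i in test_idx]))
    return aucs

def mean_ci95(values):
    """Mean and t-based 95% CI for exactly 5 folds (2.776 = t critical value, 4 d.o.f.)."""
    m = statistics.fmean(values)
    half = 2.776 * statistics.stdev(values) / math.sqrt(len(values))
    return m, max(0.0, m - half), min(1.0, m + half)

def permutation_importance(scorer, X, y, col, rng, repeats=10):
    """Mean drop in AUC after shuffling one feature column."""
    base = auc(y, [scorer(r) for r in X])
    drops = []
    for _ in range(repeats):
        shuffled = [r[col] for r in X]
        rng.shuffle(shuffled)
        Xp = [r[:col] + [v] + r[col + 1:] for r, v in zip(X, shuffled)]
        drops.append(base - auc(y, [scorer(r) for r in Xp]))
    return statistics.fmean(drops)

# Synthetic stand-in data: roughly 70% positives, ALT shifted upward in positives.
rng = random.Random(42)
y = [1 if rng.random() < 0.7 else 0 for _ in range(400)]
raw = [[max(1.0, rng.gauss(80 if t else 30, 15)), rng.gauss(120, 20)] for t in y]
X = [add_ratio(r) for r in raw]

fold_aucs = cross_validated_auc(X, y)
mean_auc, lo, hi = mean_ci95(fold_aucs)
print(f"mean AUC = {mean_auc:.4f} (95% CI: {lo:.4f}-{hi:.4f})")

scorer = fit_scorer(X, y)
for col, name in enumerate(["ALT", "ALP", "ALT/ALP"]):
    drop = permutation_importance(scorer, X, y, col, random.Random(0))
    print(f"{name}: mean AUC drop = {drop:.4f}")
```

Permutation importance here is measured on the same data used to fit the scorer, purely to keep the sketch short; in practice it is usually computed on held-out data so that the ranking reflects generalization rather than fit.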
Keywords
Liver Disease; Machine Learning; SHAP Value; Feature Permutation