Machine Learning Interpretability in Diabetes Risk Assessment: A SHAP Analysis

Mustafa Kutlu; Turker Berk Donmez; Chris Freeman

doi:10.69882/adba.cem.2024075

Vol. 1 No. 1 (2024), Articles

Published 2024-07-05

Mustafa Kutlu

Sakarya University of Applied Sciences

https://orcid.org/0000-0003-1663-2523

Turker Berk Donmez

Sakarya University of Applied Sciences

https://orcid.org/0000-0002-1008-547X

Chris Freeman

University of Southampton

https://orcid.org/0000-0003-0305-9246

Keywords

Explainable AI
Diabetes
Recursive feature elimination

How to Cite

Kutlu, M., Donmez, T. B., & Freeman, C. (2024). Machine Learning Interpretability in Diabetes Risk Assessment: A SHAP Analysis. Computers and Electronics in Medicine, 1(1), 34-44. https://doi.org/10.69882/adba.cem.2024075

Abstract

Diabetes remains a complex and prevalent metabolic disease that places a serious burden on public health. Machine learning approaches such as extreme gradient boosting (XGBoost) are promising for diabetes prediction, but their 'black-box' nature limits clinical interpretability. To address this gap, our work applied SHapley Additive exPlanations (SHAP) to explain the XGBoost model's predictions. The dataset used in this research comprised 253,680 patients and 21 features, including General Health Status, High Blood Pressure Status, Age, and Body Mass Index. Feature selection with Recursive Feature Elimination (RFE) retained 15 of these features. On the test set, the XGBoost model trained on the original dataset achieved an accuracy of 86.6%, precision of 54.1%, recall of 17.0%, and an F1-score of 25.9%. On the RFE dataset, the model achieved an accuracy of 86.6%, precision of 54.9%, recall of 16.5%, and an F1-score of 25.3%. SHAP analysis identified General Health Status, High Blood Pressure Status, Age, and Body Mass Index as the most important features in both the original and RFE datasets. This work serves as a foundation for transparent and clinically applicable predictive modeling, supporting early diabetes identification and preventive healthcare.
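To illustrate what a SHAP attribution means, the following is a minimal sketch that computes exact Shapley values by enumerating every feature subset. It is not the authors' pipeline: the scoring function `toy_risk`, its weights, the `patient` and `baseline` values, and the short feature keys are all hypothetical, chosen only to echo the four features named in the abstract. The SHAP library uses far more efficient tree-specific algorithms for models such as XGBoost; brute-force enumeration is shown here only because it makes the definition explicit.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Features outside a coalition are set to their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Instance with only the coalition's features "switched on".
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                # Same instance with the feature of interest added.
                with_f = dict(without)
                with_f[name] = x[name]
                total += weight * (predict(with_f) - predict(without))
        phi[name] = total
    return phi


def toy_risk(f):
    """Hypothetical risk score; the weights are illustrative assumptions."""
    score = 0.05 * f["gen_health"] + 0.30 * f["high_bp"]
    score += 0.004 * f["age"] + 0.01 * f["bmi"]
    score += 0.002 * f["high_bp"] * f["bmi"]  # simple interaction term
    return score


# Hypothetical patient and reference ("average") individual.
patient = {"gen_health": 4, "high_bp": 1, "age": 60, "bmi": 32}
baseline = {"gen_health": 2, "high_bp": 0, "age": 45, "bmi": 27}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>10s}: {value:+.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets a per-patient prediction be broken down feature by feature. The interaction term between `high_bp` and `bmi` is split evenly between those two features.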

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
