Income Level Estimation with Light-GBM: Understanding Model Decisions with Explainable AI Techniques Shap and Lime

Keywords

Revenue prediction
Machine learning
Explainable artificial intelligence
SMOTE

How to Cite

Income Level Estimation with Light-GBM: Understanding Model Decisions with Explainable AI Techniques Shap and Lime. (2025). Artificial Intelligence in Applied Sciences, 1(1), 7-12. https://doi.org/10.69882/adba.ai.2025072

Abstract

This study examines the use of machine learning algorithms to predict individuals' annual income levels. In analyses conducted in Python, the best performance was obtained with the Light Gradient Boosting Machine (LightGBM) algorithm after the imbalanced data set was balanced with the Synthetic Minority Over-sampling Technique (SMOTE): an accuracy of 87.45%, precision of 85.74%, recall of 89.31%, and an F1 score of 87.30%. The impact of parameters and variables on the income predictions was then examined using the explainable artificial intelligence techniques SHAP and LIME. The results underline the importance of addressing class imbalance, choosing effective methods, and explaining the predictions of machine learning models.
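To illustrate the oversampling idea mentioned in the abstract, the following is a minimal, standard-library-only sketch of the interpolation step at the core of SMOTE (Chawla et al., 2002). It is not the authors' code: the function name `smote_sketch`, the toy points, the neighbor count `k`, and the choice to pick base points at random are all assumptions made for illustration, and it handles numeric features only.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority points.

    Hypothetical helper for illustration: each synthetic point lies on the
    line segment between a randomly chosen minority point and one of its
    k nearest minority neighbors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        # Move a random fraction of the way from the base toward the neighbor.
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy data (assumed): 4 minority points against a majority class of 10.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
majority_count = 10
new_points = smote_sketch(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority), "minority points after oversampling")
```

In practice, oversampling of this kind is applied to the training split only, so that synthetic points do not leak into the data used for evaluation.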

 

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.