Evaluating the Performance Disparity and the Role of Gender-Aware Approaches in Machine Learning Based Disease Detection

Keywords

Gender bias
Performance disparity
Disease
Detection
Machine learning

How to Cite

Evaluating the Performance Disparity and the Role of Gender-Aware Approaches in Machine Learning Based Disease Detection. (2026). Computers and Electronics in Medicine, 3(1), 1-10. https://doi.org/10.69882/adba.cem.2026011

Abstract

Machine Learning (ML) is gaining traction in medical research due to its ability to identify patterns that are unnoticeable to the human eye. However, concerns about fairness in ML models, particularly performance differences across groups, are growing. This study therefore evaluates the performance disparity and the role of gender-aware approaches in ML-based disease detection. It applies the gender-aware approach and introduces two new variants of it, testing all three on nine different disease datasets. Extensive experimental evaluations reveal that detection performance can reach an F1-score of up to 1.0, depending on the nature of the dataset at hand. On the other hand, the gender-aware approach succeeds in mitigating the performance disparity in only three out of nine cases. The variants, which operate in a crossing-over fashion, can capture the relationships and different patterns in some cases, but often fall behind the gender-aware approach. This research distinguishes itself through the large number of datasets and implemented pipelines it uses, two of which are employed for mitigating performance disparity in disease detection for the first time in the literature. The findings of this study therefore contribute to the field of disease detection in these respects.
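
To make the notion of performance disparity concrete, the following minimal sketch computes the F1-score separately for each gender group and reports the gap between the best and worst group. It is an illustration only: the abstract does not state how disparity is measured, so the max-minus-min gap, the helper names (`f1_score_binary`, `f1_by_group`, `disparity`), and the toy labels are all assumptions rather than the study's actual pipeline.

```python
def f1_score_binary(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_by_group(y_true, y_pred, groups):
    """Compute F1 separately for each group value (e.g. 'F' and 'M')."""
    scores = {}
    for g in sorted(set(groups)):
        idx = [i for i, value in enumerate(groups) if value == g]
        scores[g] = f1_score_binary(
            [y_true[i] for i in idx],
            [y_pred[i] for i in idx],
        )
    return scores


def disparity(scores):
    """Assumed disparity measure: gap between best and worst group score."""
    return max(scores.values()) - min(scores.values())


# Toy labels and predictions, invented purely for illustration.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]

scores = f1_by_group(y_true, y_pred, gender)
gap = disparity(scores)
print(scores, gap)
```

In this toy case the classifier is perfect on one group and makes two errors on the other, so the per-group scores differ even though a single pooled F1-score would hide that difference.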

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.