Abstract
Milk quality assessment is critically important for food safety and public health. Traditional milk quality evaluation relies on physicochemical measurements that require expert interpretation and may not support rapid or large-scale decision-making, which increases the need for data-driven, automated assessment methods. Although machine learning approaches have been widely applied to milk quality classification in recent years, their reliability is limited by the opacity of model decision mechanisms and by insufficient reporting of data preprocessing and data leakage control procedures. In this study, the milk quality classification performance of several supervised machine learning algorithms is comparatively evaluated on an open-access milk dataset. Random Forest (RF), k-Nearest Neighbors (KNN), Support Vector Machines (SVM), Extreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGBM), and Artificial Neural Network (ANN) models are assessed under fair and consistent experimental conditions. The main contribution of this study is the use of group-based data partitioning to prevent data leakage, instead of directly removing duplicate or highly similar records from the dataset. This approach avoids data loss and yields a more realistic estimate of model performance. Furthermore, a targeted, minimalist preprocessing strategy is adopted in which only continuous variables are scaled. Hyperparameters are optimized with Grid Search and Particle Swarm Optimization (PSO); notably, tree-based models tuned with PSO show more consistent classification performance. To move beyond predictive accuracy, Explainable Artificial Intelligence (XAI) approaches are used to improve the interpretability of model decisions.
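The leakage-control and scaling ideas above can be illustrated with a minimal plain-Python sketch. It assumes that records are grouped by exact feature-vector identity (a simplification: grouping "highly similar" records would also need a similarity rule, which is not shown), that the six columns are hypothetical stand-ins for pH, temperature, taste, odor, fat, and turbidity, and that only the first two are continuous. The helper names `group_ids`, `group_split`, `fit_scaler`, and `apply_scaler` are illustrative, not taken from the study.

```python
import random
from collections import defaultdict


def group_ids(rows):
    """Give every distinct feature vector its own group ID; exact duplicates share one.
    (Assumption: only exact duplicates are grouped, not near-duplicates.)"""
    seen = {}
    return [seen.setdefault(tuple(r), len(seen)) for r in rows]


def group_split(rows, test_frac=0.25, seed=0):
    """Split row indices into train/test so that no group straddles the two sets."""
    members = defaultdict(list)
    for idx, g in enumerate(group_ids(rows)):
        members[g].append(idx)
    keys = list(members)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_frac * len(keys)))
    train = sorted(i for g in keys[n_test:] for i in members[g])
    test = sorted(i for g in keys[:n_test] for i in members[g])
    return train, test


def fit_scaler(rows, cols):
    """Estimate mean and standard deviation of the continuous columns on training rows only."""
    stats = {}
    for c in cols:
        vals = [r[c] for r in rows]
        mu = sum(vals) / len(vals)
        sd = (sum((v - mu) ** 2 for v in vals) / len(vals)) ** 0.5 or 1.0  # guard zero variance
        stats[c] = (mu, sd)
    return stats


def apply_scaler(rows, stats):
    """Standardize the fitted columns; all other (binary) columns pass through unchanged."""
    scaled = []
    for r in rows:
        new = list(r)
        for c, (mu, sd) in stats.items():
            new[c] = (new[c] - mu) / sd
        scaled.append(new)
    return scaled


# Hypothetical records: [pH, temperature, taste, odor, fat, turbidity]
rows = [
    [6.6, 35, 1, 0, 1, 0],
    [6.6, 35, 1, 0, 1, 0],  # exact duplicate of the previous record
    [8.5, 70, 1, 1, 1, 1],
    [6.8, 45, 0, 1, 1, 0],
    [6.8, 45, 0, 1, 1, 0],  # another duplicate pair
    [5.5, 45, 1, 0, 1, 1],
    [6.5, 38, 1, 0, 0, 0],
    [9.0, 43, 1, 0, 1, 1],
]
CONTINUOUS = [0, 1]  # assumed continuous columns (pH, temperature)

train_idx, test_idx = group_split(rows, test_frac=0.25, seed=1)
stats = fit_scaler([rows[i] for i in train_idx], CONTINUOUS)  # fitted on training data only
X_train = apply_scaler([rows[i] for i in train_idx], stats)
X_test = apply_scaler([rows[i] for i in test_idx], stats)
```

Because duplicates share a group, a record can never be memorized in training and then "predicted" in testing, yet no rows are discarded. In a scikit-learn pipeline the same roles would typically be played by `GroupShuffleSplit` or `StratifiedGroupKFold` for the split and `ColumnTransformer` for column-selective scaling; whether the study used these exact classes is not stated here.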
In this context, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) are applied to analyze the contributions of the key features influencing milk quality. Experimental results indicate that physicochemical properties, particularly pH, fat content, and temperature, play a decisive role in milk quality prediction. In conclusion, this study demonstrates that combining machine learning with explainability techniques substantially improves the reliability, transparency, and methodological robustness of milk quality classification.
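SHAP attributes a single prediction to the input features using Shapley values from cooperative game theory. With only a handful of features these values can be computed exactly by enumerating every coalition, which makes the underlying idea concrete. The sketch below does this for a hypothetical toy scoring function over pH, temperature, fat, and colour; the model, its coefficients, the feature values, and the use of one baseline record to represent "absent" features are all illustrative assumptions, not the study's trained models or the SHAP library's estimators.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of a coalition S that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical quality score; the colour argument is deliberately ignored."""
    ph, temp, fat, _colour = z
    score = -abs(ph - 6.7) - 0.02 * max(temp - 40, 0) + 0.5 * fat
    if fat and temp < 45:  # interaction term between fat and temperature
        score += 0.1
    return score


features = ["pH", "temperature", "fat", "colour"]
x = [8.5, 70, 1, 246]         # record to explain (hypothetical values)
baseline = [6.7, 40, 0, 250]  # reference record (hypothetical values)
phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(features, phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for `x` and for the baseline, and a feature the model ignores receives zero. Exact enumeration grows exponentially with the number of features, which is why practical SHAP tooling relies on sampling approximations or model-specific algorithms such as TreeSHAP for tree ensembles.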

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
