Abstract
This study aims to improve the predictive performance of machine learning models for estimating Click-Through Rate (CTR), a key metric in digital advertising analytics. Starting from a baseline Logistic Regression (LR) model trained on the “Click-Through Rate Prediction” dataset from Kaggle, the study adds successive optimization layers to improve predictive accuracy. Inspired by concepts from nonlinear dynamics, new feature representations were derived from temporal patterns, and textual fields were encoded using TF-IDF and Word2Vec-based embeddings. Hyperparameter optimization was then applied to tune the models, followed by the construction of ensemble architectures combining LR, XGBoost, Random Forest (RF), and Support Vector Machine (SVM) classifiers. Experimental results show that the optimized ensemble achieved the highest F1-score, 0.8694, an improvement of approximately 12.7% over the baseline model. Overall, the study examines feature extraction strategies, model optimization procedures, and ensemble fusion techniques, and demonstrates the advantage of hybrid approaches in complex CTR prediction tasks.
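To make the ensemble fusion and the reported metric concrete, the following is a minimal sketch of one possible fusion scheme, soft voting, together with the F1-score computation. The abstract does not state which fusion rule the study used, so soft voting is an assumption here, and the function names, the 0.5 threshold, and all probability values are hypothetical illustrations rather than results from the study.

```python
from statistics import mean


def soft_vote(prob_lists, threshold=0.5):
    """Average per-model click probabilities and threshold the mean.

    prob_lists: one list of click probabilities per model, all the same length.
    The threshold of 0.5 is an illustrative assumption.
    """
    averaged = [mean(sample_probs) for sample_probs in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical click probabilities from four base classifiers on five
# impressions; these numbers are made up for illustration only.
model_probs = {
    "lr":  [0.75, 0.25, 0.50, 0.25, 0.75],
    "xgb": [0.75, 0.25, 0.75, 0.50, 0.25],
    "rf":  [1.00, 0.00, 0.50, 0.25, 0.25],
    "svm": [0.50, 0.25, 0.75, 0.50, 0.25],
}
y_true = [1, 0, 1, 0, 1]  # hypothetical ground-truth click labels

y_pred = soft_vote(list(model_probs.values()))
ensemble_f1 = f1_score(y_true, y_pred)
print(y_pred, round(ensemble_f1, 4))
```

In this toy case the averaged probabilities smooth out disagreement between individual models. In practice the base probabilities would come from fitted classifiers, and the fusion rule and threshold would be tuned on validation data.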

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
