Benchmarking State-of-the-Art Vision Transformer Architectures for the Automated Classification of Pigmented Skin Lesions

Keywords

Vision transformers (ViTs)
Skin cancer classification
HAM10000
Dataset
Computer-aided diagnosis (CAD)
DeiT III-Base

How to Cite

Benchmarking State-of-the-Art Vision Transformer Architectures for the Automated Classification of Pigmented Skin Lesions. (2026). Computers and Electronics in Medicine, 3(1), 42-47. https://doi.org/10.69882/adba.cem.2026015

Abstract

Skin cancer represents an escalating global public health challenge where early detection is paramount, potentially increasing five-year survival rates to 99%. While dermoscopy improves diagnostic sensitivity, its effectiveness often depends on clinician experience and is subject to inter-observer variability. To address these limitations, this study presents a rigorous comparative analysis of four state-of-the-art Vision Transformer (ViT) architectures, DeiT III-Base, Swin-Base, ViT-Base, and PiT-B, for the automated classification of pigmented skin lesions. We utilized the HAM10000 dataset (n=10,011) and implemented a stratified 70-15-15 split to ensure balanced training, validation, and testing phases. Images were resized to 224×224 pixels and normalized using ImageNet parameters, while transfer learning was employed to stabilize training and enhance generalization. Experimental results indicate that DeiT III-Base achieved superior diagnostic efficacy, reaching an accuracy of 92.04% and an F1-score of 85.44%. Furthermore, computational evaluation revealed that DeiT III-Base and ViT-Base offered highly efficient clinical throughput with sub-millisecond inference times (0.5674 ms and 0.5459 ms, respectively), whereas PiT-B exhibited the lowest computational workload (21.1067 GFLOPs). These findings underscore the viability of attention-based paradigms as robust real-time Computer-Aided Diagnosis (CAD) tools. Future research will explore the integration of multi-modal patient data and Explainable AI (XAI) to foster transparency and clinical trust.
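The stratified 70-15-15 split described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' implementation: the function name, random seed, and per-class rounding strategy are all illustrative choices.

```python
import random
from collections import defaultdict

def stratified_split(labels, train=0.70, val=0.15, seed=42):
    """Split sample indices into train/val/test while preserving
    per-class proportions, mirroring the 70-15-15 protocol in the
    abstract. `labels` holds one class label per image."""
    # Group image indices by diagnostic class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx, test_idx = [], [], []
    # Shuffle and slice each class independently so that every
    # class keeps roughly the same ratio across the three subsets.
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        n_train = round(n * train)
        n_val = round(n * val)
        train_idx += indices[:n_train]
        val_idx += indices[n_train:n_train + n_val]
        test_idx += indices[n_train + n_val:]
    return train_idx, val_idx, test_idx
```

In practice the same effect is commonly obtained with a library routine such as scikit-learn's `train_test_split(..., stratify=labels)` applied twice; the hand-rolled version above simply makes the per-class logic explicit.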


References

Skin cancer: HAM10000 dataset.

Armstrong, B. K. and A. Kricker, 1995. Skin cancer. Dermatologic Clinics, 13: 583–594.

Aruk, I., I. Pacal, and A. N. Toprak, 2026. A comprehensive comparison of convolutional neural network and visual transformer models on skin cancer classification. Computational Biology and Chemistry, 120.

Bengio, Y., 2012. Deep learning of representations for unsupervised and transfer learning. In Proceedings of the 29th International Conference on Machine Learning, 27: 17–36.

Bruno, A., A. Artesani, P. L. Mazzeo, F. Janan, G. Yang, et al., 2025. Boosting skin cancer classification: A multi-scale attention and ensemble approach with vision transformers. Sensors, 25: 2479.

Cakmak, Y. and A. Maman, 2025. Deep learning for early diagnosis of lung cancer. Computational Systems and Artificial Intelligence, 1: 20–25.

Cakmak, Y. and I. Pacal, 2025. Comparative analysis of transformer architectures for brain tumor classification. Exploratory Medicine, 6.

Çakmak, Y. and N. Pacal, 2025. Deep learning for automated breast cancer detection in ultrasound: A comparative study of four CNN architectures. Artificial Intelligence in Applied Sciences, 1: 13–19.

Chaurasia, A. K., P. W. Toohey, H. C. Harris, and A. W. Hewitt, 2025. Multi-resolution vision transformer model for histopathological skin cancer subtype classification using whole slide images. Computers in Biology and Medicine, 196.

Dagnaw, G. H., M. El Mouhtadi, and M. Mustapha, 2024. Skin cancer classification using vision transformers and explainable artificial intelligence. Journal of Medical Artificial Intelligence, 7.

Dosovitskiy, A., L. Beyer, A. Kolesnikov, et al., 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

Gloster Jr, H. M. and D. G. Brodland, 1996. The epidemiology of skin cancer. Dermatologic Surgery, 22: 217–226.

Gloster Jr, H. M. and K. Neal, 2006. Skin cancer in skin of color. Journal of the American Academy of Dermatology, 55: 741–760.

Jerant, A. F., J. T. Johnson, C. D. Sheridan, and T. J. Caffrey, 2000. Early detection and treatment of skin cancer. American Family Physician, 62: 357–368.

Karthik, R., R. Menaka, S. Atre, J. Cho, and S. V. Easwaramoorthy, 2024. A hybrid deep learning approach for skin cancer classification using Swin Transformer and dense group shuffle non-local attention network. IEEE Access, 12: 158040–158051.

Liu, Z., Y. Lin, Y. Cao, et al., 2021. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).

Madan, V., J. T. Lear, and R.-M. Szeimies, 2010. Non-melanoma skin cancer. The Lancet, 375: 673–685.

Manju, V. N., D. S. Dayana, N. Patwari, K. P. B. Madavi, and K. K. Sowjanya, 2025. Attention-enhanced vision transformer model for precise skin cancer detection. In Proceedings of the 2025 International Conference on Emerging Technologies in Computing and Communication (ETCC), IEEE.

Ozdemir, B. and I. Pacal, 2025. An innovative deep learning framework for skin cancer detection employing ConvNeXtV2 and focal self-attention mechanisms. Results in Engineering, 25: 103692.

Pacal, I., M. Alaftekin, and F. D. Zengul, 2024. Enhancing skin cancer diagnosis using Swin Transformer with hybrid shifted window-based multi-head self-attention and SwiGLU-based MLP. Journal of Imaging Informatics in Medicine, 37: 3174–3192.

Pacal, I. and Y. Cakmak, 2025a. A comparative analysis of U-Net-based architectures for robust segmentation of bladder cancer lesions in magnetic resonance imaging. Eurasian Journal of Medicine and Oncology, 9: 268–283.

Pacal, I. and Y. Cakmak, 2025b. Diagnostic Analysis of Various Cancer Types with Artificial Intelligence. Duvar Yayınları.

Ren, H., J. Guo, S. Cheng, and Y. Li, 2024. Pooling-based visual transformer with low complexity attention hashing for image retrieval. Expert Systems with Applications, 241: 122745.

Ren, Z., H. Zhang, T. Huang, et al., 2022. Deep learning (cnn) and transfer learning: A review. Journal of Physics: Conference Series, 2273: 012029.

Sakib, A. H., M. I. H. Siddiqui, S. Akter, A. Al Sakib, and M. R. Mahmud, 2025. LeViT-Skin: A balanced and interpretable Transformer-CNN model for multi-class skin cancer diagnosis. International Journal of Science and Research Archive, 15: 1860–1873.

Siegel, R. L., A. N. Giaquinto, and A. Jemal, 2024. Cancer statistics, 2024. CA: A Cancer Journal for Clinicians, 74: 12–49.

ThangaPurni, J. and M. Braveen, 2025. Unified ARP-ViT-CNN system: Hybrid deep learning approach for segmenting and classifying multiple skin cancer lesions. Array, p. 100515.

Touvron, H., M. Cord, and H. Jégou, 2022. DeiT III: Revenge of the ViT. In Proceedings of the European Conference on Computer Vision (ECCV).

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.