Comparative Analysis of State-of-the-Art Q&A Models: BERT, RoBERTa, DistilBERT, and ALBERT on SQuAD v2 Dataset

Keywords

Question-answering models
BERT
RoBERTa
DistilBERT
ALBERT
SQuAD v2 Dataset
Model performance

How to Cite

Comparative Analysis of State-of-the-Art Q&A Models: BERT, RoBERTa, DistilBERT, and ALBERT on SQuAD v2 Dataset. (2024). Chaos and Fractals, 1(1), 19-30. https://doi.org/10.69882/adba.chf.2024073

Abstract

In the rapidly evolving landscape of natural language processing (NLP) and artificial intelligence, recent years have witnessed significant advancements, particularly in text-based question-answering (QA) systems. The Stanford Question Answering Dataset (SQuAD v2) has emerged as a prominent benchmark, offering diverse language understanding challenges. This study conducts a thorough examination of cutting-edge QA models (BERT, DistilBERT, RoBERTa, and ALBERT), each with a distinct architecture, focusing on their training and performance on SQuAD v2. The analysis aims to uncover the unique strengths of each model, providing insights into their capabilities and exploring the impact of different training techniques on their performance. The primary objective is to enhance our understanding of the evolution of text-based QA systems and their effectiveness in real-world scenarios. The results of this comparative study are poised to influence the utilization and development of these models in both industry and research. The investigation meticulously evaluates the BERT, ALBERT, RoBERTa, and DistilBERT QA models on the SQuAD v2 dataset, emphasizing instances of accurate responses and identifying areas where completeness may be lacking. This nuanced exploration contributes to the ongoing discourse on the advancement of text-based question-answering systems, shedding light on the strengths and limitations of each QA model. Based on the results obtained, ALBERT achieved an exact match of 86.85% and an F1 score of 89.91% on the SQuAD v2 dataset, demonstrating superior performance on both answerable ('HasAns') and unanswerable ('NoAns') questions. BERT and RoBERTa also performed strongly, while DistilBERT lagged slightly behind.
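For readers who wish to reproduce this style of comparison, the sketch below scores an extractive QA model on SQuAD v2 using the Hugging Face transformers, datasets, and evaluate libraries. It is a minimal illustration, not the paper's exact protocol: the checkpoint name deepset/roberta-base-squad2 and the 100-example validation slice are stand-in assumptions, and the authors' fine-tuned models and full evaluation setup may differ.

from datasets import load_dataset
from transformers import pipeline
import evaluate

# Stand-in checkpoint (an assumption): any SQuAD v2 fine-tuned extractive
# QA model from the Hugging Face Hub can be substituted here.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

# Small validation slice for illustration; a full evaluation would use
# the entire validation split.
data = load_dataset("squad_v2", split="validation[:100]")
squad_v2_metric = evaluate.load("squad_v2")

predictions, references = [], []
for ex in data:
    # handle_impossible_answer=True lets the model return an empty string
    # when it judges the question unanswerable (the "NoAns" case).
    out = qa(question=ex["question"], context=ex["context"],
             handle_impossible_answer=True)
    predictions.append({
        "id": ex["id"],
        "prediction_text": out["answer"],
        "no_answer_probability": 0.0,  # empty text already encodes "no answer"
    })
    references.append({"id": ex["id"], "answers": ex["answers"]})

results = squad_v2_metric.compute(predictions=predictions, references=references)
print(f"Exact match: {results['exact']:.2f}  F1: {results['f1']:.2f}")
# When the slice contains both question types, compute() also returns
# 'HasAns_exact'/'HasAns_f1' and 'NoAns_exact'/'NoAns_f1', the
# answerable/unanswerable breakdown discussed in the abstract.

Swapping the checkpoint name for other SQuAD v2 fine-tuned BERT, DistilBERT, or ALBERT models yields the side-by-side exact-match and F1 comparison the study describes.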

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.