A Comprehensive Review of LLM-based Text-to-SQL Systems: Methods, Datasets, and Trends

Keywords

SQL
Natural language processing (NLP)
LLM
Relational database

How to Cite

A Comprehensive Review of LLM-based Text-to-SQL Systems: Methods, Datasets, and Trends. (2026). Computational Systems and Artificial Intelligence, 2(1), 1-6. https://doi.org/10.69882/adba.csai.2026011

Abstract

Translating natural language into SQL queries (Text-to-SQL, or NL2SQL) lets users access relational databases easily and supports a range of commercial applications. In recent years, the development of Large Language Models (LLMs) has improved the performance of NL2SQL systems by strengthening semantic understanding, schema linking, and SQL generation, even for complex and cross-domain queries. This paper reviews research published between 2018 and 2025, focusing on LLM-based methods for Text-to-SQL tasks. We examine system pipelines covering the pre-processing, translation, and post-processing stages, along with commonly used datasets and tools. We also discuss advances in schema linking, reasoning-based query generation, and the use of retrieval-augmented generation to provide additional context. Based on the surveyed literature, we summarize key trends, challenges, and future directions, aiming to provide an accessible overview for students and researchers interested in LLM-based NL2SQL systems.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.