Benchmarking QLoRA-Fine-Tuned LLaMA and DeepSeek Models for Sentiment Analysis on Movie Reviews and Twitter Data

Keywords

Large language models
Sentiment analysis
QLoRA
Parameter-efficient fine-tuning
IMDB

How to Cite

Benchmarking QLoRA-Fine-Tuned LLaMA and DeepSeek Models for Sentiment Analysis on Movie Reviews and Twitter Data. (2026). Computational Systems and Artificial Intelligence, 2(1), 33-37. https://doi.org/10.69882/adba.csai.2026015

Abstract

Open-weight large language models (LLMs) such as LLaMA 2, LLaMA 3, and DeepSeek have quickly become attractive backbones for downstream NLP tasks, including sentiment analysis in both long-form reviews and short social media posts. However, full fine-tuning of these models remains computationally expensive and often impractical for academic research groups with limited hardware resources. This paper presents a comparative study of QLoRA-based sentiment adaptation for three open-weight LLM families, LLaMA 3, LLaMA 2, and DeepSeek, on two representative English benchmarks: the IMDB movie review dataset and a Twitter sentiment dataset. We apply a unified QLoRA pipeline that quantizes the backbone to 4-bit precision and trains low-rank adapters on top, enabling efficient fine-tuning on a single GPU. LLaMA 3 consistently achieves the best performance across both domains, reaching 91.2% accuracy and 0.908 F1 on IMDB and 85.6% accuracy and 0.849 F1 on Twitter. LLaMA 2 follows closely, while DeepSeek remains competitive but trails by 1–2 percentage points. Confusion matrix analysis reveals that all models struggle more with Twitter data due to its informal language and context-poor nature. Our findings provide practical guidance for practitioners choosing open LLM backbones for sentiment-related applications under compute constraints.
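The full QLoRA pipeline described above relies on a 4-bit quantized backbone with trained low-rank adapters. The core low-rank update can be sketched in plain NumPy; the dimensions and initialisation below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

# Sketch of the low-rank adapter update behind LoRA/QLoRA: the frozen
# weight W is augmented by a trainable product B @ A scaled by alpha / r.
# In QLoRA, W would additionally be stored in 4-bit precision.

d_out, d_in, r, alpha = 64, 128, 8, 16     # illustrative sizes
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))     # frozen backbone weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init

def adapted_forward(x):
    # y = W x + (alpha / r) * B (A x); only A and B receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialised to zero, the adapted layer matches the frozen one:
print(np.allclose(adapted_forward(x), W @ x))  # True

# Trainable parameters per layer: r*(d_in + d_out) instead of d_in*d_out
print(r * (d_in + d_out), d_in * d_out)        # 1536 8192
```

The parameter count in the last line is what makes single-GPU fine-tuning feasible: only the small A and B matrices are updated, while the quantized backbone stays frozen.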


References

Barbieri, F. et al., 2020 Tweeteval: Unified benchmark and comparative evaluation for tweet classification. In Proceedings of EMNLP, pp. 1644–1650.

Bayat, S. and G. Işık, 2023a Assessing the efficacy of lstm, transformer, and rnn architectures in text summarization. In International Conference on Applied Engineering and Natural Sciences (ICAENS), pp. 1–8.

Bayat, S. and G. Işık, 2023b Evaluating the effectiveness of different machine learning approaches for sentiment classification. Journal of the Institute of Science and Technology 13: 1850–1862.

Bi, X. et al., 2024 Deepseek llm: Scaling open-source language models with longtermism. arXiv preprint arXiv:2401.02954.

Brown, T. et al., 2020 Language models are few-shot learners. In Advances in Neural Information Processing Systems (NeurIPS).

Dettmers, T. et al., 2021 8-bit optimizers via block-wise quantization. arXiv preprint arXiv:2110.02861.

Dettmers, T. et al., 2023 Qlora: Efficient finetuning of quantized llms. arXiv preprint arXiv:2305.14314.

Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova, 2019 Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pp. 4171–4186.

Ding, N. et al., 2023 Parameter-efficient fine-tuning of large-scale pre-trained language models: A survey. arXiv preprint arXiv:2303.15647.

Dubey, A. et al., 2024 The llama 3 herd of models. arXiv preprint arXiv:2407.21783.

Go, A., R. Bhayani, and L. Huang, 2009 Twitter sentiment classification using distant supervision. Technical report, Stanford University.

Houlsby, N. et al., 2019 Parameter-efficient transfer learning for nlp. In Proceedings of the 36th International Conference on Machine Learning (ICML), pp. 2790–2799.

Hu, E. J. et al., 2021 Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.

Lester, B., R. Al-Rfou, and N. Constant, 2021 The power of scale for parameter-efficient prompt tuning. In Proceedings of EMNLP, pp. 3045–3059.

Liu, A. et al., 2024a Deepseek-v3 technical report. arXiv preprint arXiv:2412.19437.

Liu, X. et al., 2024b Qlora bench: A benchmark for quantized low-rank adaptation. arXiv preprint.

Liu, Y. et al., 2019 Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.

Maas, A. L., R. E. Daly, P. T. Pham, D. Huang, and A. Y. Ng, 2011 Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 142–150.

OpenAI, 2023 Gpt-4 technical report. arXiv preprint arXiv:2303.08774.

Pfeiffer, J. et al., 2021 Adapterfusion: Non-destructive task composition for transfer learning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pp. 487–503.

Raffel, C. et al., 2020 Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21: 1–67.

Rosenthal, S. et al., 2017 Semeval-2017 task 4: Sentiment analysis in twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval), pp. 502–518.

Sandmann, S. et al., 2025 Open-source llm deepseek on a par with proprietary models in clinical reasoning. Nature Medicine.

Toksöz, S. B. and G. Işık, 2025a The Art of Efficiency in Large Language Models. Yaz Yayınevi.

Toksöz, S. B. and G. Işık, 2025b Efficient adaptation of large language models for sentiment analysis: A fine-tuning approach. Journal of the Institute of Science and Technology 15: 245–258.

Toksöz, S. B. and G. Işık, 2026 Parameter-efficient fine-tuning of llama models for financial sentiment classification. Cluster Computing 28: 1–15.

Touvron, H. et al., 2023 Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.

Wei, J. et al., 2022 Emergent abilities of large language models. Transactions of the Association for Computational Linguistics 10: 542–556.

Wolf, T. et al., 2020 Transformers: State-of-the-art natural language processing. In Proceedings of EMNLP: System Demonstrations, pp. 38–45.

Zhang, Y. and J. Yang, 2022 Sentiment analysis with large language models: A survey. arXiv preprint arXiv:2212.10465.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.