A Comparative Analysis of Deep Reinforcement Learning Approaches in Symbolic Optimization Tasks: The Case of DQN, QT-Opt and Samuel

Keywords

Reinforcement Learning
DQN
QT-Opt
Symbolic Optimization

How to Cite

A Comparative Analysis of Deep Reinforcement Learning Approaches in Symbolic Optimization Tasks: The Case of DQN, QT-Opt and Samuel. (2026). Computational Systems and Artificial Intelligence, 2(1), 15-20. https://doi.org/10.69882/adba.csai.2026013

Abstract

This study comparatively analyzes the performance of three reinforcement learning algorithms (DQN, QT-Opt, and Samuel's checkers algorithm) on a symbolic matrix multiplication task. The experiments were conducted in a customized simulation environment, MatrixMultiplyDiscoveryEnv, in which each agent generates outer-product-based symbolic actions to perform matrix multiplication with minimal error and computational cost. The reward function incorporates the Frobenius norm of the reconstruction error, the operation count, and the symbolic complexity. Over 50,000 training episodes, the QT-Opt algorithm demonstrated a highly stable reward profile, maintaining reward values close to zero throughout training. Samuel's algorithm showed rapid early learning, improving from -300 to around -100, but exhibited fluctuations in the later stages. In contrast, DQN's reward varied drastically, occasionally falling below -3000, indicating instability and sensitivity to environmental uncertainty. Regarding matrix error (Frobenius norm), Samuel's algorithm reduced its error to nearly zero early in training and maintained this performance. QT-Opt also performed well but showed occasional spikes in error. In terms of operation cost, QT-Opt consistently operated within 50-100 units, showing the highest efficiency. Samuel's algorithm started with costs near 300 but reduced them gradually, converging toward QT-Opt's performance. DQN, however, showed wide and erratic cost distributions. In conclusion, QT-Opt achieved the most stable and efficient learning, particularly in continuous action domains. This paper offers a distinctive perspective by comparing classical and modern reinforcement learning methods within a unified experimental framework, highlighting both their historical significance and their practical performance.
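The reward structure described above (Frobenius-norm error, operation count, and symbolic complexity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the weighting coefficients `alpha`, `beta`, and `gamma`, and the additive-penalty form are all assumptions made for clarity.

```python
import numpy as np

def symbolic_reward(target, approx, op_count, complexity,
                    alpha=1.0, beta=0.1, gamma=0.1):
    """Hypothetical reward combining the three penalty terms named in
    the abstract: Frobenius reconstruction error, operation count, and
    symbolic complexity. Weights alpha/beta/gamma are illustrative.

    target     -- the true matrix product C = A @ B
    approx     -- the product reconstructed from the agent's
                  outer-product actions
    op_count   -- number of scalar operations the action sequence uses
    complexity -- size of the symbolic expression produced
    """
    # Frobenius norm of the reconstruction error: zero for a perfect product.
    frob_error = np.linalg.norm(target - approx, ord="fro")
    # Higher error, more operations, and larger expressions all reduce reward,
    # so a perfect, cost-free reconstruction yields the maximum reward of 0.
    return -(alpha * frob_error + beta * op_count + gamma * complexity)
```

Under this formulation, QT-Opt's reported rewards near zero correspond to near-exact products at low operation cost, while DQN's drops below -3000 reflect large reconstruction errors and/or wasteful action sequences.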


References

Al-Hamadani, M. N., M. A. Fadhel, L. Alzubaidi, and B. Harangi, 2024. Reinforcement learning algorithms and applications in healthcare and robotics: A comprehensive and systematic review. Sensors 24: 2461.

AlMahamid, F. and K. Grolinger, 2025. Agile DQN: Adaptive deep recurrent attention reinforcement learning for autonomous UAV obstacle avoidance. Scientific Reports 15: 1–18.

Chen, C., J. Yu, and S. Qian, 2024. An enhanced deep Q-network algorithm for localized obstacle avoidance in indoor robot path planning. Applied Sciences 14: 11195.

de Sousa Bezerra, C. D., F. H. T. Vieira, and D. P. Q. Carneiro, 2023. Autonomous robotic navigation approach using deep Q-network late fusion and people detection-based collision avoidance. Applied Sciences 13: 12350.

Dong, Q., T. Kaneko, and M. Sugiyama, 2024. An offline learning of behavior correction policy for vision-based robotic manipulation. In Proceedings of the IEEE International Conference on Robotics and Automation, pp. 5448–5454.

Fan, J., Z. Wang, Y. Xie, and Z. Yang, 2020. A theoretical analysis of deep Q-learning. In Proceedings of Learning for Dynamics and Control, pp. 486–489.

Gao, T., 2024. Optimizing robotic arm control using deep Q-learning and artificial neural networks through demonstration-based methodologies: A case study of dynamic and static conditions. Robotics and Autonomous Systems 181: 104771.

Gao, Y., J. Chen, X. Chen, C. Wang, J. Hu, et al., 2023. Asymmetric self-play-enabled intelligent heterogeneous multirobot catching system using deep multiagent reinforcement learning. IEEE Transactions on Robotics 39: 2603–2622.

Hester, T., M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, et al., 2018. Deep Q-learning from demonstrations. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.

Jansonnie, P., B. Wu, J. Perez, and J. Peters, 2024. Unsupervised skill discovery for robotic manipulation through automatic task generation. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots, pp. 926–933.

Kalashnikov, D., A. Irpan, P. Pastor, J. Ibarz, A. Herzog, et al., 2018. Scalable deep reinforcement learning for vision-based robotic manipulation. In Proceedings of the Conference on Robot Learning, pp. 651–673.

Liao, X., L. Li, C. Huang, X. Zhao, and S. Tan, 2024. Noisy dueling double deep Q-network algorithm for autonomous underwater vehicle path planning. Frontiers in Neurorobotics 18: 1466571.

Mao, J., T. Lozano-Pérez, J. B. Tenenbaum, and L. P. Kaelbling, 2023. Learning reusable manipulation strategies. In Proceedings of the Conference on Robot Learning, pp. 1467–1483.

Mnih, V., K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, et al., 2015. Human-level control through deep reinforcement learning. Nature 518: 529–533.

Pal, A., A. Chauhan, and M. Baranwal, 2025. Together we rise: Optimizing real-time multi-robot task allocation using coordinated heterogeneous plays. arXiv preprint arXiv:2502.16079.

Samuel, A. L., 1959. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development 3: 210–229.

Sutton, R. S. and A. G. Barto, 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA.

Wei, S., C. Li, J. Seyler, and S. Eivazi, 2023. Integration of efficient deep Q-network techniques into Qt-Opt reinforcement learning structure. In Proceedings of the International Conference on Agents and Artificial Intelligence, volume 3, pp. 592–599.

Wu, B. and C. S. Suh, 2024. Deep reinforcement learning for decentralized multi-robot control: A DQN approach to robustness and information integration. In Proceedings of the ASME International Mechanical Engineering Congress and Exposition, volume 88636, p. V005T07A035.

Zhang, H., S. Zeng, Y. Hou, H. Huang, and Z. Xu, 2025. Improved Qt-Opt algorithm for robotic arm grasping based on offline reinforcement learning. Machines 13: 451.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.