Volume 2 - Issue 1, January - February 2026
📑 Paper Information
| 📑 Paper Title | Reinforcement Learning-Based Deep Markov Models for Automated Trading: A Comparative Study of Q-Learning Variants in Optimal Execution |
| 👤 Authors | Vinayak Chaware, Pushpak Sharma, Jayesh Shinde |
| 📘 Published Issue | Volume 2, Issue 1 |
| 📅 Year of Publication | 2026 |
| 🆔 Unique Identification Number | IJAMRED-V2I1P39 |
📝 Abstract
The optimal execution problem in algorithmic trading requires models that can capture complex market dynamics while making sequential trading decisions under uncertainty. Traditional approaches often struggle to balance model complexity with data efficiency, particularly in high-frequency trading environments where price movements exhibit non-linear dependencies and regime-switching behavior. This paper presents a comprehensive study of Reinforcement Learning-Based Deep Markov Models (RL-DMM) for automated trading, specifically addressing the optimal execution problem in limit order book markets. We develop and compare three algorithmic variants: standard Q-Learning, DynaQ-ARIMA, and DynaQ-LSTM, each designed to leverage different aspects of temporal market dynamics. The RL-DMM framework integrates the latent state representation capabilities of Deep Markov Models with the decision-making power of reinforcement learning, enabling the system to learn optimal trading policies from historical order book data. Our empirical evaluation uses real market data from the limit order books of four major securities: Facebook, Intel, Vodafone, and Microsoft, spanning multiple market conditions and volatility regimes. The experimental results demonstrate that the RL-DMM framework achieves superior data efficiency compared to baseline approaches, requiring significantly fewer training samples to converge to profitable policies. Furthermore, the model delivers substantial financial gains across all tested securities, with performance improvements becoming increasingly pronounced in markets exhibiting complex price dynamics and high volatility. The DynaQ-LSTM variant is particularly strong at capturing long-range temporal dependencies, achieving an average improvement of 18.3% in execution quality over standard Q-Learning baselines. These findings establish the RL-DMM framework as a robust and practical solution for real-world algorithmic trading applications, offering a principled approach to the optimal execution problem that balances theoretical rigor with empirical performance.
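For context on the baseline against which the DynaQ variants are compared, the sketch below shows a plain tabular Q-Learning agent for a toy execution task. It is not taken from the paper: the state (remaining inventory, time step), the discrete child-order actions, the random-walk price model, and the quadratic impact penalty are all illustrative assumptions.

```python
import numpy as np

# Illustrative assumptions (not from the paper): the agent must sell a fixed
# inventory over T decision steps; the state is (remaining inventory, time);
# actions are discrete child-order sizes; the reward is cash received minus
# a quadratic market-impact penalty.
rng = np.random.default_rng(0)

T = 10                       # decision steps per episode
INV = 10                     # initial inventory (in lots)
ACTIONS = np.arange(0, 4)    # sell 0..3 lots per step
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1

Q = np.zeros((INV + 1, T + 1, len(ACTIONS)))  # Q[inventory, time, action]

def step(inv, t, a):
    """Simulated market step: noisy mid-price plus a temporary-impact cost."""
    sell = min(a, inv)
    price = 100.0 + rng.normal(0.0, 0.05)        # noisy mid-price
    reward = sell * price - 0.01 * sell ** 2      # cash minus impact penalty
    inv -= sell
    t += 1
    if t == T and inv > 0:                        # forced liquidation at the end
        reward += inv * (price - 0.05 * inv)
        inv = 0
    return inv, t, reward

for episode in range(5000):
    inv, t = INV, 0
    while t < T and inv > 0:
        # epsilon-greedy action selection
        if rng.random() < EPS:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(np.argmax(Q[inv, t]))
        new_inv, new_t, r = step(inv, t, a)
        # standard one-step Q-Learning update
        target = r + GAMMA * np.max(Q[new_inv, new_t])
        Q[inv, t, a] += ALPHA * (target - Q[inv, t, a])
        inv, t = new_inv, new_t

# Greedy execution schedule implied by the learned Q-table
inv, t, schedule = INV, 0, []
while t < T and inv > 0:
    a = int(np.argmax(Q[inv, t]))
    schedule.append(min(a, inv))
    inv, t, _ = step(inv, t, a)
print("Greedy child-order sizes:", schedule)
```

Under the same framing, a Dyna-style variant such as DynaQ-ARIMA or DynaQ-LSTM would additionally fit a model of the price dynamics and replay simulated transitions through the same update rule, which is where the data-efficiency gains described in the abstract would be expected to come from.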
📝 How to Cite
Vinayak Chaware, Pushpak Sharma, Jayesh Shinde, "Reinforcement Learning-Based Deep Markov Models for Automated Trading: A Comparative Study of Q-Learning Variants in Optimal Execution," International Journal of Scientific Research and Engineering Development, Vol. 2, Issue 1, pp. 247-254, Jan-Feb 2026. ISSN: 3107-6513. www.ijamred.com. Published by Scientific and Academic Research Publishing.