This study presents a cloud-based multi-agent reinforcement learning framework for active dynamic portfolio optimization that addresses the inherent challenges of adaptive asset allocation in changing market environments. The proposed architecture employs specialized agents, trained with Proximal Policy Optimization, that identify market regimes in real time, while an attention-based meta-controller weights each agent's contribution to the final allocation. A distributed cloud infrastructure supports simultaneous experience collection and asynchronous gradient updates, converging 87% faster than single-agent baselines. Empirical evaluation on S&P 500 and global ETF data spanning multiple market cycles demonstrates substantial performance gains: 21.4% annualized returns and a Sharpe ratio of 1.57, representing a 35.3% improvement over state-of-the-art single-agent deep reinforcement learning algorithms and a 118% improvement over conventional mean-variance optimization. The framework also exhibits strong risk control, limiting maximum drawdown to 11.8% versus 24.3% for buy-and-hold strategies, while sustaining high returns in both bullish and high-volatility bear markets. Ablation studies confirm the importance of multi-agent specialization, attention mechanisms, and cloud-based scalability. These results mark a significant advance toward autonomous portfolio management systems that adapt to changing financial environments in real time.
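To make the role of the attention-based meta-controller concrete, the sketch below shows one plausible way such a component could blend portfolio proposals from regime-specialized agents using scaled dot-product attention over a market-state embedding. It is a minimal illustration under assumed dimensions and random initialization, not the authors' implementation; the class name, `state_dim`, `n_agents`, and the usage values are all illustrative assumptions.

```python
# Illustrative sketch (assumed design, not the paper's code): an attention-based
# meta-controller that weights allocation proposals from regime-specialized agents.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

class AttentionMetaController:
    """Scores each specialist agent against the current market-state embedding
    and returns per-agent attention weights plus the blended allocation."""

    def __init__(self, state_dim, n_agents):
        # One key vector per specialist (e.g. bull, bear, high-volatility regimes).
        self.keys = rng.normal(size=(n_agents, state_dim))
        self.query_proj = rng.normal(size=(state_dim, state_dim))

    def __call__(self, market_state, agent_allocations):
        # market_state: (state_dim,) embedding of current regime features.
        # agent_allocations: (n_agents, n_assets) proposals, each row sums to 1.
        query = market_state @ self.query_proj
        scores = self.keys @ query / np.sqrt(len(query))  # scaled dot-product attention
        attn = softmax(scores)                            # contribution weight per agent
        blended = attn @ agent_allocations                # convex mix of proposals
        return attn, blended / blended.sum()              # renormalize to a valid allocation

# Usage: three specialists proposing allocations over five assets.
controller = AttentionMetaController(state_dim=8, n_agents=3)
state = rng.normal(size=8)
proposals = rng.dirichlet(np.ones(5), size=3)
weights, allocation = controller(state, proposals)
print("agent attention:", np.round(weights, 3))
print("blended allocation:", np.round(allocation, 3))
```

Because the attention weights form a convex combination, the blended output remains a valid long-only allocation whenever the specialists' proposals are; how the paper handles short positions or leverage is not specified here.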