Reinforcement learning for finance: A review

Abstract

This paper provides a comprehensive review of the application of Reinforcement Learning (RL) in finance, highlighting the progress achieved so far and the challenges that lie ahead. RL is a subfield of machine learning in which an agent learns to make sequential decisions in a complex environment so as to maximize long-term cumulative reward, and we explore how this capability has been used to tackle complex financial problems. In finance, RL has been applied to a variety of problems, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising. We begin by introducing RL and Markov decision processes (MDPs), which provide the mathematical framework for RL. We then discuss the RL algorithms that have been used in finance, focusing on value-based and policy-based methods, as well as the use of neural networks in RL for finance. Finally, we discuss the results of recent studies that have applied RL to financial problems, and we conclude with the challenges and opportunities for future research in RL for finance.
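To make the MDP framing and the idea of a value-based method concrete, here is a minimal Python sketch built on a toy optimal-execution problem of our own construction, not a model taken from the reviewed literature. It assumes an agent must sell Q0 = 4 shares over T = 3 periods at a constant price, pays a quadratic temporary-impact cost on each sale, and must liquidate whatever remains in the last period. The state is (period, remaining inventory) and the action is the number of shares sold. The constants PRICE, ETA, Q0 and T and every function name are hypothetical. The exact optimum comes from backward induction on the Bellman optimality equation, and tabular Q-learning then tries to recover the same schedule from sampled interactions alone.

```python
"""Toy optimal-execution MDP solved two ways: exact backward induction
(Bellman optimality) and tabular Q-learning (a value-based RL method).

All names and parameters are hypothetical and chosen for illustration.
"""
import random

PRICE = 10.0  # assumed constant price (no drift, no volatility)
ETA = 0.5     # assumed quadratic temporary-impact coefficient
Q0 = 4        # initial inventory to liquidate (shares)
T = 3         # number of trading periods


def actions(t, q):
    """Feasible sale sizes; everything left must be sold in the last period."""
    return [q] if t == T - 1 else list(range(q + 1))


def reward(a):
    """Proceeds of selling a shares minus a quadratic impact cost."""
    return PRICE * a - ETA * a * a


def backward_induction():
    """Exact optimal values V[t][q] via the Bellman optimality equation."""
    V = [[0.0] * (Q0 + 1) for _ in range(T + 1)]  # row T is the terminal value 0
    for t in range(T - 1, -1, -1):
        for q in range(Q0 + 1):
            V[t][q] = max(reward(a) + V[t + 1][q - a] for a in actions(t, q))
    return V


def q_learning(episodes=20000, alpha=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy (gamma = 1)."""
    rng = random.Random(seed)
    Q = {(t, q, a): 0.0
         for t in range(T) for q in range(Q0 + 1) for a in actions(t, q)}
    for _ in range(episodes):
        q = Q0
        for t in range(T):
            acts = actions(t, q)
            if rng.random() < epsilon:
                a = rng.choice(acts)  # explore
            else:
                a = max(acts, key=lambda b: Q[(t, q, b)])  # exploit
            r, q_next = reward(a), q - a
            # Off-policy target: greedy value of the next state (0 after the horizon).
            future = 0.0 if t == T - 1 else max(
                Q[(t + 1, q_next, b)] for b in actions(t + 1, q_next))
            Q[(t, q, a)] += alpha * (r + future - Q[(t, q, a)])
            q = q_next
    return Q


def greedy_rollout(Q):
    """Follow the greedy policy from the initial state; return schedule and reward."""
    q, schedule, total = Q0, [], 0.0
    for t in range(T):
        a = max(actions(t, q), key=lambda b: Q[(t, q, b)])
        schedule.append(a)
        total += reward(a)
        q -= a
    return schedule, total


if __name__ == "__main__":
    V = backward_induction()
    schedule, total = greedy_rollout(q_learning())
    print("optimal value:", V[0][Q0])
    print("learned schedule:", schedule, "reward:", total)
```

Because the price is assumed constant, total proceeds are fixed and the problem reduces to spreading sales so as to minimize the sum of squared trade sizes. The backward-induction optimum is 37.0, reached by selling 1, 1 and 2 shares in some order. Q-learning updates each estimate toward the greedy value of the next state, whatever exploratory action is actually taken there, which is what makes it off-policy; with these assumed settings its greedy schedule should reach the same value. A realistic execution problem would add price dynamics and risk aversion, and the table would typically be replaced by a function approximator such as a neural network.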
