7 Applications of Reinforcement Learning in Finance and Trading

In this article, we will explore 7 real-world trading and finance applications where reinforcement learning is used to get a performance boost.

OK, but before we get to the nitty-gritty of this article, let’s define a few concepts that I will use later.

For starters let’s quickly define reinforcement learning:

A learning process in which an agent interacts with its environment through trial and error to reach a defined goal. The environment rewards steps that bring the agent closer to the goal and penalizes steps that do not, and the agent learns to maximize its total reward while minimizing penalties.

Cool, now a few keywords that I will use a lot:

  1. Deep Reinforcement Learning (DRL): Algorithms that employ deep learning to approximate value or policy functions that are at the core of reinforcement learning.
  2. Policy Gradient Reinforcement Learning Technique: Approach used in solving reinforcement learning problems. Policy gradient methods target modeling and optimizing the policy function directly. 
  3. Deep Q Learning: Using a neural network to approximate the Q-value function. In classic (tabular) Q-learning, the Q-values form an exact table that the agent can “refer to” to maximize its reward in the long run; Deep Q Learning replaces that table with a learned approximation.
  4. Gated Recurrent Unit (GRU): Special type of Recurrent Neural Network, implemented with the help of a gating mechanism.
  5. Gated Deep Q Learning strategy: Combination of Deep Q Learning with GRU.
  6. Gated Policy Gradient strategy: Combination of Policy gradient technique with GRU.
  7. Deep Recurrent Q Network: Combination of Recurrent Neural networks with the Q Learning technique.
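
To make the Q-learning idea concrete, here is a minimal sketch of the tabular update rule. The five-state corridor environment, the reward of 1 for reaching the last state, and the hyperparameters are all assumptions made up for illustration. Deep Q Learning would replace the table `q` with a neural network.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, the goal is state 4
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative hyperparameters


def step(state, action):
    """Toy environment: reward 1.0 only when the goal state is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] is the estimated long-run reward of that action.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                a = rng.randrange(2)                      # explore
            else:
                best = max(q[state])                      # exploit, random tie-break
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move towards reward + discounted best next value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action index per non-goal state (1 means "move right").
greedy_policy = [max(range(2), key=lambda i: q_table[s][i]) for s in range(N_STATES - 1)]
```

After training, the greedy policy moves right in every state, which is the shortest path to the goal in this toy setup.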

OK, now we’re ready to check out how reinforcement learning is used to maximize profits in the finance world.

1. Trading bots with Reinforcement Learning

Bots powered with reinforcement learning can learn from the trading and stock market environment by interacting with it. They use trial and error to optimize their learning strategy based on the characteristics of each and every stock listed in the stock market.

[Image: trading bots. Image by Manfred Steger | Source: Pixabay]

There are a few big advantages to this approach:

  • saves time
  • trading bots can trade around the clock
  • trading gets diversified across all industries

As an example, you can check out the Stock Trading Bot using Deep Q-Learning project. The idea here was to create a trading bot using the Deep Q Learning technique. Tests show that the trained bot can decide whether to buy or sell at a single point in time, given a set of stocks to trade on.

Please note that this project does not account for transaction costs, the efficiency of trade execution, etc., so it is unlikely to perform well in the real world. Plus, the model is trained on CPU because the training process is sequential.
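
To show what accounting for transaction costs could look like, here is a minimal sketch of a per-step reward for a single-stock agent. The function `step_reward` and the 0.1% proportional cost are assumptions for illustration, not part of the project mentioned above.

```python
def step_reward(position, new_position, price, next_price, cost_rate=0.001):
    """Reward for one time step of a hypothetical single-stock trading agent.

    position / new_position: shares held before and after the trade.
    cost_rate: proportional transaction cost (0.1% here, an assumption).
    """
    traded_value = abs(new_position - position) * price
    cost = cost_rate * traded_value                 # paid only when the position changes
    pnl = new_position * (next_price - price)       # profit or loss on the shares now held
    return pnl - cost


# Buy 10 shares at 100, then the price moves to 101: gross gain 10, cost about 1.
reward = step_reward(position=0, new_position=10, price=100.0, next_price=101.0)
```

With a reward like this, an agent that trades too often is penalized even when its price predictions are right, which is closer to what happens in a real market.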

2. Chatbot-based Reinforcement Learning

Chatbots are generally trained with the help of sequence-to-sequence modelling, but adding reinforcement learning to the mix can have big advantages for stock trading and finance:

  • Chatbots can act as brokers and offer real-time quotes to their users.
  • Conversational UI-based chatbots can help customers resolve their issues without involving the staff or the backend support team. This saves time and relieves the support staff from repetitive tasks, letting them concentrate on more complicated issues.
  • Chatbots can also give suggestions on opening and closing sales values within trading hours.

The Deep Reinforcement Learning Chatbot project shows a chatbot implementation based on reinforcement learning, achieved with the Policy gradient technique. 

3. Risk optimization in peer-to-peer lending with Reinforcement Learning

P2P lending is a way of providing individuals and businesses with loans through online services. These online services do the job of matching lenders (investors) with borrowers.

In these types of online marketplaces, reinforcement learning comes in handy. Specifically it can be used to:

  • Analyze borrowers’ credit scores to reduce risk.
  • Predict annualized returns. Since online businesses have low overhead, lenders can expect higher returns compared to savings and investment products offered by banks.
  • Estimate the likelihood that a borrower will be able to meet their debt obligations.

The Peer-to-Peer Lending Robo-Advisor Using a Neural Network project is an online lending platform built with a Neural Network. It doesn’t use reinforcement learning, but you can see that it’s just the kind of trial & error scenario where RL would make perfect sense.

4. Portfolio Management with Deep Reinforcement Learning

Portfolio Management means taking your client’s assets, putting them into stocks, and managing them on a continuous basis to help the client achieve their financial goals. With the help of Deep Policy Network Reinforcement Learning, the allocation of assets can be optimized over time.

In this case, the benefits of deep reinforcement learning are:

  • It enhances the efficiency and success rates of human managers.
  • It decreases organizational risk.
  • It increases Return on Investment (ROI) in terms of organizational profit.

Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem – this project shows an implementation of portfolio management with Deep Policy Network Reinforcement Learning. 
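
As a rough sketch of what “allocation of assets” means in code: a policy network produces one score per asset, and those scores have to be turned into portfolio weights that sum to 1. The softmax step and the example numbers below are assumptions for illustration, and the hard-coded `scores` stand in for the output of a real network.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def portfolio_return(weights, asset_returns):
    """Weighted sum of per-asset returns over one period."""
    return sum(w * r for w, r in zip(weights, asset_returns))


# Hypothetical scores, e.g. the raw output of a policy network for 3 assets.
scores = [2.0, 1.0, 0.0]
weights = softmax(scores)
period_return = portfolio_return(weights, [0.02, -0.01, 0.00])
```

During training, the period return (often minus transaction costs) would serve as the reward signal that pushes the network towards better allocations.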

5. Price setting strategies with Reinforcement Learning

Stock prices are complex and change dynamically, which makes them hard to model. To capture these properties, Gated Recurrent Unit (GRU) networks work well with reinforcement learning, providing advantages such as:

  • Extracting informative financial features which can represent the intrinsic character of a stock.
  • Helping to decide the stop loss and stop profit during trading.

[Image: RL price setting. Photo by Olya Kobruseva | Source: Pexels]

To support the above statements, the Deep reinforcement learning for time series: playing idealized trading games paper compares Stacked Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM) units, Convolutional Neural Network (CNN), and Multi-Layer Perceptron (MLP) agents to see which performs best.

The GRU-based agents used to model Q values show the best overall performance in the Univariate game to capture a wave-like price time series.

The two techniques with which reinforcement learning can be applied with GRU are:

  • Gated Deep Q Learning Strategy
  • Gated Policy Gradient Strategy

To understand these techniques better, you can check out this article: Adaptive stock trading strategies with deep reinforcement learning methods.
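
To give a feel for the gating mechanism, here is a minimal sketch of a single GRU step with a scalar input and a scalar hidden state. Real networks use weight matrices and learn them from data. The parameter names and the value 0.5 are assumptions for illustration, and note that some libraries swap the roles of `z` and `1 - z` in the final blend.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One step of a GRU with a scalar input and scalar hidden state.

    p is a dict of hypothetical weights; real networks use matrices.
    """
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])   # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])   # reset gate
    # Candidate state: the reset gate decides how much of the past to use.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend the old state and the candidate according to the update gate.
    return (1.0 - z) * h_prev + z * h_cand


names = ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")
params = {name: 0.5 for name in names}

# Feed a short, made-up series of daily returns through the cell.
h = 0.0
for daily_return in [0.01, -0.02, 0.015]:
    h = gru_step(daily_return, h, params)
```

In a gated Deep Q Learning or gated policy gradient setup, the hidden state `h` would be passed on to a layer that outputs Q-values or action probabilities.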

6. Recommendation systems with Reinforcement Learning

When it comes to online trading platforms, recommendation systems based on reinforcement learning techniques can be a game changer. These systems can help in recommending the right stocks to users while trading.

[Image: RL recommendation systems. Photo by ThisIsEngineering | Source: Pexels]

A reinforcement learning agent that has been trained on a number of stocks can help choose the best stock or mutual fund, ultimately leading to better ROI.

The advantages here can be:

  • Engaging existing users by providing lifelong stock-picking recommendations based on the users’ behaviour on the platform.
  • Helping beginners by suggesting good stocks to trade.
  • Making it easier to decide which stocks to pick.

The StockRecommendSystem project shows an implementation of a system like this.
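
One of the simplest ways to frame recommendations as a reinforcement learning problem is a multi-armed bandit: each “arm” is a stock we could recommend, and the reward is whether the user acts on the recommendation. The sketch below uses an epsilon-greedy strategy with made-up click probabilities. It is an illustrative assumption and not how the StockRecommendSystem project works.

```python
import random


def run_bandit(click_probs, rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: each arm is a stock we could recommend."""
    rng = random.Random(seed)
    counts = [0] * len(click_probs)
    values = [0.0] * len(click_probs)    # running mean reward per arm
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(click_probs))                       # explore
        else:
            arm = max(range(len(click_probs)), key=lambda i: values[i])  # exploit
        # Simulated user feedback: 1.0 if the recommendation was clicked.
        reward = 1.0 if rng.random() < click_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


# Hypothetical click probabilities for three stocks.
counts, values = run_bandit([0.1, 0.5, 0.9])
```

Over time the system recommends the third stock most often, because its estimated reward is the highest, while still occasionally trying the others.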

7. Maximizing profit with minimum capital investments

If we combine all of the above points, we could get an automated system constructed to achieve high returns, while keeping the investments as low as possible.

[Image: RL maximizing profit. Photo by Karolina Grabowska | Source: Pexels]

An agent can be trained with the help of reinforcement learning to take a minimal amount of capital from any source and allocate it to a stock that could potentially double the ROI in the future.

Nowadays, RL agents have been able to learn optimal trading strategies that outperform simple buy and sell strategies that people used to apply. This can be achieved with the help of the Markov Decision Process (MDP) model, using Deep Recurrent Q Network (DRQN). A good resource to understand this concept is Deep Recurrent Q-Learning for Partially Observable MDPs.
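
For reference, the quantity a Deep Q Network tries to learn can be written down compactly. The notation below (state s, action a, reward r, discount factor γ) is the standard textbook one rather than anything specific to the projects above. In a fully observable MDP, the optimal Q-value satisfies the Bellman optimality equation:

```latex
Q^{*}(s, a) = \mathbb{E}\left[\, r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \;\middle|\; s_t = s,\ a_t = a \,\right]
```

In trading, the agent only sees observations such as recent prices, not the full state of the market. A DRQN therefore passes the observations through a recurrent layer and estimates Q-values from the resulting hidden state instead of from s directly.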

Proceed with caution

It’s important to add that a lot of the projects we listed are essentially projects made for fun. They’re trained on past data and not backtested properly. In the case of unseen data (for example COVID stats), the downside risk is much larger than expected by the model.
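
One common way to test a strategy more honestly is walk-forward evaluation: train on a window of past data, test on the period right after it, then slide both windows forward. Here is a minimal sketch of how such splits could be generated. The function name and window sizes are assumptions for illustration.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size    # slide forward by one test window


# 10 time steps, train on 4, test on the next 2, then slide forward.
splits = [(list(tr), list(te)) for tr, te in walk_forward_splits(10, 4, 2)]
```

The key property is that every test window lies strictly after its training window, so the model is never evaluated on data it could have seen during training.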

The market is a complicated system and it’s hard for machine learning systems to understand stocks based only on historical data. The performance of ML-based trading strategies can be great, but it can also cause you to drain your savings. So take these projects with a grain of salt.


Reinforcement learning has always been kind of underrated. By showing finance and trading use cases of RL in this article, I want to raise awareness of how useful RL can be, and to motivate new learners and existing developers to explore this domain more. It’s a fascinating topic!
