How to Structure, Organize, Track and Manage Reinforcement Learning (RL) Projects

Vladimir Lyashenko

24th August, 2023

Structuring and managing machine learning projects can be a tricky thing.

When you dive into a project, you may quickly realize that you’re drowning in an ocean of Python scripts, data, algorithms, functions, updates, and so on. At some point, you just lose track of your experiments, and can’t even say which script or update led to the best result.

So, structuring your project and keeping track of experiments is a crucial part of success.

From this point of view, working on an ML project might be challenging in general, but some fields are more complicated than others. Reinforcement Learning (RL) is one of the complicated ones.

This article is dedicated to structuring and managing RL projects. I’ll try to be as precise as possible and provide a comprehensive step-by-step guide and some useful tips.

We’ll cover:

General tips – project directory structure, Cookiecutter, keeping track of experiments using Neptune, proper evaluation
Defining a problem as an RL problem – Reinforcement Learning, Supervised Learning, optimization problem, maximization and minimization
Picking an RL environment – OpenAI Gym
Picking an RL library and algorithm – RL_Coach, Tensorforce, Stable Baselines, RL_Coach guidelines
Testing the performance of the agent
Preparing for publishing – README, requirements, readable code, visualizations

Let’s jump in.

General tips

To begin, you must return to the basics and remember tips that can be applied to any ML project. These are:

Project directory structure
Keeping track of experiments
Proper model evaluation

Project directory structure

The ability to keep the working directory structured and easy to skim through is a great skill in general.

When talking about a data science workflow, we face multiple elements such as:

Data
Models
Logs
Training and testing scripts
Hyperparameter files
Other

There are various practices and approaches for structuring your working directory. In my personal experience, the best, fastest, and easiest one is to use a Cookiecutter template.

Cookiecutter is a powerful tool with its own philosophy. It provides complete documentation that’s easy to navigate through.

Try using the Cookiecutter template to structure your working directory for your next RL project, and you’ll see how convenient it is.
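
If you just want to see what such a structure boils down to, here is a minimal sketch in plain Python. The folder names are my own assumption that mirrors the workflow elements listed above (data, models, logs, scripts, hyperparameter files); this is not the layout that Cookiecutter itself generates.

```python
from pathlib import Path

# Hypothetical layout: these folder names are an assumption for illustration,
# not the official output of any Cookiecutter template.
PROJECT_DIRS = [
    "data/raw",
    "data/processed",
    "models",
    "logs",
    "configs",    # hyperparameter files
    "src",        # training and testing scripts
    "notebooks",
]


def create_project_skeleton(root):
    """Create the folders under `root` and return the created paths."""
    root = Path(root)
    created = []
    for relative in PROJECT_DIRS:
        path = root / relative
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created
```

Calling create_project_skeleton("my_rl_project") creates the folders if they don't exist yet and leaves existing ones untouched.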

Keeping track of experiments

The most important general tip from my perspective is keeping track of experiments. There are many key values you may want to track:

Hyperparameters
Inference time
Gains over baseline
Any comments (for example, the text description of an experiment)
Other

Even if your directory is poorly structured, proper experiment tracking is a must-have feature. It will save you from losing the working pace and valuable results.

For Reinforcement Learning, experiment tracking is not a challenging task thanks to Neptune, which can be used with most RL libraries.

There are other tracking tools but they tend to not work with RL projects very well. Keep that in mind when choosing your RL library, and always double-check if you can use your favorite tracking tool with a particular library.

Nonetheless, Neptune is a stable and great tool for keeping track of your experiments. It has plenty of valuable examples and tutorials, and can be used both for your own projects and for team projects.

You should definitely try it out when working on your next RL project.
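
To make the idea concrete, here is a tracker-agnostic sketch of what gets recorded per experiment. It is not Neptune's API; the function names and the JSON Lines file are assumptions chosen so the example runs with the standard library only. A real tracking tool would replace the file with its own storage and UI.

```python
import json
import time
from pathlib import Path


def log_experiment(log_file, name, hyperparameters, metrics, comment=""):
    """Append one experiment record to a JSON Lines file and return it."""
    record = {
        "name": name,
        "timestamp": time.time(),
        "hyperparameters": hyperparameters,
        "metrics": metrics,  # e.g. inference time, gain over baseline
        "comment": comment,  # free-text description of the experiment
    }
    log_file = Path(log_file)
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_experiments(log_file):
    """Read all records back as a list of dicts."""
    with Path(log_file).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_experiment(records, metric):
    """Return the record with the highest value of `metric`."""
    return max(records, key=lambda r: r["metrics"][metric])
```

With records like these, answering "which run led to the best result?" becomes a one-line call to best_experiment instead of a search through old scripts.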

Proper evaluation

Last but not least, you need to be both careful and precise when evaluating the performance of your algorithm. In the Reinforcement Learning field, evaluation metrics and processes depend a lot on your problem and the environment that you are using.

I suggest checking Kaggle competitions and forums first, as you’re likely to find similar problems to yours, along with valuable ideas and advice. Never underestimate the power of the community, and always check if there’s something new and interesting out there.

Keep in mind that, when working on an RL project, you should always watch a video of your agent performing, if possible. Of course, it’s not a proper evaluation or experiment tracking technique, but it’s a nice addition to other tools and metrics.

We went through the fundamentals, so let’s move on to all the major steps of working on an RL problem:

Define a problem as an RL problem
Pick an RL environment
Pick an RL library and algorithm
Test the performance of the agent
Prepare your project for publishing

Defining a problem as an RL problem

First, we need to decide if Reinforcement Learning suits our problem well. It’s a really important step, as we don’t want to overcomplicate the task by using an unsuitable learning model or irrelevant algorithms.

RL is all about exploration and exploitation, and the tradeoff between them. It’s the main difference between RL and a lot of other types of learning, such as Supervised Learning.

RL agents learn by interacting with an environment, trying different actions, and receiving different reward values for those actions while aiming to maximize the overall reward at the end.

It’s a completely different concept from Supervised Learning, where models learn by comparing their predictions with existing labels and updating their parameters afterward.

That is why you need to check whether RL can actually be used to solve your problem.

Luckily, it’s quite easy to do.

First, ask yourself whether your task is an optimization problem. Next, figure out whether there is a metric that you want your RL agent to learn to maximize or minimize.

If both of your answers are yes, then Reinforcement Learning might be a good fit for the problem, and you should start thinking of an RL environment.
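
To illustrate the trial-and-error loop and the exploration-exploitation tradeoff described above, here is a small sketch of an epsilon-greedy agent on a multi-armed bandit. The bandit setup, the function name, and the parameter values are my own assumptions, picked because this is one of the simplest problems where an agent learns only from rewards, with no labels.

```python
import random


def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Learn action values from noisy rewards with epsilon-greedy selection.

    `true_means` are hidden from the agent; it only sees sampled rewards.
    Returns (value estimates, pull counts, total reward).
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions
    counts = [0] * n_actions
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the best estimate so far
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental mean update of the value estimate
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward
```

The metric being maximized here is the total reward, and the agent has to find the best action by trying the alternatives, which is exactly the kind of task where RL is a candidate.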

Picking an RL environment

If Reinforcement Learning suits your problem well, it’s time to pick or build the infrastructure for running RL algorithms.

This infrastructure is called an environment. It’s used to train RL agents. Basically, an environment is a simulation that imitates the real environment where the agent will be deployed.

There are plenty of different RL environments out there.

If you don’t want to build your own environment, I suggest using the most popular and commonly used one – OpenAI Gym.

If you do want to build your own environment, you will face a series of challenges. You will need to think of:

Environment structure – what type of environment you want to build
Environment interface – an interface that connects the environment to the RL algorithm
Testing the environment – to make sure that your implementation works perfectly and your RL agent will learn correctly

To tell the truth, these are not obvious things to work on. That is why I suggest studying the topic through articles, posts, and video tutorials. Hopefully, it’ll be enough for you to set up your own RL environment.

However, please remember that you should not overcomplicate the task, so set up your own environment only if necessary. There’s no shame in using a pre-built one.
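
For reference, the three challenges above (structure, interface, testing) can be sketched with a toy environment. Everything here is hypothetical: the LineWorld class, its rewards, and the simplified reset/step interface are assumptions for illustration. The interface only resembles the Gym convention; Gym's real one returns additional values.

```python
class LineWorld:
    """A 1-D corridor: start at cell 0, reach the last cell to finish."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps_taken = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        self.steps_taken = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right).

        Returns (observation, reward, done).
        """
        if action not in (0, 1):
            raise ValueError("action must be 0 (left) or 1 (right)")
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps_taken += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else -0.1
        done = reached_goal or self.steps_taken >= self.max_steps
        return self.position, reward, done


def check_environment(env):
    """Sanity check: always moving right must reach the goal in length - 1 steps."""
    assert env.reset() == 0
    done = False
    total = 0.0
    steps = 0
    while not done:
        obs, reward, done = env.step(1)
        total += reward
        steps += 1
    assert obs == env.length - 1
    assert steps == env.length - 1
    return total
```

Even a check this small catches common mistakes, such as an episode that never terminates or a reward with the wrong sign, before they silently ruin training.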

Picking an RL library and algorithm

At this point, you have to choose both an RL library and the algorithm that you’ll use to solve the problem.

There are multiple RL libraries out there, and picking the right one is a crucial part of your project’s success. I recommend reading “The Best Tools for Reinforcement Learning in Python You Actually Want to Try”. This article will help you with your choice.

Overall, I strongly recommend Tensorforce, Stable Baselines, or RL_Coach. They seem up-to-date, have a great set of algorithms implemented, and provide valuable tutorials as well as complete documentation. Also, they work with multiple environments, so setting things up should not be a problem.

As for the choice of an RL algorithm, I feel that by this point you’ve already plunged into the task and chosen the algorithm. If so, that’s great, and you should start training your agent.

If not, please check the RL_Coach documentation. In my opinion, it has a very helpful guide on how to choose the right algorithm for your task. It should help you a lot if you’re stuck on this choice.

Testing the performance of the agent

Now that your RL agent is trained, it’s time to evaluate it. As I mentioned before, it might be a tricky process that depends on your problem and the environment that you’re using.

Still, there are some general tips that I want to mention here.

If your goal is optimal control, you should use some aggregate measure of reward. For example, total reward per episode, or mean reward per time step. This will help you to figure out how well the agent does at the task.

If you’re working with a video game problem, or one designed so that a maximum bound on the reward-based measure is easy to identify, you can compare your agent against this known value. It’s reasonable to expect that a good agent will approach the maximum value.

In practice, many interesting problems don’t have a known upper bound on reward totals or averages. For those problems, typically the best you can do is compare between agents. You can compare with:

A randomly acting agent – this is normally used just as a baseline, to show that the agent has learned something
An automated agent – an agent that’s using a simple action choice heuristic, which might be something natural or obvious in the given problem
One or more humans on the same task
Other ML-trained agents including previous instances of the same agent

If either the policy or environment is stochastic, you will probably want to run multiple tests and average the results, in order to assess an agent with expected values as much as possible.

If you’re using any off-policy technique, such as DQN, it’s also very important to switch off exploration during tests to get a fair measure of how well the trained agent behaves.
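
The tips above (aggregate reward, a random baseline, averaging over many runs, and no exploration at test time) can be sketched together. The one-step CoinGuessEnv and the helper names are hypothetical, chosen only so the example is self-contained; in a real project, the environment and the policy would come from your RL library.

```python
import random
import statistics


class CoinGuessEnv:
    """Hypothetical one-step task: guess a biased coin (1 comes up 70% of the time)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        return 0  # single dummy observation

    def step(self, action):
        outcome = 1 if self.rng.random() < 0.7 else 0
        reward = 1.0 if action == outcome else 0.0
        return 0, reward, True  # observation, reward, done


def evaluate(env, policy, episodes=1000):
    """Run full episodes and return (mean, std) of total reward per episode."""
    totals = []
    for _ in range(episodes):
        obs = env.reset()
        done = False
        total = 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        totals.append(total)
    return statistics.mean(totals), statistics.stdev(totals)


def make_epsilon_greedy(greedy_action, epsilon, seed=0):
    """Policy that takes `greedy_action`, except for random exploration."""
    rng = random.Random(seed)

    def policy(obs):
        if rng.random() < epsilon:
            return rng.randrange(2)
        return greedy_action

    return policy


# epsilon=0.0 switches exploration off for testing;
# epsilon=1.0 gives the randomly acting baseline.
trained_policy = make_epsilon_greedy(greedy_action=1, epsilon=0.0)
random_policy = make_epsilon_greedy(greedy_action=1, epsilon=1.0)
trained_mean, trained_std = evaluate(CoinGuessEnv(seed=1), trained_policy)
random_mean, random_std = evaluate(CoinGuessEnv(seed=1), random_policy)
```

Reporting the standard deviation next to the mean is a design choice worth keeping: with a stochastic environment, a single episode says very little about the agent.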

If your agent is designed to continually learn and explore, and/or uses an on-policy approach, you can use results during training to assess it. For example, you can take a rolling average of total reward over the last N episodes or something similar.

Moreover, this is not a bad metric for monitoring training, even for off-policy approaches, although for off-policy methods you will likely get an underestimate of performance compared to separate test runs.
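
A rolling average over the last N episodes takes only a few lines. This is a minimal sketch; the class name and the default window size are my own choices.

```python
from collections import deque


class RollingMean:
    """Mean of the most recent `window` values, e.g. total reward per episode."""

    def __init__(self, window=100):
        # A deque with maxlen drops the oldest value automatically
        self.values = deque(maxlen=window)

    def update(self, value):
        """Add a new value and return the current rolling mean."""
        self.values.append(value)
        return sum(self.values) / len(self.values)
```

During training, you would call update(total_reward) at the end of every episode and log the returned value.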

There are other approaches and other metrics to assess an agent. For example, how much experience, or how much computation the agent needs to learn to a certain level is often of interest.

If you want to conclude that the agent is trained well or badly for an optimal control task, this assessment of total reward might be all you need.

However, you can also look at loss metrics inside any neural network – you wouldn’t do this in order to rank agents as better or worse, but you may do it to identify problems.

These loss metrics are generally the same as their Supervised Learning equivalents. For example, in DQN, or for the critic part of PPO, you’d be interested in whether the predicted value of a state matches the eventual value, typically measured with MSE loss.
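
As a rough illustration of that check, the sketch below compares value predictions with the discounted return that was actually observed from each state in one episode. Note the assumption: I use the observed return as the "eventual value", whereas DQN and PPO implementations usually train against bootstrapped targets, so treat this as a diagnostic and not as the loss those algorithms optimize. The function names are hypothetical.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted return observed from each time step onward."""
    returns = []
    running = 0.0
    # Work backwards: G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def value_mse(predicted_values, rewards, gamma=0.99):
    """Mean squared error between predicted state values and observed returns."""
    targets = discounted_returns(rewards, gamma)
    if len(predicted_values) != len(targets):
        raise ValueError("need one predicted value per time step")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted_values, targets)]
    return sum(squared_errors) / len(targets)
```

If this error stays high or keeps growing while the reward stagnates, that points to a problem with the value estimates, not to a better or worse agent.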

Preparing for publishing

This is the final step, where you can be creative and show your personality.

Still, please remember that if you plan to publish your project to the world, for example on GitHub, you might want to follow a few simple rules:

README – please have a good README file in your repository; it will help those who are not familiar with your project to set it up
requirements.txt – a file with libraries and library versions that you used to make this project
Readable code that’s easy to customize – it will be really good both for you and for potential users
Some valuable visualizations – in the case of an RL project, this can be a GIF of your agent working

It’s worth following these rules because, in the Data Science community, they are considered common decency.

And that’s it, your project is done. Congratulations!

Final thoughts

I hope you found new ideas for acing your next Reinforcement Learning project.

To summarize, we started with some general tips for structuring and managing any ML project, and went through a step-by-step guide on how to structure your work in Reinforcement Learning projects. Lastly, we covered some ideas for project publishing.

If you enjoyed this post, a great next step would be to start building your own RL project structure with all the relevant tools. Check out tools like:

Neptune for experiment tracking,
RL_Coach, Tensorforce, and Stable Baselines as RL libraries,
OpenAI Gym as an RL environment,
Cookiecutter for the project directory structure.

Thanks for reading, and happy training!
