The Best Tools for Reinforcement Learning in Python You Actually Want to Try
Nowadays, Deep Reinforcement Learning (RL) is one of the hottest topics in the Data Science community. The fast development of RL has resulted in a growing demand for RL tools that are easy to understand and convenient to use.
In recent years, plenty of RL libraries have been developed. These libraries were designed to have all the necessary tools to both implement and test Reinforcement Learning models.
Still, they differ quite a lot. That’s why it is important to pick a library that will be quick, reliable, and relevant for your RL task.
In this article we will cover:
- Criteria for choosing a Deep Reinforcement Learning library,
- RL libraries: KerasRL, Pyqlearning, Tensorforce, RL_Coach, TFAgents, Stable Baselines, MushroomRL, RLlib, Dopamine, SpinningUp, garage, Acme, coax, and SURREAL.

Python libraries for Reinforcement Learning
There are a lot of RL libraries, so choosing the right one for your case might be a complicated task. We need to form criteria to evaluate each library.
Criteria
Each RL library in this article will be analyzed based on the following criteria:
- Number of state-of-the-art (SOTA) RL algorithms implemented – the most important one in my opinion
- Official documentation, availability of simple tutorials and examples
- Readable code that is easy to customize
- Number of supported environments – a crucial decision factor for a Reinforcement Learning library
- Logging and tracking tools support – for example, Neptune or TensorBoard
- Vectorized environment (VE) feature – a method for multiprocess training. By running parallel copies of the environment, your agent experiences far more situations than with a single environment (see the short sketch after this list)
- Regular updates – RL develops quite rapidly and you want to use up-to-date technologies
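To make the vectorized environment idea more concrete, here is a minimal sketch using Gym’s built-in vector API (assuming a reasonably recent gym version); the libraries below implement the same idea with their own wrappers or worker processes, so treat this as an illustration rather than a recipe for any particular library.

```python
# Minimal illustration of vectorized environments with Gym's vector API.
# RL libraries provide their own equivalents (e.g. worker processes), so this is
# only meant to show the concept.
import gym

envs = gym.vector.make("CartPole-v1", num_envs=8)  # 8 parallel copies of the environment
observations = envs.reset()                        # a batch of 8 observations

for _ in range(100):
    actions = envs.action_space.sample()           # one action per parallel environment
    observations, rewards, dones, infos = envs.step(actions)

envs.close()
```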
Now, let’s go through the libraries one by one.
KerasRL
KerasRL is a Deep Reinforcement Learning Python library. It implements some state-of-the-art RL algorithms and integrates seamlessly with the Deep Learning library Keras.
Moreover, KerasRL works with OpenAI Gym out of the box. This means you can evaluate and play around with different algorithms quite easily.
To install KerasRL simply use a pip command:
pip install keras-rl
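For illustration, here is a minimal sketch of training DQN on CartPole with KerasRL, modeled on the library’s published examples; depending on whether you use keras-rl or keras-rl2, the Keras imports may need to come from tensorflow.keras instead, and the hyperparameters are illustrative.

```python
# A minimal keras-rl DQN sketch on CartPole (based on the library's examples).
import gym
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

env = gym.make("CartPole-v1")
nb_actions = env.action_space.n

# A simple MLP Q-network defined with plain Keras layers.
model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(16, activation="relu"),
    Dense(16, activation="relu"),
    Dense(nb_actions, activation="linear"),
])

agent = DQNAgent(model=model, nb_actions=nb_actions,
                 memory=SequentialMemory(limit=50000, window_length=1),
                 policy=EpsGreedyQPolicy(), nb_steps_warmup=100,
                 target_model_update=1e-2)
agent.compile(Adam(lr=1e-3), metrics=["mae"])

agent.fit(env, nb_steps=50000, visualize=False, verbose=1)
agent.test(env, nb_episodes=5, visualize=False)
```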
Let’s see if KerasRL fits the criteria:
- Number of SOTA RL algorithms implemented
As of today KerasRL has the following algorithms implemented:
- Deep Q-Learning (DQN) and its improvements (Double and Dueling)
- Deep Deterministic Policy Gradient (DDPG)
- Continuous DQN (CDQN or NAF)
- Cross-Entropy Method (CEM)
- Deep SARSA
As you may have noticed, KerasRL misses two important agents: Actor-Critic Methods and Proximal Policy Optimization (PPO).
- Official documentation, availability of tutorials and examples
The code is easy to read and it’s full of comments, which is quite useful. Still, the documentation seems incomplete as it misses the explanation of parameters and tutorials. Also, practical examples leave much to be desired.
- Readable code that is easy to customize
Very easy. All you need to do is to create a new agent following the example and then add it to rl.agents.
- Number of supported environments
KerasRL was made to work only with OpenAI Gym. Therefore you need to modify the agent if you want to use any other environment.
- Logging and tracking tools support
Logging and tracking tools support is not implemented. Nevertheless, you can use Neptune to track your experiments.
- Vectorized environment feature
Includes a vectorized environment feature.
- Regular updates
The library does not seem to be maintained anymore, as the last updates were made more than a year ago.
To sum up, KerasRL has a good set of implementations. Unfortunately, it misses valuable points such as visualization tools, new architectures and updates. You should probably use another library.
Pyqlearning
Pyqlearning is a Python library to implement RL. It focuses on Q-Learning and multi-agent Deep Q-Network.
Pyqlearning provides components for designers, not end-user, state-of-the-art black boxes. Thus, this library is a tough one to use. You can use it to design information search algorithms, for example, GameAI or web crawlers.
To install Pyqlearning simply use a pip command:
pip install pyqlearning
Let’s see if Pyqlearning fits the criteria:
- Number of SOTA RL algorithms implemented
As of today Pyqlearning has the following algorithms implemented:
- Deep Q-Learning (DQN) with Epsilon Greedy and Boltzmann exploration strategies
As you may have noticed, Pyqlearning has only one important agent. The library leaves much to be desired.
- Official documentation, availability of tutorials and examples
Pyqlearning has a couple of examples for various tasks and two tutorials featuring Maze Solving and the pursuit-evasion game by Deep Q-Network. You may find them in the official documentation. The documentation seems incomplete as it focuses on the math, and not the library’s description and usage.
- Readable code that is easy to customize
Pyqlearning is an open-source library. The source code can be found on GitHub. The code lacks comments, so customizing it may be a complicated task. Still, the tutorials might help.
- Number of supported environments
Since the library is environment-agnostic, it’s relatively easy to plug it into any environment.
- Logging and tracking tools support
The author uses a simple logging package in the tutorials. Pyqlearning does not support other logging and tracking tools, for example, TensorBoard.
- Vectorized environment feature
Pyqlearning does not support the vectorized environment feature.
- Regular updates
The library is maintained. The last update was made two months ago. Still, the development process seems to be a slow-going one.
To sum up, Pyqlearning leaves much to be desired. It is not a library you are likely to use often. Thus, you should probably use something else.
Tensorforce

Tensorforce is an open-source Deep RL library built on Google’s TensorFlow framework. It’s straightforward to use and has the potential to be one of the best Reinforcement Learning libraries.
Tensorforce has key design choices that differentiate it from other RL libraries:
- Modular component-based design: Feature implementations, above all, tend to be as generally applicable and configurable as possible.
- Separation of RL algorithm and application: Algorithms are agnostic to the type and structure of inputs (states/observations) and outputs (actions/decisions), as well as the interaction with the application environment.
To install Tensorforce simply use a pip command:
pip install tensorforce
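To give you a feel for the workflow, here is a rough sketch of a Tensorforce training loop on a Gym task, based on the library’s quickstart; keyword arguments may differ slightly between Tensorforce versions.

```python
# A rough Tensorforce sketch: PPO on CartPole with an explicit act/observe loop.
from tensorforce import Agent, Environment

environment = Environment.create(
    environment="gym", level="CartPole-v1", max_episode_timesteps=500)
agent = Agent.create(
    agent="ppo", environment=environment, batch_size=10, learning_rate=1e-3)

for _ in range(300):                    # run 300 training episodes
    states = environment.reset()
    terminal = False
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)

agent.close()
environment.close()
```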
Let’s see if Tensorforce fits the criteria:
- Number of SOTA RL algorithms implemented
As of today, Tensorforce has the following set of algorithms implemented:
- Deep Q-Learning (DQN) and its improvements (Double and Dueling)
- Vanilla Policy Gradient (PG)
- Deep Deterministic Policy Gradient (DDPG)
- Continuous DQN (CDQN or NAF)
- Actor Critic (A2C and A3C)
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
As you may have noticed, Tensorforce misses the Soft Actor Critic (SAC) implementation. Besides that, the coverage is excellent.
- Official documentation, availability of tutorials and examples
It is quite easy to start using Tensorforce thanks to the variety of simple examples and tutorials. The official documentation seems complete and convenient to navigate through.
- Readable code that is easy to customize
Tensorforce benefits from its modular design. Each part of the architecture, for example, networks, models, and runners, is distinct. Thus, you can easily modify them. However, the code lacks comments, and that could be a problem.
- Number of supported environments
Tensorforce works with multiple environments, for example, OpenAI Gym, OpenAI Retro and DeepMind Lab. It also has documentation to help you plug into other environments.
- Logging and tracking tools support
The library supports TensorBoard and other logging/tracking tools.
- Vectorized environment feature
Tensorforce supports the vectorized environment feature.
- Regular updates
Tensorforce is regularly updated. The last update was just a few weeks ago.
To sum up, Tensorforce is a powerful RL tool. It is up-to-date and has all necessary documentation for you to start working with it.
RL_Coach

Reinforcement Learning Coach (Coach) by Intel AI Lab is a Python RL framework containing many state-of-the-art algorithms.
It exposes a set of easy-to-use APIs for experimenting with new RL algorithms. The components of the library, for example, algorithms, environments, and neural network architectures, are modular. Thus, extending and reusing existing components is fairly painless.
To install Coach simply use a pip command.
pip install rl_coach
Still, you should check the official installation tutorial as a few prerequisites are required.
Let’s see if Coach fits the criteria:
- Number of SOTA RL algorithms implemented
As of today, RL_Coach has the following set of algorithms implemented:
- Actor-Critic
- ACER
- Behavioral Cloning
- Bootstrapped DQN
- Categorical DQN
- Conditional Imitation Learning
- Clipped Proximal Policy Optimization
- Deep Deterministic Policy Gradient
- Direct Future Prediction
- Double DQN
- Deep Q Networks
- Dueling DQN
- Mixed Monte Carlo
- N-Step Q Learning
- Normalized Advantage Functions
- Neural Episodic Control
- Persistent Advantage Learning
- Policy Gradient
- Proximal Policy Optimization
- Rainbow
- Quantile Regression DQN
- Soft Actor-Critic
- Twin Delayed Deep Deterministic Policy Gradient
- Wolpertinger
As you may have noticed, RL_Coach has a variety of algorithms. It’s the most complete library of all covered in this article.
- Official documentation, availability of tutorials and examples
The documentation is complete. Also, RL_Coach has a set of valuable tutorials. It will be easy for newcomers to start working with it.
- Readable code that is easy to customize
RL_Coach is an open-source library. It benefits from the modular design, but the code lacks comments. Customizing it may be a complicated task.
- Number of supported environments
Coach supports the following environments:
- OpenAI Gym
- ViZDoom
- Roboschool
- GymExtensions
- PyBullet
- CARLA
- And others
For more information, including installation and usage instructions, please refer to the official documentation.
- Logging and tracking tools support
Coach supports various logging and tracking tools. It even has its own visualization dashboard.
- Vectorized environment feature
RL_Coach supports the vectorized environment feature. For usage instructions, please refer to the documentation.
- Regular updates
The library seems to be maintained. However, the last major update was almost a year ago.
To sum up, RL_Coach has a perfect up-to-date set of algorithms implemented. And it’s newcomer friendly. I would strongly recommend Coach.
TFAgents
TFAgents is a Python library designed to make implementing, deploying, and testing RL algorithms easier. It has a modular structure and provides well-tested components that can be easily modified and extended.
TFAgents is currently under active development, but even the current set of components makes it the most promising RL library.
To install TFAgents simply use a pip command:
pip install tf-agents
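As a taste of the API, here is a condensed sketch of the TFAgents DQN setup, following the structure of the official DQN tutorial; the replay buffer and training loop are omitted for brevity.

```python
# A condensed TF-Agents DQN setup sketch (data collection and training omitted).
import tensorflow as tf
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.agents.dqn import dqn_agent
from tf_agents.utils import common

# Wrap a Gym environment in the TF-Agents TFPyEnvironment abstraction.
train_env = tf_py_environment.TFPyEnvironment(suite_gym.load("CartPole-v1"))

q_net = q_network.QNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=(100,))

agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=tf.Variable(0))
agent.initialize()
# From here, the official tutorial wires up a replay buffer, drivers for data
# collection, and repeated calls to agent.train(experience).
```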
Let’s see if TFAgents fits the criteria:
- Number of SOTA RL algorithms implemented
As of today, TFAgents has the following set of algorithms implemented:
- Deep Q-Learning (DQN) and its improvements (Double)
- Deep Deterministic Policy Gradient (DDPG)
- TD3
- REINFORCE
- Proximal Policy Optimization (PPO)
- Soft Actor Critic (SAC)
Overall, TFAgents has a great set of algorithms implemented.
- Official documentation, availability of tutorials and examples
TFAgents has a series of tutorials on each major component. Still, the official documentation seems incomplete; I would even say there is none. The tutorials and simple examples do their job, but the lack of well-written documentation is a major disadvantage.
- Readable code that is easy to customize
The code is full of comments and the implementations are very clean. TFAgents seems to have the cleanest code of all the libraries covered in this article.
- Number of supported environments
The library is environment-agnostic, which makes it easy to plug into any environment.
- Logging and tracking tools support
Logging and tracking tools are supported.
- Vectorized environment feature
Vectorized environment is supported.
- Regular updates
As mentioned above, TFAgents is currently under active development. The last update was made just a couple of days ago.
To sum up, TFAgents is a very promising library. It already has all necessary tools to start working with it. I wonder what it will look like when the development is over.
Stable Baselines

Stable Baselines is a set of improved implementations of Reinforcement Learning (RL) algorithms based on OpenAI Baselines. The OpenAI Baselines implementations lacked a consistent structure and good documentation, which is why Stable Baselines was created.
Stable Baselines features a unified structure for all algorithms, a visualization tool, and excellent documentation.
To install Stable Baselines simply use a pip command.
pip install stable-baselines
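A minimal run looks roughly like the library’s own quickstart; the sketch below assumes the TensorFlow 1.x backend that Stable Baselines is built on.

```python
# A minimal Stable Baselines sketch: PPO2 on CartPole, closely following the quickstart.
import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

env = gym.make("CartPole-v1")
model = PPO2(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=10000)

# Run the trained policy for a few hundred steps.
obs = env.reset()
for _ in range(200):
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```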
Still, you should check the official installation tutorial as a few prerequisites are required.
Let’s see if Stable Baselines fits the criteria:
- Number of SOTA RL algorithms implemented
As of today, Stable Baselines has the following set of algorithms implemented:
- A2C
- ACER
- ACKTR
- DDPG
- DQN
- HER
- GAIL
- PPO1 and PPO2
- SAC
- TD3
- TRPO
Overall, Stable Baselines has a great set of algorithms implemented.
- Official documentation, availability of tutorials and examples
The documentation is complete and excellent. The set of tutorials and examples is also really helpful.
- Readable code that is easy to customize
Modifying the code can be tricky. However, because Stable Baselines provides a lot of useful comments in the code and excellent documentation, the modification process is less complex than it could be.
- Number of supported environments
Stable Baselines provides good documentation on how to plug in your custom environment; however, the environment needs to follow the OpenAI Gym interface.
- Logging and tracking tools support
Stable Baselines has TensorBoard support implemented.
- Vectorized environment feature
Vectorized environment feature is supported by a majority of the algorithms. Please check the documentation in case you want to learn more.
- Regular updates
The last major updates were made almost two years ago, but the library is maintained as the documentation is regularly updated.
To sum up, Stable Baselines is a library with a great set of algorithms and awesome documentation. You should consider using it as your RL tool.
MushroomRL
MushroomRL is a Python Reinforcement Learning library whose modularity allows you to use well-known Python libraries for tensor computation and RL benchmarks.
It enables RL experiments by providing both classical RL algorithms and deep RL algorithms. The idea behind MushroomRL is to offer the majority of RL algorithms behind a common interface, so you can run them without doing too much work.
To install MushroomRL simply use a pip command.
pip install mushroom_rl
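To show the common interface in action, here is a rough sketch based on MushroomRL’s tabular Q-Learning tutorial; constructor arguments may differ slightly between versions.

```python
# A rough MushroomRL sketch: tabular Q-Learning on a small GridWorld via the Core loop.
from mushroom_rl.core import Core
from mushroom_rl.environments import GridWorld
from mushroom_rl.algorithms.value import QLearning
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.utils.parameters import Parameter

# A small grid-world MDP and an epsilon-greedy Q-Learning agent.
mdp = GridWorld(width=3, height=3, goal=(2, 2), start=(0, 0))
agent = QLearning(mdp.info, EpsGreedy(epsilon=Parameter(0.1)),
                  learning_rate=Parameter(0.1))

# The Core object runs the interaction loop between agent and environment.
core = Core(agent, mdp)
core.learn(n_steps=10000, n_steps_per_fit=1)
```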
Let’s see if MushroomRL fits the criteria:
- Number of SOTA RL algorithms implemented
As of today, MushroomRL has the following set of algorithms implemented:
- Q-Learning
- SARSA
- FQI
- DQN
- DDPG
- SAC
- TD3
- TRPO
- PPO
Overall, MushroomRL has everything you need to work on RL tasks.
- Official documentation, availability of tutorials and examples
The official documentation seems incomplete. It misses valuable tutorials, and simple examples leave much to be desired.
- Readable code that is easy to customize
The code lacks comments and parameter descriptions, which makes it really hard to customize, although, to be fair, MushroomRL never positioned itself as a library that is easy to customize.
- Number of supported environments
MushroomRL supports the following environments:
- OpenAI Gym
- DeepMind Control Suite
- MuJoCo
For more information, including installation and usage instructions, please refer to the official documentation.
- Logging and tracking tools support
MushroomRL supports various logging and tracking tools. I would recommend using TensorBoard as the most popular one.
- Vectorized environment feature
Vectorized environment feature is supported.
- Regular updates
The library is maintained. The last updates were made just a few weeks ago.
To sum up, MushroomRL has a good set of algorithms implemented. Still, it misses tutorials and examples which are crucial when you start to work with a new library.
RLlib
“RLlib is an open-source library for reinforcement learning that offers both high scalability and a unified API for a variety of applications. RLlib natively supports TensorFlow, TensorFlow Eager, and PyTorch, but most of its internals are framework agnostic.” ~ Website
- Number of state-of-the-art (SOTA) RL algorithms implemented
RLlib implements them ALL! PPO? It’s there. A2C and A3C? Yep. DDPG, TD3, SAC? Of course! DQN, Rainbow, APEX? Yes, in many shapes and flavours! Evolution Strategies, IMPALA, Dreamer, R2D2, APPO, AlphaZero, SlateQ, LinUCB, LinTS, MADDPG, QMIX, … Stop it! I’m not sure if you’re making these acronyms up. Nonetheless, yes, RLlib has them all. See the full list here (and a minimal training sketch after this list).
- Official documentation, availability of simple tutorials and examples
RLlib has comprehensive documentation with many examples. Its code is also well commented.
- Readable code that is easy to customize
It’s easiest to customize RLlib with callbacks. Although RLlib is open-sourced and you can edit the code, it’s not a straightforward thing to do. The RLlib codebase is quite complicated because of its size and many layers of abstraction. Here is a guide that should help you if you want to e.g. add a new algorithm.
- Number of supported environments
RLlib works with several different types of environments, including OpenAI Gym, user-defined, multi-agent, and also batched environments. Here you’ll find more.
- Logging and tracking tools support
RLlib has extensive logging features. RLlib will print logs to the standard output (command line). You can also access the logs (and manage jobs) in the Ray Dashboard. In this post, I described how to extend RLlib logging to send metrics to Neptune. It also describes different logging techniques. I highly recommend reading it!
- Vectorized environment (VE) feature
Yes, see here. Moreover, it’s possible to distribute the training among multiple compute nodes, e.g. on a cluster.
- Regular updates
RLlib is maintained and actively developed.
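As promised above, here is a hedged sketch of launching PPO training with RLlib through Ray Tune; it uses the classic ray.tune.run entry point, and newer Ray releases expose different APIs.

```python
# A hedged RLlib sketch: PPO on CartPole launched through Ray Tune.
import ray
from ray import tune

ray.init()
tune.run(
    "PPO",
    stop={"episode_reward_mean": 195},
    config={
        "env": "CartPole-v1",
        "num_workers": 2,      # parallel rollout workers
        "framework": "torch",  # or "tf"
    },
)
```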
From my experience, RLlib is a very powerful framework that covers many applications and at the same time remains quite easy to use. That being said, because of the many layers of abstraction, it’s really hard to extend with your own code, as it’s hard to find where you should even put it! That’s why I would recommend it for developers who want to train models for production, and not for researchers who have to rapidly change algorithms and implement new features.
Dopamine
“Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research).” ~ GitHub
- Number of state-of-the-art (SOTA) RL algorithms implemented
It focuses on supporting the state-of-the-art, single-GPU DQN, Rainbow, C51, and IQN agents. Their Rainbow agent implements the three components identified as most important by Hessel et al.:
- n-step Bellman updates (see e.g. Mnih et al., 2016)
- Prioritized experience replay (Schaul et al., 2015)
- Distributional reinforcement learning (C51; Bellemare et al., 2017)
- Official documentation, availability of simple tutorials and examples
Concise documentation is available in the GitHub repo here. It’s not a very popular framework, so it may lack tutorials. However, the authors provide colabs with many examples of training and visualization.
- Readable code that is easy to customize
The authors’ design principles are:
- Easy experimentation: Make it easy for new users to run benchmark experiments.
- Flexible development: Make it easy for new users to try out research ideas.
- Compact and reliable: Provide implementations for a few, battle-tested algorithms.
- Reproducible: Facilitate reproducibility in results. In particular, their setup follows the recommendations given by Machado et al. (2018).
- Number of supported environments
It’s mainly intended for Atari 2600 game-playing. It supports OpenAI Gym (a hedged run sketch follows this list).
- Logging and tracking tools support
It supports TensorBoard logging and provides some other visualization tools, presented in colabs, like recording videos of an agent playing and seaborn plotting.
- Vectorized environment (VE) feature
No vectorized environments support.
- Regular updates
Dopamine is maintained.
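As flagged above, here is a hedged sketch of how a Dopamine experiment is typically launched: configuration happens through gin files, and the config path below is only illustrative (check the repo for the files shipped with your version).

```python
# A hedged Dopamine sketch: build and run an experiment from a gin config.
from dopamine.discrete_domains import run_experiment

base_dir = "/tmp/dopamine_runs/dqn_cartpole"
gin_files = ["dopamine/agents/dqn/configs/dqn_cartpole.gin"]  # illustrative path

run_experiment.load_gin_configs(gin_files, gin_bindings=[])
runner = run_experiment.create_runner(base_dir)
runner.run_experiment()
```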
If you’re looking for a customizable framework with well-tested DQN-based algorithms, then this may be your pick. Under the hood, it runs using TensorFlow or JAX.
SpinningUp
“While fantastic repos like garage, Baselines, and rllib make it easier for researchers who are already in the field to make progress, they build algorithms into frameworks in ways that involve many non-obvious choices and trade-offs, which makes them hard to learn from. […] The algorithm implementations in the Spinning Up repo are designed to be:
- as simple as possible while still being reasonably good,
- and highly consistent with each other to expose fundamental similarities between algorithms.
They are almost completely self-contained, with virtually no common code shared between them (except for logging, saving, loading, and MPI utilities), so that an interested person can study each algorithm separately without having to dig through an endless chain of dependencies to see how something is done. The implementations are patterned so that they come as close to pseudocode as possible, to minimize the gap between theory and code.” ~ Website
- Number of state-of-the-art (SOTA) RL algorithms implemented
VPG, PPO, TRPO, DDPG, TD3, SAC
- Official documentation, availability of simple tutorials and examples
Great documentation and education materials with multiple examples.
- Readable code that is easy to customize
This code is highly readable. From my experience, it’s the most readable framework you can find out there. Every algorithm is contained in its own two well-commented files, which also makes it as easy as possible to modify. On the other hand, it’s harder to maintain for the same reason: if you add something to one algorithm, you have to manually add it to the others too.
- Number of supported environments
It supports the OpenAI Gym environments out of the box and relies on their API, so you can extend it to use other environments that conform to this API (a short example follows this list).
- Logging and tracking tools support
It has a light logger that prints metrics to the standard output (cmd) and saves them to a file. I’ve written a post on how to add Neptune support to SpinningUp.
- Vectorized environment (VE) feature
No vectorized environments support.
- Regular updates
SpinningUp is maintained.
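Here is the short example referenced above, using Spinning Up’s function-style API (the TensorFlow v1 PPO variant; a PyTorch variant is also provided, and the hyperparameters are illustrative).

```python
# A short Spinning Up sketch: PPO on CartPole through the function-style API.
import gym
from spinup import ppo_tf1 as ppo

env_fn = lambda: gym.make("CartPole-v1")

ppo(env_fn=env_fn,
    ac_kwargs=dict(hidden_sizes=(64, 64)),      # actor-critic network sizes
    steps_per_epoch=4000,
    epochs=50,
    logger_kwargs=dict(output_dir="out/ppo_cartpole", exp_name="ppo_cartpole"))
```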
Although it was created as an educational resource, the code simplicity and state-of-the-art results make it a perfect framework for fast prototyping of research ideas. I use it in my own research and even implement new algorithms in it using the same code structure. Here you can find a port of the SpinningUp code to TensorFlow v2 by me and my colleagues from AwareLab.
garage
“garage is a toolkit for developing and evaluating reinforcement learning algorithms, and an accompanying library of state-of-the-art implementations built using that toolkit. […] The most important feature of garage is its comprehensive automated unit test and benchmarking suite, which helps ensure that the algorithms and modules in garage maintain state-of-the-art performance as the software changes.” ~ GitHub
- Number of state-of-the-art (SOTA) RL algorithms implemented
All major RL algorithms (VPG, PPO, TRPO, DQN, DDPG, TD3, SAC, …), with their multi-task versions (MT-PPO, MT-TRPO, MT-SAC), meta-RL algorithms (Task Embedding, MAML, PEARL, RL2, …), evolutionary strategy algorithms (CEM, CMA-ES), and behavioural cloning.
- Official documentation, availability of simple tutorials and examples
Comprehensive documentation with many examples and some tutorials, e.g. on how to add a new environment or implement a new algorithm.
- Readable code that is easy to customize
It’s created as a flexible and structured tool for developing, experimenting with, and evaluating algorithms. It provides a scaffold for adding new methods.
- Number of supported environments
garage supports a variety of external environment libraries for different RL training purposes, including OpenAI Gym, DeepMind DM Control, MetaWorld, and PyBullet. You should be able to easily add your own environments.
- Logging and tracking tools support
The garage logger supports many outputs including standard output (cmd), plain text files, CSV files, and TensorBoard.
- Vectorized environment (VE) feature
It supports vectorized environments and even allows one to distribute the training on a cluster.
- Regular updates
garage is maintained.
garage is similar to RLlib. It’s a big framework with distributed execution, supporting many additional features like Docker, which go beyond simple training and monitoring. If such a tool is something you need, e.g. in a production environment, then I would recommend comparing it with RLlib and picking the one you like more.
Acme
“Acme is a library of reinforcement learning (RL) agents and agent building blocks. Acme strives to expose simple, efficient, and readable agents, that serve both as reference implementations of popular algorithms and as strong baselines, while still providing enough flexibility to do novel research. The design of Acme also attempts to provide multiple points of entry to the RL problem at differing levels of complexity.” ~ GitHub
- Number of state-of-the-art (SOTA) RL algorithms implemented
It includes algorithms for continuous control (DDPG, D4PG, MPO, Distributional MPO, Multi-Objective MPO), discrete control (DQN, IMPALA, R2D2), learning from demonstrations (DQfD, R2D3), planning and learning (AlphaZero), and behavioural cloning.
- Official documentation, availability of simple tutorials and examples
Documentation is rather sparse, but there are many examples and Jupyter notebook tutorials available in the repo.
- Readable code that is easy to customize
The code is easy to read but requires one to learn its structure first. It is easy to customize and add your own agents.
- Number of supported environments
The Acme environment loop assumes an environment instance that implements the DeepMind Environment API, so any environment from DeepMind will work flawlessly (e.g. DM Control). It also provides a wrapper for OpenAI Gym environments and an OpenSpiel RL environment loop. If your environment implements the OpenAI or DeepMind API, then you shouldn’t have problems plugging it in (see the sketch after this list).
- Logging and tracking tools support
It includes a basic logger and supports printing to the standard output (cmd) and saving to CSV files. I’ve written a post on how to add Neptune support to Acme.
- Vectorized environment (VE) feature
No vectorized environments support.
- Regular updates
Acme is maintained and actively developed.
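As referenced above, here is a rough sketch of Acme’s environment-loop pattern with the TensorFlow DQN agent, loosely following the repo’s quickstart; module paths and constructor arguments may differ between Acme versions.

```python
# A rough Acme sketch: DQN on a Gym environment driven by the EnvironmentLoop.
import gym
import sonnet as snt

import acme
from acme import specs, wrappers
from acme.agents.tf import dqn

# Wrap a Gym environment so it follows the dm_env API that Acme expects.
environment = wrappers.GymWrapper(gym.make("CartPole-v1"))
environment = wrappers.SinglePrecisionWrapper(environment)
spec = specs.make_environment_spec(environment)

# A small Sonnet MLP producing Q-values for each action.
network = snt.Sequential([
    snt.Flatten(),
    snt.nets.MLP([64, 64, spec.actions.num_values]),
])

agent = dqn.DQN(environment_spec=spec, network=network)
loop = acme.EnvironmentLoop(environment, agent)
loop.run(num_episodes=100)
```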
Acme is simple like SpinningUp, but a tier higher when it comes to the use of abstraction. This makes it easier to maintain (the code is more reusable), but on the other hand, it’s harder to find the exact spot in the implementation you should change when tinkering with an algorithm. It supports both TensorFlow v2 and JAX, with the latter being an interesting option as JAX has been gaining traction recently.
coax
“Coax is a modular Reinforcement Learning (RL) python package for solving OpenAI Gym environments with JAX-based function approximators. […] The primary thing that sets coax apart from other packages is that it is designed to align with the core RL concepts, not with the high-level concept of an agent. This makes coax more modular and user-friendly for RL researchers and practitioners.” ~ Website
- Number of state-of-the-art (SOTA) RL algorithms implemented
It implements classical RL algorithms (SARSA, Q-Learning), value-based deep RL algorithms (Soft Q-Learning, DQN, Prioritized Experience Replay DQN, Ape-X DQN), and policy gradient methods (VPG, PPO, A2C, DDPG, TD3).
- Official documentation, availability of simple tutorials and examples
Clear, if sometimes confusing, documentation with many code examples and algorithm explanations. It also includes tutorials for running training on the Pong, CartPole, FrozenLake, and Pendulum environments.
- Readable code that is easy to customize
Other RL frameworks often hide the structure that you (the RL practitioner) are interested in. Coax makes the network architecture take center stage, so you can define your own forward-pass function. Moreover, the design of coax is agnostic of the details of your training loop: you decide how and when you update your function approximators.
- Number of supported environments
Coax mostly focuses on OpenAI Gym environments. However, you should be able to extend it to other environments that implement this API.
- Logging and tracking tools support
It utilizes the Python logging module.
- Vectorized environment (VE) feature
No vectorized environments support.
- Regular updates
coax is maintained.
I would recommend coax for education purposes. If you want to plug-and-play with the nitty-gritty details of RL algorithms, this is a good tool for that. It’s also built around JAX, which may be a plus in itself (given the hype around it).
SURREAL
“Our goal is to make Deep Reinforcement Learning accessible to everyone. We introduce Surreal, an open-source, reproducible, and scalable distributed reinforcement learning framework. Surreal provides a high-level abstraction for building distributed reinforcement learning algorithms.” ~ Website
- Number of state-of-the-art (SOTA) RL algorithms implemented
It focuses on distributed deep RL algorithms. As of now, the authors have implemented distributed variants of PPO and DDPG.
- Official documentation, availability of simple tutorials and examples
The repo provides basic documentation on installing, running, and customizing the algorithms. However, it lacks code examples and tutorials.
- Readable code that is easy to customize
The code structure can frighten one away; it’s not something for newcomers. That being said, the code includes docstrings and is readable.
- Number of supported environments
It supports OpenAI Gym and DM Control environments, as well as Robosuite, a standardized and accessible robot manipulation benchmark built with the MuJoCo physics engine.
- Logging and tracking tools support
It includes specialized logging tools for the distributed setting that also allow you to record videos of agents playing.
- Vectorized environment (VE) feature
No vectorized environments support. However, it allows one to distribute the training on a cluster.
- Regular updates
It doesn’t seem to be maintained anymore.
I include this framework on the list mostly for reference. If you develop a distributed RL algorithm, you may learn a thing or two from this repo, e.g. how to manage work on a cluster. Nevertheless, there are better options to build on, like RLlib or garage.
Final thoughts
In this article, we have figured out what to look out for when choosing RL tools, what RL libraries are there, and what features they have.
To my knowledge, the best publicly available libraries are Tensorforce, Stable Baselines, and RL_Coach. You should consider picking one of them as your RL tool. All of them can be considered up-to-date, have a great set of algorithms implemented, and provide valuable tutorials as well as complete documentation. If you want to experiment with different algorithms, you should use RL_Coach. For other tasks, please consider using either Stable Baselines or Tensorforce.
Hopefully, with this information, you will have no problems choosing the RL library for your next project.
Note:
Libraries KerasRL, Tensorforce, Pyqlearning, RL_Coach, TFAgents, Stable Baselines, and MushroomRL were described by Vladimir Lyashenko.
Libraries RLlib, Dopamine, SpinningUp, garage, Acme, coax, and SURREAL were described by Piotr Januszewski.