The Best Tools for Reinforcement Learning in Python You Actually Want to Try
Nowadays, Deep Reinforcement Learning (RL) is one of the hottest topics in the Data Science community. The fast development of RL has resulted in a growing demand for RL tools that are easy to understand and convenient to use.
In recent years, plenty of RL libraries have been developed. These libraries were designed to have all the necessary tools to both implement and test Reinforcement Learning models.
Still, they differ quite a lot. That’s why it is important to pick a library that will be quick, reliable, and relevant for your RL task.
In this article we will cover:
- Criteria for choosing a Deep Reinforcement Learning library,
- RL libraries: KerasRL, Pyqlearning, Tensorforce, RL_Coach, TFAgents, Stable Baselines, MushroomRL, RLlib, Dopamine, SpinningUp, garage, Acme, coax, and SURREAL.

Python libraries for Reinforcement Learning
There are a lot of RL libraries, so choosing the right one for your case might be a complicated task. We need to form criteria to evaluate each library.
Criteria
Each RL library in this article will be analyzed based on the following criteria:
- Number of state-of-the-art (SOTA) RL algorithms implemented – the most important one in my opinion
- Official documentation, availability of simple tutorials and examples
- Readable code that is easy to customize
- Number of supported environments – a crucial decision factor for a Reinforcement Learning library
- Logging and tracking tools support – for example, Neptune or TensorBoard
- Vectorized environment (VE) feature – a method for multiprocess training. By running parallel copies of the environment, your agent experiences far more situations than with a single environment (see the short sketch after this list)
- Regular updates – RL develops quite rapidly and you want to use up-to-date technologies
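To make the vectorized environment idea more concrete, here is a minimal sketch using Gym’s built-in vector API (assuming a reasonably recent gym version); the libraries below implement the same idea with their own wrappers or worker processes, so treat this as an illustration rather than a recipe for any particular library.

```python
# Minimal illustration of vectorized environments with Gym's vector API.
# RL libraries provide their own equivalents (e.g. worker processes), so this is
# only meant to show the concept.
import gym

envs = gym.vector.make("CartPole-v1", num_envs=8)  # 8 parallel copies of the environment
observations = envs.reset()                        # a batch of 8 observations

for _ in range(100):
    actions = envs.action_space.sample()           # one action per parallel environment
    observations, rewards, dones, infos = envs.step(actions)

envs.close()
```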
Now, let’s go through the libraries one by one.
KerasRL
KerasRL is a Deep Reinforcement Learning Python library. It implements some state-of-the-art RL algorithms and integrates seamlessly with the Deep Learning library Keras.
Moreover, KerasRL works with OpenAI Gym out of the box. This means you can evaluate and play around with different algorithms quite easily.
To install KerasRL simply use a pip command:
pip install keras-rl
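For illustration, here is a minimal sketch of training DQN on CartPole with KerasRL, modeled on the library’s published examples; depending on whether you use keras-rl or keras-rl2, the Keras imports may need to come from tensorflow.keras instead, and the hyperparameters are illustrative.

```python
# A minimal keras-rl DQN sketch on CartPole (based on the library's examples).
import gym
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

env = gym.make("CartPole-v1")
nb_actions = env.action_space.n

# A simple MLP Q-network defined with plain Keras layers.
model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(16, activation="relu"),
    Dense(16, activation="relu"),
    Dense(nb_actions, activation="linear"),
])

agent = DQNAgent(model=model, nb_actions=nb_actions,
                 memory=SequentialMemory(limit=50000, window_length=1),
                 policy=EpsGreedyQPolicy(), nb_steps_warmup=100,
                 target_model_update=1e-2)
agent.compile(Adam(lr=1e-3), metrics=["mae"])

agent.fit(env, nb_steps=50000, visualize=False, verbose=1)
agent.test(env, nb_episodes=5, visualize=False)
```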
Let’s see if KerasRL fits the criteria:
- Number of SOTA RL algorithms implemented
As of today KerasRL has the following algorithms implemented:
- Deep Q-Learning (DQN) and its improvements (Double and Dueling)
- Deep Deterministic Policy Gradient (DDPG)
- Continuous DQN (CDQN or NAF)
- Cross-Entropy Method (CEM)
- Deep SARSA
As you may have noticed, KerasRL misses two important agents: Actor-Critic Methods and Proximal Policy Optimization (PPO).
- Official documentation, availability of tutorials and examples
The code is easy to read and it’s full of comments, which is quite useful. Still, the documentation seems incomplete as it misses the explanation of parameters and tutorials. Also, practical examples leave much to be desired.
- Readable code that is easy to customize
Very easy. All you need to do is to create a new agent following the example and then add it to rl.agents.
- Number of supported environments
KerasRL was made to work only with OpenAI Gym. Therefore you need to modify the agent if you want to use any other environment.
- Logging and tracking tools support
Logging and tracking tools support is not implemented. Nevertheless, you can use Neptune to track your experiments.
- Vectorized environment feature
Includes a vectorized environment feature.
- Regular updates
The library does not seem to be maintained anymore, as the last updates were made more than a year ago.
To sum up, KerasRL has a good set of implementations. Unfortunately, it misses valuable points such as visualization tools, new architectures and updates. You should probably use another library.
Pyqlearning
Pyqlearning is a Python library to implement RL. It focuses on Q-Learning and multi-agent Deep Q-Network.
Pyqlearning provides components for designers, not end-user, state-of-the-art black boxes. Thus, this library is a tough one to use. You can use it to design information search algorithms, for example, GameAI or web crawlers.
To install Pyqlearning simply use a pip command:
pip install pyqlearning
Let’s see if Pyqlearning fits the criteria:
- Number of SOTA RL algorithms implemented
As of today Pyqlearning has the following algorithms implemented:
- Deep Q-Learning (DQN) with Epsilon Greedy and Boltzmann exploration strategies
As you may have noticed, Pyqlearning has only one important agent. The library leaves much to be desired.
- Official documentation, availability of tutorials and examples
Pyqlearning has a couple of examples for various tasks and two tutorials featuring Maze Solving and the pursuit-evasion game by Deep Q-Network. You may find them in the official documentation. The documentation seems incomplete as it focuses on the math, and not the library’s description and usage.
- Readable code that is easy to customize
Pyqlearning is an open-source library. The source code can be found on GitHub. The code lacks comments, so customizing it may be a complicated task. Still, the tutorials might help.
- Number of supported environments
Since the library is environment-agnostic, it’s relatively easy to plug it into any environment.
- Logging and tracking tools support
The author uses a simple logging package in the tutorials. Pyqlearning does not support other logging and tracking tools, for example, TensorBoard.
- Vectorized environment feature
Pyqlearning does not support the vectorized environment feature.
- Regular updates
The library is maintained. The last update was made two months ago. Still, the development process seems to be a slow-going one.
To sum up, Pyqlearning leaves much to be desired. It is not a library you are likely to use often. Thus, you should probably use something else.
Tensorforce

Tensorforce is an open-source Deep RL library built on Google’s TensorFlow framework. It’s straightforward to use and has the potential to be one of the best Reinforcement Learning libraries.
Tensorforce has key design choices that differentiate it from other RL libraries:
- Modular component-based design: Feature implementations, above all, tend to be as generally applicable and configurable as possible.
- Separation of RL algorithm and application: Algorithms are agnostic to the type and structure of inputs (states/observations) and outputs (actions/decisions), as well as the interaction with the application environment.
To install Tensorforce simply use a pip command:
pip install tensorforce
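To give you a feel for the workflow, here is a rough sketch of a Tensorforce training loop on a Gym task, based on the library’s quickstart; keyword arguments may differ slightly between Tensorforce versions.

```python
# A rough Tensorforce sketch: PPO on CartPole with an explicit act/observe loop.
from tensorforce import Agent, Environment

environment = Environment.create(
    environment="gym", level="CartPole-v1", max_episode_timesteps=500)
agent = Agent.create(
    agent="ppo", environment=environment, batch_size=10, learning_rate=1e-3)

for _ in range(300):                    # run 300 training episodes
    states = environment.reset()
    terminal = False
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)

agent.close()
environment.close()
```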
Let’s see if Tensorforce fits the criteria:
- Number of SOTA RL algorithms implemented
As of today, Tensorforce has the following set of algorithms implemented:
- Deep Q-Learning (DQN) and its improvements (Double and Dueling)
- Vanilla Policy Gradient (PG)
- Deep Deterministic Policy Gradient (DDPG)
- Continuous DQN (CDQN or NAF)
- Actor Critic (A2C and A3C)
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
As you may have noticed, Tensorforce misses the Soft Actor Critic (SAC) implementation. Besides that, the coverage is excellent.
- Official documentation, availability of tutorials and examples
It is quite easy to start using Tensorforce thanks to the variety of simple examples and tutorials. The official documentation seems complete and convenient to navigate through.
- Readable code that is easy to customize
Tensorforce benefits from its modular design. Each part of the architecture, for example, networks, models, and runners, is distinct. Thus, you can easily modify them. However, the code lacks comments, and that could be a problem.
- Number of supported environments
Tensorforce works with multiple environments, for example, OpenAI Gym, OpenAI Retro and DeepMind Lab. It also has documentation to help you plug into other environments.
- Logging and tracking tools support
The library supports TensorBoard and other logging/tracking tools.
- Vectorized environment feature
Tensorforce supports the vectorized environment feature.
- Regular updates
Tensorforce is regularly updated. The last update was just a few weeks ago.
To sum up, Tensorforce is a powerful RL tool. It is up-to-date and has all necessary documentation for you to start working with it.
RL_Coach

Reinforcement Learning Coach (Coach) by Intel AI Lab is a Python RL framework containing many state-of-the-art algorithms.
It exposes a set of easy-to-use APIs for experimenting with new RL algorithms. The components of the library, for example, algorithms, environments, and neural network architectures, are modular. Thus, extending and reusing existing components is fairly painless.
To install Coach simply use a pip command.
pip install rl_coach
Still, you should check the official installation tutorial as a few prerequisites are required.
Let’s see if Coach fits the criteria:
- Number of SOTA RL algorithms implemented
As of today, RL_Coach has the following set of algorithms implemented:
- Actor-Critic
- ACER
- Behavioral Cloning
- Bootstrapped DQN
- Categorical DQN
- Conditional Imitation Learning
- Clipped Proximal Policy Optimization
- Deep Deterministic Policy Gradient
- Direct Future Prediction
- Double DQN
- Deep Q Networks
- Dueling DQN
- Mixed Monte Carlo
- N-Step Q Learning
- Normalized Advantage Functions
- Neural Episodic Control
- Persistent Advantage Learning
- Policy Gradient
- Proximal Policy Optimization
- Rainbow
- Quantile Regression DQN
- Soft Actor-Critic
- Twin Delayed Deep Deterministic Policy Gradient
- Wolpertinger
As you may have noticed, RL_Coach has a variety of algorithms. It’s the most complete library of all covered in this article.
- Official documentation, availability of tutorials and examples
The documentation is complete. Also, RL_Coach has a set of valuable tutorials. It will be easy for newcomers to start working with it.
- Readable code that is easy to customize
RL_Coach is an open-source library. It benefits from the modular design, but the code lacks comments. Customizing it may be a complicated task.
- Number of supported environments
Coach supports the following environments:
- OpenAI Gym
- ViZDoom
- Roboschool
- GymExtensions
- PyBullet
- CARLA
- And others
For more information, including installation and usage instructions, please refer to the official documentation.
- Logging and tracking tools support
Coach supports various logging and tracking tools. It even has its own visualization dashboard.
- Vectorized environment feature
RL_Coach supports the vectorized environment feature. For usage instructions, please refer to the documentation.
- Regular updates
The library seems to be maintained. However, the last major update was almost a year ago.
To sum up, RL_Coach has a perfect up-to-date set of algorithms implemented. And it’s newcomer friendly. I would strongly recommend Coach.
TFAgents
TFAgents is a Python library designed to make implementing, deploying, and testing RL algorithms easier. It has a modular structure and provides well-tested components that can be easily modified and extended.
TFAgents is currently under active development, but even the current set of components makes it the most promising RL library.
To install TFAgents simply use a pip command:
pip install tf-agents
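As a taste of the API, here is a condensed sketch of the TFAgents DQN setup, following the structure of the official DQN tutorial; the replay buffer and training loop are omitted for brevity.

```python
# A condensed TF-Agents DQN setup sketch (data collection and training omitted).
import tensorflow as tf
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.agents.dqn import dqn_agent
from tf_agents.utils import common

# Wrap a Gym environment in the TF-Agents TFPyEnvironment abstraction.
train_env = tf_py_environment.TFPyEnvironment(suite_gym.load("CartPole-v1"))

q_net = q_network.QNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=(100,))

agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=tf.Variable(0))
agent.initialize()
# From here, the official tutorial wires up a replay buffer, drivers for data
# collection, and repeated calls to agent.train(experience).
```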
Let’s see if TFAgents fits the criteria:
- Number of SOTA RL algorithms implemented
As of today, TFAgents has the following set of algorithms implemented:
- Deep Q-Learning (DQN) and its improvements (Double)
- Deep Deterministic Policy Gradient (DDPG)
- TD3
- REINFORCE
- Proximal Policy Optimization (PPO)
- Soft Actor Critic (SAC)
Overall, TFAgents has a great set of algorithms implemented.
- Official documentation, availability of tutorials and examples
TFAgents has a series of tutorials on each major component. Still, the official documentation seems incomplete; I would even say there is none. The tutorials and simple examples do their job, but the lack of well-written documentation is a major disadvantage.
- Readable code that is easy to customize
The code is full of comments and the implementations are very clean. TFAgents seems to have the cleanest code of all the libraries covered in this article.
- Number of supported environments
The library is environment-agnostic, which makes it easy to plug into any environment.
- Logging and tracking tools support
Logging and tracking tools are supported.
- Vectorized environment feature
Vectorized environment is supported.
- Regular updates
As mentioned above, TFAgents is currently under active development. The last update was made just a couple of days ago.
To sum up, TFAgents is a very promising library. It already has all necessary tools to start working with it. I wonder what it will look like when the development is over.
Stable Baselines

Stable Baselines is a set of improved implementations of Reinforcement Learning (RL) algorithms based on OpenAI Baselines. The OpenAI Baselines implementations lacked a consistent structure and good documentation, which is why Stable Baselines was created.
Stable Baselines features a unified structure for all algorithms, a visualization tool, and excellent documentation.
To install Stable Baselines simply use a pip command.
pip install stable-baselines
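A minimal run looks roughly like the library’s own quickstart; the sketch below assumes the TensorFlow 1.x backend that Stable Baselines is built on.

```python
# A minimal Stable Baselines sketch: PPO2 on CartPole, closely following the quickstart.
import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

env = gym.make("CartPole-v1")
model = PPO2(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=10000)

# Run the trained policy for a few hundred steps.
obs = env.reset()
for _ in range(200):
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```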
Still, you should check the official installation tutorial as a few prerequisites are required.
Let’s see if Stable Baselines fits the criteria:
- Number of SOTA RL algorithms implemented
As of today, Stable Baselines has the following set of algorithms implemented:
- A2C
- ACER
- ACKTR
- DDPG
- DQN
- HER
- GAIL
- PPO1 and PPO2
- SAC
- TD3
- TRPO
Overall, Stable Baselines has a great set of algorithms implemented.
- Official documentation, availability of tutorials and examples
The documentation is complete and excellent. The set of tutorials and examples is also really helpful.
- Readable code that is easy to customize
Modifying the code can be tricky. However, because Stable Baselines provides a lot of useful comments in the code and excellent documentation, the modification process is less complex than it could be.
- Number of supported environments
Stable Baselines provides good documentation on how to plug in your custom environment; however, the environment needs to follow the OpenAI Gym interface.
- Logging and tracking tools support
Stable Baselines has TensorBoard support implemented.
- Vectorized environment feature
Vectorized environment feature is supported by a majority of the algorithms. Please check the documentation in case you want to learn more.
- Regular updates
The last major updates were made almost two years ago, but the library is maintained as the documentation is regularly updated.
To sum up, Stable Baselines is a library with a great set of algorithms and awesome documentation. You should consider using it as your RL tool.
MushroomRL
MushroomRL is a Python Reinforcement Learning library whose modularity allows you to use well-known Python libraries for tensor computation and RL benchmarks.
It enables RL experiments by providing both classical RL algorithms and deep RL algorithms. The idea behind MushroomRL is to offer the majority of RL algorithms behind a common interface, so you can run them without doing too much work.
To install MushroomRL simply use a pip command.
pip install mushroom_rl
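To show the common interface in action, here is a rough sketch based on MushroomRL’s tabular Q-Learning tutorial; constructor arguments may differ slightly between versions.

```python
# A rough MushroomRL sketch: tabular Q-Learning on a small GridWorld via the Core loop.
from mushroom_rl.core import Core
from mushroom_rl.environments import GridWorld
from mushroom_rl.algorithms.value import QLearning
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.utils.parameters import Parameter

# A small grid-world MDP and an epsilon-greedy Q-Learning agent.
mdp = GridWorld(width=3, height=3, goal=(2, 2), start=(0, 0))
agent = QLearning(mdp.info, EpsGreedy(epsilon=Parameter(0.1)),
                  learning_rate=Parameter(0.1))

# The Core object runs the interaction loop between agent and environment.
core = Core(agent, mdp)
core.learn(n_steps=10000, n_steps_per_fit=1)
```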
Let’s see if MushroomRL fits the criteria:
- Number of SOTA RL algorithms implemented
As of today, MushroomRL has the following set of algorithms implemented:
- Q-Learning
- SARSA
- FQI
- DQN
- DDPG
- SAC
- TD3
- TRPO
- PPO
Overall, MushroomRL has everything you need to work on RL tasks.
- Official documentation, availability of tutorials and examples
The official documentation seems incomplete. It misses valuable tutorials, and simple examples leave much to be desired.
- Readable code that is easy to customize
The code lacks comments and parameter descriptions, which makes it really hard to customize, although, to be fair, MushroomRL never positioned itself as a library that is easy to customize.
- Number of supported environments
MushroomRL supports the following environments:
- OpenAI Gym
- DeepMind Control Suite
- MuJoCo
For more information, including installation and usage instructions, please refer to the official documentation.
- Logging and tracking tools support
MushroomRL supports various logging and tracking tools. I would recommend using TensorBoard as the most popular one.
- Vectorized environment feature
Vectorized environment feature is supported.
- Regular updates
The library is maintained. The last updates were made just a few weeks ago.
To sum up, MushroomRL has a good set of algorithms implemented. Still, it misses tutorials and examples which are crucial when you start to work with a new library.
RLlib
“RLlib is an open-source library for reinforcement learning that offers both high scalability and a unified API for a variety of applications. RLlib natively supports TensorFlow, TensorFlow Eager, and PyTorch, but most of its internals are framework agnostic.” ~ Website
- Number of state-of-the-art (SOTA) RL algorithms implemented
RLlib implements them ALL! PPO? It’s there. A2C and A3C? Yep. DDPG, TD3, SAC? Of course! DQN, Rainbow, APEX? Yes, in many shapes and flavours! Evolution Strategies, IMPALA, Dreamer, R2D2, APPO, AlphaZero, SlateQ, LinUCB, LinTS, MADDPG, QMIX, … Stop it! I’m not sure if you’re making these acronyms up. Nonetheless, yes, RLlib has them all. See the full list here (and a minimal training sketch after this list).
- Official documentation, availability of simple tutorials and examples
RLlib has comprehensive documentation with many examples. Its code is also well commented.
- Readable code that is easy to customize
It’s easiest to customize RLlib with callbacks. Although RLlib is open-sourced and you can edit the code, it’s not a straightforward thing to do. The RLlib codebase is quite complicated because of its size and many layers of abstraction. Here is a guide that should help you if you want to e.g. add a new algorithm.
- Number of supported environments
RLlib works with several different types of environments, including OpenAI Gym, user-defined, multi-agent, and also batched environments. Here you’ll find more.
- Logging and tracking tools support
RLlib has extensive logging features. RLlib will print logs to the standard output (command line). You can also access the logs (and manage jobs) in the Ray Dashboard. In this post, I described how to extend RLlib logging to send metrics to Neptune. It also describes different logging techniques. I highly recommend reading it!
- Vectorized environment (VE) feature
Yes, see here. Moreover, it’s possible to distribute the training among multiple compute nodes, e.g. on a cluster.
- Regular updates
RLlib is maintained and actively developed.
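As promised above, here is a hedged sketch of launching PPO training with RLlib through Ray Tune; it uses the classic ray.tune.run entry point, and newer Ray releases expose different APIs.

```python
# A hedged RLlib sketch: PPO on CartPole launched through Ray Tune.
import ray
from ray import tune

ray.init()
tune.run(
    "PPO",
    stop={"episode_reward_mean": 195},
    config={
        "env": "CartPole-v1",
        "num_workers": 2,      # parallel rollout workers
        "framework": "torch",  # or "tf"
    },
)
```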
From my experience, RLlib is a very powerful framework that covers many applications and at the same time remains quite easy to use. That being said, because of the many layers of abstraction, it’s really hard to extend with your own code, as it’s hard to find where you should even put it! That’s why I would recommend it for developers who want to train models for production, and not for researchers who have to rapidly change algorithms and implement new features.
Dopamine
“Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research).” ~ GitHub
- Number of state-of-the-art (SOTA) RL algorithms implemented
It focuses on supporting the state-of-the-art, single-GPU DQN, Rainbow, C51, and IQN agents. Their Rainbow agent implements the three components identified as most important by Hessel et al.:
- n-step Bellman updates (see e.g. Mnih et al., 2016)
- Prioritized experience replay (Schaul et al., 2015)
- Distributional reinforcement learning (C51; Bellemare et al., 2017)
- Official documentation, availability of simple tutorials and examples
Concise documentation is available in the GitHub repo here. It’s not a very popular framework, so it may lack tutorials. However, the authors provide colabs with many examples of training and visualization.
- Readable code that is easy to customize
The authors’ design principles are:
- Easy experimentation: Make it easy for new users to run benchmark experiments.
- Flexible development: Make it easy for new users to try out research ideas.
- Compact and reliable: Provide implementations for a few, battle-tested algorithms.
- Reproducible: Facilitate reproducibility in results. In particular, their setup follows the recommendations given by Machado et al. (2018).
- Number of supported environments
It’s mainly intended for Atari 2600 game-playing. It supports OpenAI Gym (a hedged run sketch follows this list).
- Logging and tracking tools support
It supports TensorBoard logging and provides some other visualization tools, presented in colabs, like recording videos of an agent playing and seaborn plotting.
- Vectorized environment (VE) feature
No vectorized environments support.
- Regular updates
Dopamine is maintained.
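As flagged above, here is a hedged sketch of how a Dopamine experiment is typically launched: configuration happens through gin files, and the config path below is only illustrative (check the repo for the files shipped with your version).

```python
# A hedged Dopamine sketch: build and run an experiment from a gin config.
from dopamine.discrete_domains import run_experiment

base_dir = "/tmp/dopamine_runs/dqn_cartpole"
gin_files = ["dopamine/agents/dqn/configs/dqn_cartpole.gin"]  # illustrative path

run_experiment.load_gin_configs(gin_files, gin_bindings=[])
runner = run_experiment.create_runner(base_dir)
runner.run_experiment()
```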
If you’re looking for a customizable framework with well-tested DQN-based algorithms, then this may be your pick. Under the hood, it runs using TensorFlow or JAX.
SpinningUp
“While fantastic repos like garage, Baselines, and rllib make it easier for researchers who are already in the field to make progress, they build algorithms into frameworks in ways that involve many non-obvious choices and trade-offs, which makes them hard to learn from. […] The algorithm implementations in the Spinning Up repo are designed to be:
- as simple as possible while still being reasonably good,
- and highly consistent with each other to expose fundamental similarities between algorithms.
They are almost completely self-contained, with virtually no common code shared between them (except for logging, saving, loading, and MPI utilities), so that an interested person can study each algorithm separately without having to dig through an endless chain of dependencies to see how something is done. The implementations are patterned so that they come as close to pseudocode as possible, to minimize the gap between theory and code.” ~ Website
- Number of state-of-the-art (SOTA) RL algorithms implemented
VPG, PPO, TRPO, DDPG, TD3, SAC
- Official documentation, availability of simple tutorials and examples
Great documentation and education materials with multiple examples.
- Readable code that is easy to customize
This code is highly readable. From my experience, it’s the most readable framework you can find out there. Every algorithm is contained in its own two well-commented files, which also makes it as easy as possible to modify. On the other hand, it’s harder to maintain for the same reason: if you add something to one algorithm, you have to manually add it to the others too.
- Number of supported environments
It supports the OpenAI Gym environments out of the box and relies on their API, so you can extend it to use other environments that conform to this API (a short example follows this list).
- Logging and tracking tools support
It has a light logger that prints metrics to the standard output (cmd) and saves them to a file. I’ve written a post on how to add Neptune support to SpinningUp.
- Vectorized environment (VE) feature
No vectorized environments support.
- Regular updates
SpinningUp is maintained.
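Here is the short example referenced above, using Spinning Up’s function-style API (the TensorFlow v1 PPO variant; a PyTorch variant is also provided, and the hyperparameters are illustrative).

```python
# A short Spinning Up sketch: PPO on CartPole through the function-style API.
import gym
from spinup import ppo_tf1 as ppo

env_fn = lambda: gym.make("CartPole-v1")

ppo(env_fn=env_fn,
    ac_kwargs=dict(hidden_sizes=(64, 64)),      # actor-critic network sizes
    steps_per_epoch=4000,
    epochs=50,
    logger_kwargs=dict(output_dir="out/ppo_cartpole", exp_name="ppo_cartpole"))
```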
Although it was created as an educational resource, the code simplicity and state-of-the-art results make it a perfect framework for fast prototyping of research ideas. I use it in my own research and even implement new algorithms in it using the same code structure. Here you can find a port of the SpinningUp code to TensorFlow v2 by me and my colleagues from AwareLab.
garage
“garage is a toolkit for developing and evaluating reinforcement learning algorithms, and an accompanying library of state-of-the-art implementations built using that toolkit. […] The most important feature of garage is its comprehensive automated unit test and benchmarking suite, which helps ensure that the algorithms and modules in garage maintain state-of-the-art performance as the software changes.” ~ GitHub
- Number of state-of-the-art (SOTA) RL algorithms implemented
All major RL algorithms (VPG, PPO, TRPO, DQN, DDPG, TD3, SAC, …), with their multi-task versions (MT-PPO, MT-TRPO, MT-SAC), meta-RL algorithms (Task Embedding, MAML, PEARL, RL2, …), evolutionary strategy algorithms (CEM, CMA-ES), and behavioural cloning.
- Official documentation, availability of simple tutorials and examples
Comprehensive documentation with many examples and some tutorials, e.g. on how to add a new environment or implement a new algorithm.
- Readable code that is easy to customize
It’s created as a flexible and structured tool for developing, experimenting with, and evaluating algorithms. It provides a scaffold for adding new methods.
- Number of supported environments
garage supports a variety of external environment libraries for different RL training purposes, including OpenAI Gym, DeepMind DM Control, MetaWorld, and PyBullet. You should be able to easily add your own environments.
- Logging and tracking tools support
The garage logger supports many outputs including standard output (cmd), plain text files, CSV files, and TensorBoard.
- Vectorized environment (VE) feature
It supports vectorized environments and even allows one to distribute the training on a cluster.
- Regular updates
garage is maintained.
garage is similar to RLlib. It’s a big framework with distributed execution, supporting many additional features like Docker, which go beyond simple training and monitoring. If such a tool is something you need, e.g. in a production environment, then I would recommend comparing it with RLlib and picking the one you like more.
Acme
“Acme is a library of reinforcement learning (RL) agents and agent building blocks. Acme strives to expose simple, efficient, and readable agents, that serve both as reference implementations of popular algorithms and as strong baselines, while still providing enough flexibility to do novel research. The design of Acme also attempts to provide multiple points of entry to the RL problem at differing levels of complexity.” ~ GitHub
- Number of state-of-the-art (SOTA) RL algorithms implemented
It includes algorithms for continuous control (DDPG, D4PG, MPO, Distributional MPO, Multi-Objective MPO), discrete control (DQN, IMPALA, R2D2), learning from demonstrations (DQfD, R2D3), planning and learning (AlphaZero), and behavioural cloning.
- Official documentation, availability of simple tutorials and examples
Documentation is rather sparse, but there are many examples and Jupyter notebook tutorials available in the repo.
- Readable code that is easy to customize
The code is easy to read but requires one to learn its structure first. It is easy to customize and add your own agents.
- Number of supported environments
The Acme environment loop assumes an environment instance that implements the DeepMind Environment API, so any environment from DeepMind will work flawlessly (e.g. DM Control). It also provides a wrapper for OpenAI Gym environments and an OpenSpiel RL environment loop. If your environment implements the OpenAI or DeepMind API, then you shouldn’t have problems plugging it in (see the sketch after this list).
- Logging and tracking tools support
It includes a basic logger and supports printing to the standard output (cmd) and saving to CSV files. I’ve written a post on how to add Neptune support to Acme.
- Vectorized environment (VE) feature
No vectorized environments support.
- Regular updates
Acme is maintained and actively developed.
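As referenced above, here is a rough sketch of Acme’s environment-loop pattern with the TensorFlow DQN agent, loosely following the repo’s quickstart; module paths and constructor arguments may differ between Acme versions.

```python
# A rough Acme sketch: DQN on a Gym environment driven by the EnvironmentLoop.
import gym
import sonnet as snt

import acme
from acme import specs, wrappers
from acme.agents.tf import dqn

# Wrap a Gym environment so it follows the dm_env API that Acme expects.
environment = wrappers.GymWrapper(gym.make("CartPole-v1"))
environment = wrappers.SinglePrecisionWrapper(environment)
spec = specs.make_environment_spec(environment)

# A small Sonnet MLP producing Q-values for each action.
network = snt.Sequential([
    snt.Flatten(),
    snt.nets.MLP([64, 64, spec.actions.num_values]),
])

agent = dqn.DQN(environment_spec=spec, network=network)
loop = acme.EnvironmentLoop(environment, agent)
loop.run(num_episodes=100)
```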
Acme is simple like SpinningUp, but a tier higher when it comes to the use of abstraction. This makes it easier to maintain (the code is more reusable), but on the other hand, it’s harder to find the exact spot in the implementation you should change when tinkering with an algorithm. It supports both TensorFlow v2 and JAX, with the latter being an interesting option as JAX has been gaining traction recently.
coax
“Coax is a modular Reinforcement Learning (RL) python package for solving OpenAI Gym environments with JAX-based function approximators. […] The primary thing that sets coax apart from other packages is that it is designed to align with the core RL concepts, not with the high-level concept of an agent. This makes coax more modular and user-friendly for RL researchers and practitioners.” ~ Website
- Number of state-of-the-art (SOTA) RL algorithms implemented
It implements classical RL algorithms (SARSA, Q-Learning), value-based deep RL algorithms (Soft Q-Learning, DQN, Prioritized Experience Replay DQN, Ape-X DQN), and policy gradient methods (VPG, PPO, A2C, DDPG, TD3).
- Official documentation, availability of simple tutorials and examples
Clear, if sometimes confusing, documentation with many code examples and algorithm explanations. It also includes tutorials for running training on the Pong, CartPole, FrozenLake, and Pendulum environments.
- Readable code that is easy to customize
Other RL frameworks often hide the structure that you (the RL practitioner) are interested in. Coax makes the network architecture take center stage, so you can define your own forward-pass function. Moreover, the design of coax is agnostic of the details of your training loop: you decide how and when you update your function approximators.
- Number of supported environments
Coax mostly focuses on OpenAI Gym environments. However, you should be able to extend it to other environments that implement this API.
- Logging and tracking tools support
It utilizes the Python logging module.
- Vectorized environment (VE) feature
No vectorized environments support.
- Regular updates
coax is maintained.
I would recommend coax for education purposes. If you want to plug-and-play with the nitty-gritty details of RL algorithms, this is a good tool for that. It’s also built around JAX, which may be a plus in itself (given the hype around it).
SURREAL
“Our goal is to make Deep Reinforcement Learning accessible to everyone. We introduce Surreal, an open-source, reproducible, and scalable distributed reinforcement learning framework. Surreal provides a high-level abstraction for building distributed reinforcement learning algorithms.” ~ Website
- Number of state-of-the-art (SOTA) RL algorithms implemented
It focuses on distributed deep RL algorithms. As of now, the authors have implemented distributed variants of PPO and DDPG.
- Official documentation, availability of simple tutorials and examples
The repo provides basic documentation on installing, running, and customizing the algorithms. However, it lacks code examples and tutorials.
- Readable code that is easy to customize
The code structure can frighten one away; it’s not something for newcomers. That being said, the code includes docstrings and is readable.
- Number of supported environments
It supports OpenAI Gym and DM Control environments, as well as Robosuite, a standardized and accessible robot manipulation benchmark built with the MuJoCo physics engine.
- Logging and tracking tools support
It includes specialized logging tools for the distributed setting that also allow you to record videos of agents playing.
- Vectorized environment (VE) feature
No vectorized environments support. However, it allows one to distribute the training on a cluster.
- Regular updates
It doesn’t seem to be maintained anymore.
I include this framework on the list mostly for reference. If you develop a distributed RL algorithm, you may learn a thing or two from this repo, e.g. how to manage work on a cluster. Nevertheless, there are better options to build on, like RLlib or garage.
Final thoughts
In this article, we have figured out what to look out for when choosing RL tools, what RL libraries are there, and what features they have.
To my knowledge, the best publicly available libraries are Tensorforce, Stable Baselines, and RL_Coach. You should consider picking one of them as your RL tool. All of them can be considered up-to-date, have a great set of algorithms implemented, and provide valuable tutorials as well as complete documentation. If you want to experiment with different algorithms, you should use RL_Coach. For other tasks, please consider using either Stable Baselines or Tensorforce.
Hopefully, with this information, you will have no problems choosing the RL library for your next project.
Note:
Libraries KerasRL, Tensorforce, Pyqlearning, RL_Coach, TFAgents, Stable Baselines, and MushroomRL were described by Vladimir Lyashenko.
Libraries RLlib, Dopamine, SpinningUp, garage, Acme, coax, and SURREAL were described by Piotr Januszewski.