MLOps Blog

Installing MuJoCo to Work With OpenAI Gym Environments

5 min
Piotr Januszewski
19th April, 2023

In this article, I’ll show you how to install MuJoCo on your Mac/Linux machine in order to run continuous control environments from OpenAI’s Gym. These environments include classic ones like HalfCheetah, Hopper, Walker, Ant, and Humanoid, as well as harder ones like object manipulation with a robotic arm or robotic hand dexterity. I’ll also discuss additional agent diagnostics provided by the environments that you might not have considered before.


How do you get MuJoCo?

You might wonder, what’s so special about installing MuJoCo that it needs a guide? Well, getting a license and properly installing it might be relatively easy, but the real problems start when you have to match MuJoCo and OpenAI Gym versions and install the mujoco-py package. It took me many hours to get it right the first time I tried!

To save you the trouble, I’ll walk you through the installation process step by step. Then I’ll discuss some useful diagnostics to keep an eye on, and we’ll take a look at example diagnostics from Humanoid training. Finally, I’ll link the code that lets you train agents on MuJoCo tasks and watch the diagnostics using neptune.ai. To start, I’ll give you a bit of context about MuJoCo and OpenAI Gym environments.

MuJoCo – Multi-Joint dynamics with Contact

MuJoCo is a fast and accurate physics simulation engine aimed at research and development in robotics, biomechanics, graphics, and animation. It’s an engine, meaning it doesn’t provide ready-to-use models or environments to work with; rather, it powers environments like those that OpenAI’s Gym offers.

What is OpenAI Gym?

OpenAI Gym (or Gym for short) is a collection of environments. Some of them, referred to collectively as continuous control environments, run on the MuJoCo engine. All of these environments share two important characteristics:

  1. An agent observes vectors that describe the kinematic properties of the controlled robot. This means that the state space is continuous.
  2. Agent actions are vectors too, and they specify torques to be applied to the robot’s joints. This means that the action space is also continuous (you can check both properties with the snippet below).
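
Below is a minimal sketch of how to inspect both spaces, assuming you already have Gym with the MuJoCo extras installed (the installation itself is covered later in this post):

import gym

env = gym.make('HalfCheetah-v2')
print(env.observation_space)  # a continuous Box space describing the robot's kinematics
print(env.action_space)       # a continuous Box space of joint torques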

Gym MuJoCo environments include classic continuous control, object manipulation with a robotic arm, and robotic hand (Shadow Hand) dexterity. There are multiple tasks available for training in these environments. Some of them are presented in the figures below. You can find details about all of them in the Gym environments list. This post is especially useful for the robotic-arm and robotic-hand environments. If you don’t know the Gym API yet, I encourage you to read the documentation – the two short sections “Environments” and “Observations” should be enough to start.

Classic continuous control – tasks from left to right: Walker2d, Ant, and Humanoid.
Source: OpenAI Roboschool

Object manipulation with a robotic arm – the pick and place task.
Source: Overcoming exploration in RL from demos

Shadow Hand dexterity – the hand manipulate block task.
Source: OpenAI Gym Robotics

Installing MuJoCo and OpenAI Gym

In this section, I’ll show you where to get the MuJoCo license, how to install everything required, and also how to troubleshoot a common macOS problem.

License

You can get a 30-day free trial on the MuJoCo website or—if you’re a student—a free 1-year license for education. The license key will arrive in an email with your username and password. If you’re not a student, you might try to encourage the institution you work with to buy a license.

Installing mujoco-py

Here are step-by-step instructions, and below I added some explanations and troubleshooting tips:

  1. Download the MuJoCo version 1.50 binaries for Linux or macOS.
  2. Unzip the downloaded mjpro150 directory into ~/.mujoco/mjpro150, and place your license key (the mjkey.txt file from your email) at ~/.mujoco/mjkey.txt.
  3. Run pip3 install -U 'mujoco-py<1.50.2,>=1.50.1'
  4. Run python3 -c 'import mujoco_py' (a slightly longer smoke test is sketched right after this list)
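
If the import succeeds, you can optionally run a slightly longer smoke test. The sketch below builds a throwaway model from an inline MJCF string (a made-up minimal example, not a file shipped with MuJoCo) and steps the simulation:

import mujoco_py

# A minimal, made-up MJCF model: a single free-falling sphere.
MODEL_XML = """
<mujoco>
  <worldbody>
    <body name="ball" pos="0 0 1">
      <joint type="free"/>
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco_py.load_model_from_xml(MODEL_XML)
sim = mujoco_py.MjSim(model)
for _ in range(100):
    sim.step()
print(sim.data.qpos)  # the ball's height (third entry) should now be below 1.0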

If you see warnings like “objc[…]: Class GLFW… is implemented in both…”, then ignore them. If you’re on macOS and see “clang: error: unsupported option ‘-fopenmp’” or any other compilation-related error, then go to the Troubleshooting subsection. If you wonder why MuJoCo 1.50 and not the newest version, then go to the Version subsection. If you have no more concerns, then you can jump straight into Gym installation!

Troubleshooting

If, on macOS, the “clang: error: unsupported option ‘-fopenmp’” error (or any other compiler-related error, e.g. from gcc if you have it installed) appears during installation or when running python3 -c ‘import mujoco_py’, then follow these steps:

1. Install brew if you don’t have it already.

2. Uninstall all other compilers if you have some, e.g. run brew uninstall gcc. You may need to run it a couple of times if you have more than one version.

3. Run brew install llvm boost hdf5

4. Add this to your .bashrc / .zshrc

export PATH="/usr/local/opt/llvm/bin:$PATH"
export CC="/usr/local/opt/llvm/bin/clang"
export CXX="/usr/local/opt/llvm/bin/clang++"
export CXX11="/usr/local/opt/llvm/bin/clang++"
export CXX14="/usr/local/opt/llvm/bin/clang++"
export CXX17="/usr/local/opt/llvm/bin/clang++"
export CXX1X="/usr/local/opt/llvm/bin/clang++"
export LDFLAGS="-L/usr/local/opt/llvm/lib"
export CPPFLAGS="-I/usr/local/opt/llvm/include"

5. Don’t forget to source your .bashrc / .zshrc (e.g. by relaunching your terminal) after editing it, and make sure your Python environment is activated.

6. Try to uninstall and install mujoco-py again.

See this GitHub issue for more information. You should also see the Troubleshooting section of the mujoco-py README.

Version

Here we bump into the first trap! The newest OpenAI Gym doesn’t work with MuJoCo 2.0; see this GitHub issue if you want to know the details. This is why you need to download the MuJoCo version 1.50 binaries. Alternatively, if you really need to use MuJoCo 2.0, you can download the MuJoCo 2.0 binaries for Linux or macOS, install the newest mujoco-py, and then install the last Gym version that supports MuJoCo 2.0: pip install -U gym[all]==0.15.3

Installing OpenAI Gym Environments (tutorial)

Here, it’s important to install the OpenAI Gym package with the “mujoco” and “robotics” extras or simply all extras:

  1. Run pip3 install gym[mujoco,robotics] or pip3 install gym[all]
  2. Check the installation by running:
python3 -c "import gym; env = gym.make('Humanoid-v2'); print('\nIt is OKAY!' if env.reset() is not None else '\nSome problem here...')"

If you see “It is OKAY!” printed at the end of the output, then it’s OKAY! Again, you can ignore warnings like “objc[…]: Class GLFW… is implemented in both…”.
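
If you installed the robotics extra (or all extras), you can run a similar check against the robotic-arm and Shadow Hand environments. This is only a sketch; the environment IDs below are the default versions at the time of writing and may change in newer Gym releases:

import gym

# Try to build one environment from each family to confirm the extras work.
for env_id in ['Humanoid-v2', 'FetchPickAndPlace-v1', 'HandManipulateBlock-v0']:
    env = gym.make(env_id)
    env.reset()
    print(env_id, 'is OKAY!')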

MuJoCo diagnostics

Now I’ll talk about useful metrics provided by the OpenAI Gym MuJoCo environments. They depend on the environment version, so I divide them into v2 and v3 diagnostics. You can access these metrics in the “info” dictionary returned by the environment step method: observation, reward, done, info = env.step(action). See the Gym documentation for more. The table below presents the keys that let you access each metric in the dictionary, along with short descriptions of the metrics.
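
Here is a minimal sketch of how to read these diagnostics during a rollout, assuming the Humanoid-v3 environment and the keys listed in the table below (a random policy stands in for a trained agent):

import gym

env = gym.make('Humanoid-v3')
observation = env.reset()

for _ in range(100):
    action = env.action_space.sample()  # replace with your agent's action
    observation, reward, done, info = env.step(action)
    # The diagnostics live in the info dictionary:
    print(info['reward_linvel'], info['x_velocity'], info['distance_from_origin'])
    if done:
        observation = env.reset()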

HalfCheetah (v2 / v3)
  reward_run: The positive reward for the robot forward velocity.
  reward_ctrl: The negative reward for the robot action vector magnitude.

HalfCheetah (v3)
  x_position: Position in the X-axis.
  x_velocity: Velocity in the X-axis (forward velocity).

Hopper (v3)
  x_position: Position in the X-axis.
  x_velocity: Velocity in the X-axis (forward velocity).

Walker2d (v3)
  x_position: Position in the X-axis.
  x_velocity: Velocity in the X-axis (forward velocity).

Ant (v2 / v3)
  reward_forward: The positive reward for the robot forward velocity.
  reward_ctrl: The negative reward for the robot action vector magnitude.
  reward_contact: The negative reward for the contact force magnitude between the robot and the ground.
  reward_survive: The constant positive reward at each time step when the robot is alive (until the end of an episode or the robot falls).

Ant (v3)
  x_position: Position in the X-axis.
  x_velocity: Velocity in the X-axis.
  y_position: Position in the Y-axis.
  y_velocity: Velocity in the Y-axis.
  distance_from_origin: Distance from the robot starting position, (0, 0).

Humanoid (v2 / v3)
  reward_linvel: The positive reward for the robot forward velocity.
  reward_quadctrl: The negative reward for the robot action vector magnitude.
  reward_impact: The negative reward for the contact force magnitude between the robot and the ground.
  reward_alive: The constant positive reward at each time step when the robot is alive (until the end of an episode or the robot falls).

Humanoid (v3)
  x_position: Position in the X-axis.
  x_velocity: Velocity in the X-axis.
  y_position: Position in the Y-axis.
  y_velocity: Velocity in the Y-axis.
  distance_from_origin: Distance from the robot starting position, (0, 0).

Table: The most useful metrics provided by the OpenAI Gym MuJoCo environments

Reward components can be especially useful, for example the forward velocity reward, which is the goal of these tasks. However, note that the absence of some metric from the info dictionary doesn’t mean that, say, a survival reward isn’t added to the rewards of Hopper or Walker2d (it is!). For more nitty-gritty details like this, I encourage you to look into the code of the specific task on GitHub, e.g. Walker2d-v3.
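
As a rough example, the sketch below sums the two HalfCheetah components over an episode and compares them with the total return; according to the table above, they should (up to numerical details) add up to the per-step reward:

import gym

env = gym.make('HalfCheetah-v3')
env.reset()
total_reward, total_run, total_ctrl = 0.0, 0.0, 0.0
done = False
while not done:
    _, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
    total_run += info['reward_run']    # positive forward-velocity reward
    total_ctrl += info['reward_ctrl']  # negative control-cost reward
print(total_reward, total_run + total_ctrl)  # these two sums should be close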

Now, let’s take a look at example metric values on the Humanoid task.

Humanoid diagnostics

Humanoid velocities: comparison of three different DRL algorithms, SAC, SOP, and SUNRISE.

The figure above compares velocities of three different DRL algorithms: SAC, SOP, and SUNRISE. The velocities are plotted for fully trained agents at different points of the episode. You can see that the SOP agent runs the fastest, which is the goal of this task. In the figures below we investigate the positions of the SAC agent at the end of episodes at different stages of training.

SAC final positions in the X-axis across training on the Humanoid task.
SAC final positions in the Y-axis across training on the Humanoid task.

You can see that this particular SAC agent runs in the negative X and positive Y direction, and that with training it gets further and further. Because the time it has before the end of the episode stays the same, this means it learns to run faster as training progresses. Note that the agent isn’t trained to run in any particular direction; it’s simply trained to run as fast as possible, in whatever direction it happens to pick. This means that different agents can learn to run in different directions. The agent can even change its run direction at some point during training, which is shown in the figures below.
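
As a rough sketch, this is how you could collect such end-of-episode positions yourself and then send them to whatever experiment tracker you use (the training loop is omitted; a random policy stands in for the agent):

import gym

env = gym.make('Humanoid-v3')
final_x, final_y = [], []

for episode in range(10):
    env.reset()
    done, info = False, {}
    while not done:
        _, _, done, info = env.step(env.action_space.sample())  # replace with your agent's action
    # Record where the robot ended up; these are the values plotted in the charts below.
    final_x.append(info['x_position'])
    final_y.append(info['y_position'])

print(final_x, final_y)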

SAC final positions in the X-axis across training on the Humanoid task; the agent changes its run direction about one-third of the way through training.
SAC final positions in the Y-axis across training on the Humanoid task; the agent changes its run direction late in training.

Conclusions

Congratulations, you’ve got MuJoCo up and running! Now you’ll be interested in training agents in these environments, so check out this repository. It includes easy-to-understand implementations of DRL algorithms in modern TF2, based on the newcomer-friendly SpinningUp codebase. Moreover, it can log to the Neptune platform, which is very convenient for storing and analyzing training results! I use it in my research, and I encourage you to give it a try too.