In this article, I’ll show you how to install MuJoCo on your macOS or Linux machine in order to run continuous control environments from OpenAI’s Gym. These include classic environments like HalfCheetah, Hopper, Walker, Ant, and Humanoid, as well as harder ones like object manipulation with a robotic arm or dexterous manipulation with a robotic hand. I’ll also discuss additional agent diagnostics provided by the environments that you might not have considered before.
How do you get MuJoCo?
You might wonder, what’s so special about installing MuJoCo that it needs a guide? Well, getting a license and properly installing it might be relatively easy, but the big problems start when you’re matching MuJoCo and OpenAI Gym versions, and installing the mujoco-py package. It took me many hours to get it right the first time I tried!
To save you the trouble, I’ll walk you through the installation process step by step. Then I’ll discuss some useful diagnostics to keep an eye on, and we’ll take a look at example diagnostics from Humanoid training. Finally, I’ll link to the code that lets you train agents on MuJoCo tasks and watch the diagnostics using neptune.ai. To start, I’ll give you a bit of context about MuJoCo and OpenAI Gym environments.
MuJoCo – Multi-Joint dynamics with Contact
MuJoCo is a fast and accurate physics simulation engine aimed at research and development in robotics, biomechanics, graphics, and animation. It’s an engine, meaning it doesn’t provide ready-to-use models or environments to work with; rather, it runs the environments that other packages (like OpenAI’s Gym) define.
What is OpenAI Gym?
OpenAI Gym (or Gym for short) is a collection of environments. Some of them, broadly referred to as continuous control environments, run on the MuJoCo engine. All of these environments share two important characteristics:
- An agent observes vectors that describe the kinematic properties of the controlled robot. This means that the state space is continuous.
- Agent actions are vectors too, and they specify torques to be applied to the robot joints. This means that the action space is also continuous.
Gym MuJoCo environments include classic continuous control, object manipulation with a robotic arm, and robotic hand (Shadow Hand) dexterity. There are multiple tasks available for training in these environments; some of them are presented in the figures below. You can find details about all of them in the Gym environments list. This post is especially useful for the robotic arm and robotic hand environments. If you don’t know the Gym API yet, I encourage you to read the documentation – the two short sections “Environments” and “Observations” should be enough to get started.
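If you’d like a quick feel for that API before installing anything, here is a minimal sketch. It uses Pendulum-v0, a simple continuous control task that doesn’t need MuJoCo, but the MuJoCo environments follow exactly the same reset/step interface:

```python
import gym

# Pendulum-v0 works without MuJoCo; MuJoCo tasks expose the same interface.
env = gym.make('Pendulum-v0')
obs = env.reset()                       # first observation (a NumPy vector)
print(env.observation_space)            # continuous (Box) state space
print(env.action_space)                 # continuous (Box) action space

action = env.action_space.sample()      # a random action vector
obs, reward, done, info = env.step(action)
env.close()
```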



Classic continuous control – tasks from left to right: Walker2d, Ant, and Humanoid.
Source: OpenAI Roboschool

Source: Overcoming exploration in RL from demos

Shadow Hand dexterity – the hand manipulate block task.
Source: OpenAI Gym Robotics
Installing MuJoCo and OpenAI Gym
In this section, I’ll show you where to get the MuJoCo license, how to install everything required, and also how to troubleshoot a common macOS problem.
License
You can get a 30-day free trial on the MuJoCo website or—if you’re a student—a free 1-year license for education. The license key will arrive in an email with your username and password. If you’re not a student, you might try to encourage the institution you work with to buy a license.
Installing mujoco-py
Here are step-by-step instructions, and below I added some explanations and troubleshooting tips:
- Download the MuJoCo version 1.50 binaries for Linux or macOS.
- Unzip the downloaded mjpro150 directory into ~/.mujoco/mjpro150, and place your license key (the mjkey.txt file from your email) at ~/.mujoco/mjkey.txt.
- Run pip3 install -U 'mujoco-py<1.50.2,>=1.50.1'
- Run python3 -c 'import mujoco_py'
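If the import succeeds, you can also run a slightly longer sanity check. The sketch below loads one of the example models that ship with the MuJoCo 1.50 binaries and steps the simulation; the humanoid.xml path assumes the default ~/.mujoco/mjpro150 location from the unzip step above:

```python
import os
import mujoco_py

# Path assumes the default install location used in the steps above.
xml_path = os.path.expanduser('~/.mujoco/mjpro150/model/humanoid.xml')
model = mujoco_py.load_model_from_path(xml_path)
sim = mujoco_py.MjSim(model)

print(sim.data.qpos)   # joint positions before stepping
sim.step()             # advance the simulation by one timestep
print(sim.data.qpos)   # joint positions after one step
```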
If you see warnings like “objc[…]: Class GLFW… is implemented in both…”, you can ignore them. If you’re on macOS and see “clang: error: unsupported option ‘-fopenmp’” or any other compilation-related error, go to the Troubleshooting subsection. If you’re wondering why MuJoCo 1.50 rather than the newest version, go to the Version subsection. If you have no more concerns, you can jump to the Gym installation!
Troubleshooting
If, on macOS, the “clang: error: unsupported option ‘-fopenmp’” error, or any other compiler-related error (e.g. from gcc, if you have it installed), occurred during installation or while running python3 -c 'import mujoco_py', then follow these steps:
1. Install brew if you don’t have it already.
2. Uninstall all other compilers if you have any, e.g. run brew uninstall gcc. You may need to run it a couple of times if you have more than one version installed.
3. Run brew install llvm boost hdf5
4. Add this to your .bashrc / .zshrc
export PATH="/usr/local/opt/llvm/bin:$PATH"
export CC="/usr/local/opt/llvm/bin/clang"
export CXX="/usr/local/opt/llvm/bin/clang++"
export CXX11="/usr/local/opt/llvm/bin/clang++"
export CXX14="/usr/local/opt/llvm/bin/clang++"
export CXX17="/usr/local/opt/llvm/bin/clang++"
export CXX1X="/usr/local/opt/llvm/bin/clang++"
export LDFLAGS="-L/usr/local/opt/llvm/lib"
export CPPFLAGS="-I/usr/local/opt/llvm/include"
5. Don’t forget to source your .bashrc / .zshrc (e.g. relaunch your terminal) after editing it, and make sure your Python environment is activated.
6. Try to uninstall and install mujoco-py again.
See this GitHub issue for more information. You should also see the Troubleshooting section of the mujoco-py README.
Version
Here we bump into the first trap! The newest OpenAI Gym doesn’t work with MuJoCo 2.0 (see this GitHub issue if you want to know the details), which is why you need to download the MuJoCo version 1.50 binaries. Alternatively, if you really need MuJoCo 2.0, you can download the MuJoCo 2.0 binaries for Linux or macOS, install the newest mujoco-py, and then install the last Gym release that supports MuJoCo 2.0: pip install -U gym[all]==0.15.3
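If you choose this alternative route, the install commands would look roughly like this (a sketch; the mujoco-py version range is my assumption based on its README, and only the Gym pin comes from the compatibility note above):

```bash
# Alternative: MuJoCo 2.0 route (binaries typically unzipped into ~/.mujoco/mujoco200)
pip3 install -U 'mujoco-py<2.1,>=2.0'
pip3 install -U 'gym[all]==0.15.3'
```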
Installing OpenAI Gym Environments (tutorial)
Here, it’s important to install the OpenAI Gym package with the “mujoco” and “robotics” extras or simply all extras:
- Run pip3 install gym[mujoco,robotics] or pip3 install gym[all]
- Check the installation by running:
python3 -c "import gym; env = gym.make('Humanoid-v2'); print('nIt is OKAY!' if env.reset() is not None else 'nSome problem here...')"
If you see “It is OKAY!” printed at the end of the output, then it’s OKAY! Again, you can ignore warnings like “objc[…]: Class GLFW… is implemented in both…”.
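Beyond this one-liner, you can also watch a random agent in the MuJoCo viewer. The snippet below is just a sketch of such a visual check (random torques, so expect the humanoid to collapse immediately):

```python
import gym

env = gym.make('Humanoid-v2')
obs = env.reset()
for _ in range(1000):
    env.render()                                # opens the MuJoCo viewer window
    action = env.action_space.sample()          # random torques, no learning here
    obs, reward, done, info = env.step(action)
    if done:                                    # the humanoid falls quickly under random actions
        obs = env.reset()
env.close()
```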
MuJoCo diagnostics
Now I’ll talk about useful metrics provided by the OpenAI Gym MuJoCo environments. They depend on the environment version, so I divide them into v2 and v3 diagnostics. You can access these metrics through the “info” dictionary returned by the environment step method: observation, reward, done, info = env.step(action). See the Gym documentation for more. The table below lists the keys that let you access the metrics in the dictionary, along with short descriptions.
| Name | Version | Key | Description |
| --- | --- | --- | --- |
| HalfCheetah | v2 / v3 | reward_run, reward_ctrl | The positive reward for the robot's forward velocity. The negative reward for the robot's action vector magnitude. |
| HalfCheetah | v3 | x_position, x_velocity | Position along the X-axis. Velocity along the X-axis (forward velocity). |
| Hopper | v3 | x_position, x_velocity | Position along the X-axis. Velocity along the X-axis (forward velocity). |
| Walker2d | v3 | x_position, x_velocity | Position along the X-axis. Velocity along the X-axis (forward velocity). |
| Ant | v2 / v3 | reward_forward, reward_ctrl, reward_contact, reward_survive | The positive reward for the robot's forward velocity. The negative reward for the robot's action vector magnitude. The negative reward for the magnitude of contact forces between the robot and the ground. The constant positive reward for each time step the robot is alive (until the episode ends or the robot falls). |
| Ant | v3 | x_position, x_velocity, y_position, y_velocity, distance_from_origin | Position along the X-axis. Velocity along the X-axis. Position along the Y-axis. Velocity along the Y-axis. Distance from the robot's starting position, (0, 0). |
| Humanoid | v2 / v3 | reward_linvel, reward_quadctrl, reward_impact, reward_alive | The positive reward for the robot's forward velocity. The negative reward for the robot's action vector magnitude. The negative reward for the magnitude of contact forces between the robot and the ground. The constant positive reward for each time step the robot is alive (until the episode ends or the robot falls). |
| Humanoid | v3 | x_position, x_velocity, y_position, y_velocity, distance_from_origin | Position along the X-axis. Velocity along the X-axis. Position along the Y-axis. Velocity along the Y-axis. Distance from the robot's starting position, (0, 0). |
Table: The most useful metrics provided by the OpenAI Gym MuJoCo environments
Reward components can be especially useful, for example the forward velocity reward, which corresponds to the goal of these tasks. However, note that the absence of some metric from the info dictionary doesn’t mean that, say, a survival reward isn’t added to the rewards of Hopper or Walker; it is! For more nitty-gritty details like this, I encourage you to look into the code of the specific task on GitHub, e.g. Walker2d-v3.
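As a quick illustration of how to read these metrics, here is a sketch for Humanoid-v3 (keys taken from the table above; the random action is just a placeholder for your agent):

```python
import gym

env = gym.make('Humanoid-v3')
obs = env.reset()
action = env.action_space.sample()              # placeholder for your agent's action
obs, reward, done, info = env.step(action)

print(info['reward_linvel'])                    # reward component for forward velocity
print(info['reward_quadctrl'])                  # penalty for action vector magnitude
print(info['distance_from_origin'])             # how far the robot is from (0, 0)
```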
Now, let’s take a look at example metric values on the Humanoid task.
Humanoid diagnostics

The figure above compares the velocities of agents trained with three different DRL algorithms: SAC, SOP, and SUNRISE. The velocities are plotted for fully trained agents at different points of the episode. You can see that the SOP agent runs the fastest, which is the goal of this task. In the figures below, we investigate the positions of the SAC agent at the end of episodes at different stages of training.
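If you’d like to reproduce this kind of velocity plot for your own agent, a minimal sketch could look as follows (matplotlib and the random policy are my placeholders; the original figures come from the training code linked at the end of this post):

```python
import gym
import matplotlib.pyplot as plt

env = gym.make('Humanoid-v3')
obs = env.reset()
velocities, done = [], False
while not done:
    action = env.action_space.sample()          # substitute your trained agent here
    obs, reward, done, info = env.step(action)
    velocities.append(info['x_velocity'])       # forward velocity at each step

plt.plot(velocities)
plt.xlabel('Environment step')
plt.ylabel('X velocity')
plt.show()
```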


You can see that this particular SAC agent runs in the negative X and positive Y direction, and that with training it gets further and further. Because the episode length stays the same, this means the agent learns to run faster as training progresses. Note that the agent isn’t trained to run in any particular direction; it’s trained to run as fast as possible in whatever direction it picks. This means that different agents can learn to run in different directions. Moreover, an agent can change its running direction at some point during training, which is shown in the figures below.


Conclusions
Congratulations, you’ve got MuJoCo up and running! Now you’ll be interested in training agents in these environments—check out this repository. It includes easy-to-understand implementations of DRL algorithms in modern TF2, based on the newcomer-friendly SpinningUp codebase. Moreover, it can log to the Neptune platform, which is very convenient for storing and analyzing training results! I use it in my research, and you should give it a try too.