MLOps Blog

Installing MuJoCo to Work With OpenAI Gym Environments

5 min
Piotr Januszewski
19th April, 2023

In this article, I’ll show you how to install MuJoCo on your Mac/Linux machine in order to run continuous control environments from OpenAI’s Gym. These environments include classic ones like HalfCheetah, Hopper, Walker, Ant, and Humanoid, as well as harder ones like object manipulation with a robotic arm or robotic hand dexterity. I’ll also discuss additional agent diagnostics provided by the environments that you might not have considered before.


How do you get MuJoCo?

You might wonder, what’s so special about installing MuJoCo that it needs a guide? Well, getting a license and properly installing it might be relatively easy, but the real problems start when you have to match MuJoCo and OpenAI Gym versions and install the mujoco-py package. It took me many hours to get it right the first time I tried!

To save you the trouble, I’ll walk you through the installation process step by step. Then I’ll discuss some useful diagnostics to keep an eye on, and we’ll take a look at example diagnostics from Humanoid training. Finally, I’ll link the code that lets you train agents on MuJoCo tasks and watch the diagnostics using neptune.ai. To start, I’ll give you a bit of context about MuJoCo and OpenAI Gym environments.

MuJoCo – Multi-Joint dynamics with Contact

MuJoCo is a fast and accurate physics simulation engine aimed at research and development in robotics, biomechanics, graphics, and animation. It’s an engine, meaning it doesn’t provide ready-to-use models or environments to work with; rather, it powers environments like those that OpenAI’s Gym offers.

What is OpenAI Gym?

OpenAI Gym (or Gym for short) is a collection of environments. Some of them, referred to collectively as continuous control environments, run on the MuJoCo engine. All of these environments share two important characteristics:

  1. An agent observes vectors that describe the kinematic properties of the controlled robot. This means that the state space is continuous.
  2. Agent actions are vectors too, and they specify torques to be applied to the robot’s joints. This means that the action space is also continuous (you can check both properties with the snippet below).
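
Below is a minimal sketch of how to inspect both spaces, assuming you already have Gym with the MuJoCo extras installed (the installation itself is covered later in this post):

import gym

env = gym.make('HalfCheetah-v2')
print(env.observation_space)  # a continuous Box space describing the robot's kinematics
print(env.action_space)       # a continuous Box space of joint torques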

Gym MuJoCo environments include classic continuous control, object manipulation with a robotic arm, and robotic hand (Shadow Hand) dexterity. There are multiple tasks available for training in these environments. Some of them are presented in the figures below. You can find details about all of them in the Gym environments list. This post is especially useful for the robotic-arm and robotic-hand environments. If you don’t know the Gym API yet, I encourage you to read the documentation – the two short sections “Environments” and “Observations” should be enough to start.

Classic continuous control – tasks from left to right: Walker2d, Ant, and Humanoid.
Source: OpenAI Roboschool

Object manipulation with a robotic arm – the pick and place task.
Source: Overcoming exploration in RL from demos

Shadow Hand dexterity – the hand manipulate block task.
Source: OpenAI Gym Robotics

Installing MuJoCo and OpenAI Gym

In this section, I’ll show you where to get the MuJoCo license, how to install everything required, and also how to troubleshoot a common macOS problem.

License

You can get a 30-day free trial on the MuJoCo website or—if you’re a student—a free 1-year license for education. The license key will arrive in an email with your username and password. If you’re not a student, you might try to encourage the institution you work with to buy a license.

Installing mujoco-py

Here are step-by-step instructions, and below I added some explanations and troubleshooting tips:

  1. Download the MuJoCo version 1.50 binaries for Linux or macOS.
  2. Unzip the downloaded mjpro150 directory into ~/.mujoco/mjpro150, and place your license key (the mjkey.txt file from your email) at ~/.mujoco/mjkey.txt.
  3. Run pip3 install -U 'mujoco-py<1.50.2,>=1.50.1'
  4. Run python3 -c 'import mujoco_py' (a slightly longer smoke test is sketched right after this list)
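
If the import succeeds, you can optionally run a slightly longer smoke test. The sketch below builds a throwaway model from an inline MJCF string (a made-up minimal example, not a file shipped with MuJoCo) and steps the simulation:

import mujoco_py

# A minimal, made-up MJCF model: a single free-falling sphere.
MODEL_XML = """
<mujoco>
  <worldbody>
    <body name="ball" pos="0 0 1">
      <joint type="free"/>
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco_py.load_model_from_xml(MODEL_XML)
sim = mujoco_py.MjSim(model)
for _ in range(100):
    sim.step()
print(sim.data.qpos)  # the ball's height (third entry) should now be below 1.0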

If you see warnings like “objc[…]: Class GLFW… is implemented in both…”, then ignore them. If you’re on macOS and see “clang: error: unsupported option ‘-fopenmp’” or any other compilation-related error, then go to the Troubleshooting subsection. If you wonder why MuJoCo 1.50 and not the newest version, then go to the Version subsection. If you have no more concerns, then you can jump straight into Gym installation!

Troubleshooting

If, on macOS, the “clang: error: unsupported option ‘-fopenmp’” error (or any other compiler-related error, e.g. from gcc if you have it installed) appears during installation or when running python3 -c ‘import mujoco_py’, then follow these steps:

1. Install brew if you don’t have it already.

2. Uninstall all other compilers if you have some, e.g. run brew uninstall gcc. You may need to run it a couple of times if you have more than one version.

3. Run brew install llvm boost hdf5

4. Add this to your .bashrc / .zshrc

export PATH="/usr/local/opt/llvm/bin:$PATH"
export CC="/usr/local/opt/llvm/bin/clang"
export CXX="/usr/local/opt/llvm/bin/clang++"
export CXX11="/usr/local/opt/llvm/bin/clang++"
export CXX14="/usr/local/opt/llvm/bin/clang++"
export CXX17="/usr/local/opt/llvm/bin/clang++"
export CXX1X="/usr/local/opt/llvm/bin/clang++"
export LDFLAGS="-L/usr/local/opt/llvm/lib"
export CPPFLAGS="-I/usr/local/opt/llvm/include"

5. Don’t forget to source your .bashrc / .zshrc (e.g. by relaunching your terminal) after editing it, and make sure your Python environment is activated.

6. Try to uninstall and install mujoco-py again.

See this GitHub issue for more information. You should also see the Troubleshooting section of the mujoco-py README.

Version

Here we bump into the first trap! The newest OpenAI Gym doesn’t work with MuJoCo 2.0; see this GitHub issue if you want to know the details. This is why you need to download the MuJoCo version 1.50 binaries. Alternatively, if you really need to use MuJoCo 2.0, you can download the MuJoCo 2.0 binaries for Linux or macOS, install the newest mujoco-py, and then install the last Gym version that supports MuJoCo 2.0: pip install -U gym[all]==0.15.3

Installing OpenAI Gym Environments (tutorial)

Here, it’s important to install the OpenAI Gym package with the “mujoco” and “robotics” extras or simply all extras:

  1. Run pip3 install gym[mujoco,robotics] or pip3 install gym[all]
  2. Check the installation by running:
python3 -c "import gym; env = gym.make('Humanoid-v2'); print('\nIt is OKAY!' if env.reset() is not None else '\nSome problem here...')"

If you see “It is OKAY!” printed at the end of the output, then it’s OKAY! Again, you can ignore warnings like “objc[…]: Class GLFW… is implemented in both…”.
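
If you installed the robotics extra (or all extras), you can run a similar check against the robotic-arm and Shadow Hand environments. This is only a sketch; the environment IDs below are the default versions at the time of writing and may change in newer Gym releases:

import gym

# Try to build one environment from each family to confirm the extras work.
for env_id in ['Humanoid-v2', 'FetchPickAndPlace-v1', 'HandManipulateBlock-v0']:
    env = gym.make(env_id)
    env.reset()
    print(env_id, 'is OKAY!')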

MuJoCo diagnostics

Now I’ll talk about useful metrics provided by the OpenAI Gym MuJoCo environments. They depend on the environment version, so I divide them into v2 and v3 diagnostics. You can access these metrics in the “info” dictionary returned by the environment step method: observation, reward, done, info = env.step(action). See the Gym documentation for more. The table below presents the keys that let you access each metric in the dictionary, along with short descriptions of the metrics.
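
Here is a minimal sketch of how to read these diagnostics during a rollout, assuming the Humanoid-v3 environment and the keys listed in the table below (a random policy stands in for a trained agent):

import gym

env = gym.make('Humanoid-v3')
observation = env.reset()

for _ in range(100):
    action = env.action_space.sample()  # replace with your agent's action
    observation, reward, done, info = env.step(action)
    # The diagnostics live in the info dictionary:
    print(info['reward_linvel'], info['x_velocity'], info['distance_from_origin'])
    if done:
        observation = env.reset()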

HalfCheetah (v2 / v3)
  reward_run: The positive reward for the robot forward velocity.
  reward_ctrl: The negative reward for the robot action vector magnitude.

HalfCheetah (v3)
  x_position: Position in the X-axis.
  x_velocity: Velocity in the X-axis (forward velocity).

Hopper (v3)
  x_position: Position in the X-axis.
  x_velocity: Velocity in the X-axis (forward velocity).

Walker2d (v3)
  x_position: Position in the X-axis.
  x_velocity: Velocity in the X-axis (forward velocity).

Ant (v2 / v3)
  reward_forward: The positive reward for the robot forward velocity.
  reward_ctrl: The negative reward for the robot action vector magnitude.
  reward_contact: The negative reward for the contact force magnitude between the robot and the ground.
  reward_survive: The constant positive reward at each time step when the robot is alive (until the end of an episode or the robot falls).

Ant (v3)
  x_position: Position in the X-axis.
  x_velocity: Velocity in the X-axis.
  y_position: Position in the Y-axis.
  y_velocity: Velocity in the Y-axis.
  distance_from_origin: Distance from the robot starting position, (0, 0).

Humanoid (v2 / v3)
  reward_linvel: The positive reward for the robot forward velocity.
  reward_quadctrl: The negative reward for the robot action vector magnitude.
  reward_impact: The negative reward for the contact force magnitude between the robot and the ground.
  reward_alive: The constant positive reward at each time step when the robot is alive (until the end of an episode or the robot falls).

Humanoid (v3)
  x_position: Position in the X-axis.
  x_velocity: Velocity in the X-axis.
  y_position: Position in the Y-axis.
  y_velocity: Velocity in the Y-axis.
  distance_from_origin: Distance from the robot starting position, (0, 0).

Table: The most useful metrics provided by the OpenAI Gym MuJoCo environments

Reward components can be especially useful, for example the forward velocity reward, which is the goal of these tasks. However, note that the absence of some metric from the info dictionary doesn’t mean that, say, a survival reward isn’t added to the rewards of Hopper or Walker2d (it is!). For more nitty-gritty details like this, I encourage you to look into the code of the specific task on GitHub, e.g. Walker2d-v3.
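
As a rough example, the sketch below sums the two HalfCheetah components over an episode and compares them with the total return; according to the table above, they should (up to numerical details) add up to the per-step reward:

import gym

env = gym.make('HalfCheetah-v3')
env.reset()
total_reward, total_run, total_ctrl = 0.0, 0.0, 0.0
done = False
while not done:
    _, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
    total_run += info['reward_run']    # positive forward-velocity reward
    total_ctrl += info['reward_ctrl']  # negative control-cost reward
print(total_reward, total_run + total_ctrl)  # these two sums should be close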

Now, let’s take a look at example metric values on the Humanoid task.

Humanoid diagnostics

Humanoid velocities: comparison of three different DRL algorithms, SAC, SOP, and SUNRISE.

The figure above compares velocities of three different DRL algorithms: SAC, SOP, and SUNRISE. The velocities are plotted for fully trained agents at different points of the episode. You can see that the SOP agent runs the fastest, which is the goal of this task. In the figures below we investigate the positions of the SAC agent at the end of episodes at different stages of training.

SAC final positions in the X-axis across training on the Humanoid task.
SAC final positions in the Y-axis across training on the Humanoid task.

You can see that this particular SAC agent runs in the negative X and positive Y direction, and that with training it gets further and further. Because the time it has before the end of the episode stays the same, this means it learns to run faster as training progresses. Note that the agent isn’t trained to run in any particular direction; it’s simply trained to run as fast as possible, in whatever direction it happens to pick. This means that different agents can learn to run in different directions. The agent can even change its run direction at some point during training, which is shown in the figures below.
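
As a rough sketch, this is how you could collect such end-of-episode positions yourself and then send them to whatever experiment tracker you use (the training loop is omitted; a random policy stands in for the agent):

import gym

env = gym.make('Humanoid-v3')
final_x, final_y = [], []

for episode in range(10):
    env.reset()
    done, info = False, {}
    while not done:
        _, _, done, info = env.step(env.action_space.sample())  # replace with your agent's action
    # Record where the robot ended up; these are the values plotted in the charts below.
    final_x.append(info['x_position'])
    final_y.append(info['y_position'])

print(final_x, final_y)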

SAC final positions in the X-axis across training on the Humanoid task; the agent changes its run direction about one-third of the way through training.
SAC final positions in the Y-axis across training on the Humanoid task; the agent changes its run direction late in training.

Conclusions

Congratulations, you’ve got MuJoCo up and running! Now you’ll be interested in training agents in these environments, so check out this repository. It includes easy-to-understand implementations of DRL algorithms in modern TF2, based on the newcomer-friendly SpinningUp codebase. Moreover, it can log to the Neptune platform, which is very convenient for storing and analyzing training results! I use it in my research, and I encourage you to give it a try too.