Case Study

How Neptune Gave Waabi Organization-Wide Visibility on Experiment Data

Organic adoption by our teams has been a key indicator that the tool has added value to their workflows and they've been able to use it successfully.
Neil Isaac
Senior Staff Software Developer at Waabi
Before
  • No experiment tracking solution in place
  • Multiple teams training models and struggling with knowledge sharing
After
  • Unified tracking and data analysis across large-scale experiments
  • Improved organizational visibility and consistency in benchmarking

Waabi, founded by AI pioneer and visionary Raquel Urtasun (ex-Uber Chief Scientist), is developing the next generation of self-driving truck technology, with the goal of delivering a large-scale solution for self-driving trucks.

They do this by combining deep learning, probabilistic inference, and complex optimization to create software that is end-to-end trainable, interpretable, and capable of sophisticated reasoning.

Waabi World and its core capabilities: world creation, camera and LiDAR sensor simulation, scenario generation and testing, and learning to drive in simulation | Source

The challenge

Waabi’s ML teams are organized around different technical pillars. Each pillar is in charge of delivering technology for a different functional area. At a high level, AI teams at Waabi have a standard process to benchmark their research progress. They establish a fair benchmark, set up baseline models, and go through iterations of testing new ideas. 

They have a unified training workflow across all projects and datasets, so all the teams constantly launch experiments for different tasks, iteratively fine-tuning models and regularly comparing results against established benchmarks. Depending on the project, a team can launch over ten training jobs and experiments per day.
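For illustration, a single tracked run in a workflow like this might look like the minimal sketch below, using Neptune's Python client. The project name, tags, hyperparameters, and metric values are hypothetical placeholders, and authentication via the NEPTUNE_API_TOKEN environment variable is assumed; Waabi's actual training code is not shown in this case study.

```python
import neptune

# Start a tracked run in the team's shared project
# (the project name and tags here are hypothetical).
run = neptune.init_run(
    project="waabi/perception-benchmarks",
    tags=["baseline", "lidar"],
)

# Log hyperparameters once per run.
run["parameters"] = {"learning_rate": 1e-3, "batch_size": 32, "epochs": 10}

# Stream metrics while the job trains (placeholder values).
for epoch in range(10):
    train_loss = 1.0 / (epoch + 1)  # placeholder metric
    run["train/loss"].append(train_loss)

# Record the final benchmark score so runs stay comparable over time.
run["benchmark/score"] = 0.87  # placeholder

run.stop()
```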

And because an autonomous system must build a complex understanding of its environment to decide what action to take, many different types of data are involved: maps, LiDAR, camera, radar, inertial measurements, and other sensor data.

Keeping such a large-scale process under control quickly became a challenge. “Our ML teams at Waabi continuously run large-scale experiments with ML models. A significant challenge we faced was keeping track of the data they collected from experiments and exporting it in an organized and shareable way,” says Neil Isaac, Senior Staff Software Developer at Waabi.

Visibility across the organization in a large-scale experimentation workflow

It was especially important for them to be able to share this work across the entire organization, as visibility and consistency are fundamental for the company. 

“We identified the lack of tooling as soon as we started planning and building consistent benchmark datasets. We considered our workflow and recognized that sharing benchmark results in a constant place and format and retaining data for later comparison after the end of a project was critical,” says Neil.

The company evaluated open-source solutions and well-known vendor products. They were looking for feature-rich experiment tracking, collaboration capabilities, and high-quality documentation. “Neptune was the best choice for our use cases,” says James Tu, Research Scientist at Waabi.

During the evaluation phase, Neil worked closely with Neptune’s Sales and Product teams. The goal was to find the best way to leverage the tool for Waabi’s use case and collaborate on the roadmap.

I would definitely recommend the product. The Neptune team made it easy to test and adopt the tool via a self-initiated trial but also took the time to make personal connections with our whole team and understand their needs.
Neil Isaac
Senior Staff Software Developer at Waabi

Now Waabi has one place to organize all of its results. Everyone has access to this workspace, so experiments can be shared within and across teams, and collaboration and knowledge transfer are much easier.

“The product has been very helpful for our experimentation workflows. Almost all the projects in our company are now using Neptune for experiment tracking, and it seems to satisfy all our current needs. It’s also great that all these experiments are available to view for everyone in the organization, making it very easy to reference experimental runs and share results,” says James.
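Because runs live in a shared workspace, any team member can also query them programmatically, not just browse them in the app. A minimal sketch of what that could look like with Neptune's Python client follows; the project name and tag are hypothetical, and read access to the workspace is assumed.

```python
import neptune

# Open the shared project read-only; anyone in the workspace
# can query runs logged by any team (project name is hypothetical).
project = neptune.init_project(
    project="waabi/perception-benchmarks",
    mode="read-only",
)

# Pull the runs table into a pandas DataFrame to reference
# and compare experimental runs across teams.
runs_df = project.fetch_runs_table(tag="baseline").to_pandas()
print(runs_df[["sys/id", "sys/owner"]].head())
```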

Not only has visibility improved, but the teams at Waabi are also far more productive. Neptune has made it easier to keep track of the experiments they are running and has reduced the overhead spent on organizing them.

As Neil says, “Organic adoption by our teams has been a key indicator that the tool has added value to their workflows and that they have been able to use it successfully.”

Monitoring compute resources in a reinforcement learning workflow

Some of the models the teams at Waabi run are more data-heavy than others. The front end of an autonomy system, such as perception, has very large inputs: it uses many different sensors and, in real time, streams large volumes of data into the models deployed on the vehicle.

Offline perception tasks like data augmentation use large models that require substantial training resources: more distributed jobs, more GPUs per worker, and more workers. Simpler automation jobs may be able to train on a single machine or even a single GPU.

On top of that, their workloads must scale: development often starts on a developer’s machine to ensure the code works, then quickly moves to the cloud, scaling up as the dataset grows.

In all of these cases, the teams would struggle without the resource monitoring they now have in Neptune.

Hardware monitoring dashboard in the Neptune app

“Our team leverages simulators quite heavily, and we use a lot of computers. One thing we’re always keeping track of is what the utilization is and how to improve it,” says James. “Sometimes, we’ll get, for example, out-of-memory errors, and then seeing how the memory increases over time in the experiment is really helpful for debugging as well.” 
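This kind of monitoring requires no extra instrumentation: Neptune samples hardware metrics in the background when its monitoring dependencies are installed. A minimal sketch follows, with a hypothetical project name; the capture_hardware_metrics flag is typically enabled by default and is shown explicitly here only for clarity.

```python
import neptune

# Hardware metrics (CPU, GPU, memory) are sampled in the background
# and logged under the run's monitoring namespace, which is what
# makes utilization trends and out-of-memory patterns visible in
# the app. (Project name is hypothetical.)
run = neptune.init_run(
    project="waabi/simulation",
    capture_hardware_metrics=True,
)

# ... training or simulation workload runs here ...

run.stop()
```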

Another capability that complements the resource monitoring well is the remote stop function. Whenever something is off, the team doesn’t need to kill a cloud training job manually; they can stop it straight from the Neptune UI without navigating cloud-infrastructure dashboards.

As Neil says, it’s always essential for Waabi to optimize training runtime. “It directly affects cost as well as, more importantly, productivity. The faster we train models, the sooner we get results. That’s incredibly important to us.”

The results

  • Unified tracking and data analysis across large-scale experiments.
  • Improved organizational visibility and consistency in benchmarking.
  • Optimized compute resource monitoring and utilization.

Thanks to James Tu, Neil Isaac, and the team at Waabi for working with us to create this case study.

There were several highlights in our evaluation of Neptune. First, we appreciated that the team at Neptune was very open to our feature requests and took the initiative to connect with us and understand our use cases. Furthermore, Neptune’s experiment tracking features were excellent, fulfilled most of our needs immediately, and were open to working with us on forward-looking ideas.
James Tu
Research Scientist at Waabi

Want your team to be more productive and focus on experimentation?