Case Study

How Paretos Tracks Thousands of Experiments Across Dozens of Projects With Neptune

Neptune is an important component of our system and plays a key role in internal tracking. If it were unavailable, we’d lose a valuable part of our monitoring workflow—but so far, it’s proven reliable. It just works.
Robert Haase
Lead AI Scientist at Paretos
Before
    Pricing model created friction
    High experiment volume became hard to manage
    Limited flexibility in organizing runs
After
    Scales to 2000+ runs per use case
    Clear structure across projects and stages
    Fast filtering to find top models

Paretos is a decision-intelligence platform helping businesses make smarter, data-driven decisions through customized forecasting models. Their ML team builds and retrains models for each customer individually, using time series data to power predictions in areas like demand planning and capacity optimization. These models are trained on client-specific data, evaluated against customer-defined success metrics, and regularly updated as new data arrives.

Paretos decision-intelligence software | Source: paretos.com

The challenge

The team runs a dedicated training pipeline for each customer. Experiments are isolated by use case and tracked in staging and production environments, generating a high volume of runs and metadata across many active projects.

Before adopting Neptune, Paretos was using Weights & Biases to track experiments. As their ML operations matured, the team began running into friction. The pricing model was the trigger to explore alternatives, and they wanted a tool that would better align with how they work: running many experiments across many isolated projects, with the ability to scale users and runs without bottlenecks.

Their usage focused on core experiment tracking: monitoring training, logging metrics, and comparing results. All of that was easy to replicate in Neptune. The team needed a tool that could offer the same functionality, just in a more flexible and scalable structure.

The solution: Scalable tracking for high-volume ML

Paretos migrated to Neptune with a clean slate, as no historical data was brought over. The switch was straightforward, and the integration took less than a week.

In Neptune, they set up a project structure that mirrors their product setup: one or more projects per customer, typically split into staging and production. Every time a training pipeline runs, it logs training and validation metrics like RMSE and MSE to Neptune, along with metadata such as parameter configs, outputs, and dataset snapshots. The number of pipeline runs varies from around 50 to over 2,000.
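In practice, this kind of per-pipeline logging is only a few lines of code. Below is a minimal sketch using the Neptune Python client; the project name, tags, parameters, and the `training_loop()` helper are illustrative placeholders, not Paretos' actual configuration.

```python
import neptune

# Hypothetical training loop yielding (train_rmse, val_rmse) per epoch.
def training_loop():
    yield 0.42, 0.51
    yield 0.31, 0.44

# One run per pipeline execution; project and tags are illustrative.
run = neptune.init_run(
    project="my-workspace/customer-a-staging",
    tags=["demand-planning", "baseline"],
)

# Log the parameter config as metadata.
run["parameters"] = {"forecast_horizon": 14, "learning_rate": 0.05}

# Log training and validation metrics as series.
for train_rmse, val_rmse in training_loop():
    run["train/rmse"].append(train_rmse)
    run["val/rmse"].append(val_rmse)

# Attach a dataset snapshot as a file artifact (path is illustrative).
run["dataset/snapshot"].upload("data/train_snapshot.parquet")

run.stop()
```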

Once runs are logged, the team uses tags and groupings to organize them by purpose (e.g., feature experiments, parameter sweeps, baseline comparisons). They use sorting and filtering to surface top-performing runs and compare them across key metrics. From there, they select a small set of candidates for deeper analysis in downstream tools.
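That "first cut" can also be scripted. The sketch below, again assuming the Neptune Python client, fetches the runs table for one project, filters by a tag, and sorts by a validation metric; the project name, tag, and field names are assumptions for illustration.

```python
import neptune

# Read-only connection to an illustrative project.
project = neptune.init_project(
    project="my-workspace/customer-a-staging",
    mode="read-only",
)

# Fetch runs tagged for a parameter sweep and convert to a DataFrame.
runs = project.fetch_runs_table(tag=["parameter-sweep"]).to_pandas()

# Sort by validation RMSE (assuming "val/rmse" was logged) and keep the top 5.
top_candidates = runs.sort_values("val/rmse").head(5)
print(top_candidates[["sys/id", "val/rmse"]])
```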

Neptune serves as the central record for all experiments. It’s lightweight enough to integrate with their existing pipelines, flexible enough to accommodate dozens of projects, and reliable enough to stay out of the way.

We track hundreds of runs, then filter and sort by the most important metrics and narrow down the best three to five. Neptune is where we go to make that first cut.
Robert Haase, Lead AI Scientist at Paretos

The results

With Neptune, the Paretos team has been able to:

  • Track hundreds to 2,000+ runs per use case
  • Structure experiments across staging and production with multiple Neptune projects per customer
  • Use filtering and grouping to quickly narrow down the best-performing models
  • Rely on a stable tracking layer used across the ML team
We’ve been using Neptune for over two years now, and it’s become a stable part of our setup. It’s an important component of our system and plays a key role in internal tracking. If it were unavailable, we’d lose a valuable part of our monitoring workflow—but so far, it’s proven reliable. It just works. Internally, no one brings it up, which we take as a very good sign. It does its job reliably, and that’s exactly what we want from a tool like this.
Robert Haase, Lead AI Scientist at Paretos

Thanks to Robert Haase for helping create this case study!

Looking for an experiment tracker that can handle the large scale of your model training?