How Paretos Tracks Thousands of Experiments Across Dozens of Projects With Neptune
Paretos is a decision-intelligence platform helping businesses make smarter, data-driven decisions through customized forecasting models. Their ML team builds and retrains models for each customer individually, using time series data to power predictions in areas like demand planning and capacity optimization. These models are trained on client-specific data, evaluated against customer-defined success metrics, and regularly updated as new data arrives.

The challenge
The team runs a dedicated training pipeline for each customer. Experiments are isolated by use case and tracked in staging and production environments, generating a high volume of runs and metadata across many active projects.
Before adopting Neptune, Paretos was using Weights & Biases to track experiments. As their ML operations matured, the team began running into friction. The pricing model prompted them to explore alternatives, and they wanted a tool that better aligned with how they work: running many experiments across many isolated projects, while scaling users and runs without bottlenecks.
Since their usage focused on core experiment tracking (monitoring training, logging metrics, and comparing results), all of that was easy to replicate in Neptune. The team needed a tool that offered the same functionality in a more flexible and scalable structure.
The solution: Scalable tracking for high-volume ML
Paretos migrated to Neptune with a clean slate, as no historical data was brought over. The switch was straightforward, and the integration took less than a week.
In Neptune, they set up a project structure that mirrors their product setup: one or more projects per customer, typically split into staging and production. Every time a training pipeline runs, it logs training and validation metrics like RMSE and MSE to Neptune, along with metadata such as parameter configs, outputs, and dataset snapshots. Depending on the use case, the number of pipeline runs varies from 50 to more than 2,000.
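A minimal sketch of what per-pipeline logging like this can look like with the Neptune Python client (neptune >= 1.0 API). The project name, tags, parameters, file path, and metric values below are illustrative placeholders, not Paretos' actual configuration:

```python
import neptune

# Assumes NEPTUNE_API_TOKEN is set in the environment.
# Hypothetical "<workspace>/<customer>-<environment>" project naming.
run = neptune.init_run(
    project="paretos/customer-a-staging",
    tags=["demand-planning", "baseline"],
)

# Log the parameter configuration for this training run
run["parameters"] = {
    "model": "gradient-boosting",
    "horizon_days": 28,
    "learning_rate": 1e-3,
}

# Reference the dataset snapshot used for this run (placeholder path)
run["data/train_snapshot"].track_files("data/customer-a/train.parquet")

# Placeholder training loop; in practice this is the customer-specific pipeline
for epoch in range(10):
    train_rmse = 1.0 / (epoch + 1)
    val_rmse = 1.2 / (epoch + 1)
    run["train/rmse"].append(train_rmse)
    run["val/rmse"].append(val_rmse)
    run["val/mse"].append(val_rmse ** 2)

run.stop()
```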
Once runs are logged, the team uses tags and groupings to organize them by purpose (e.g., feature experiments, parameter sweeps, baseline comparisons). They use sorting and filtering to surface top-performing runs and compare them across key metrics. From there, they select a small set of candidates for deeper analysis in downstream tools.
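For the narrowing-down step, a hedged sketch of how runs can be fetched, filtered by tag, and sorted programmatically with the same client; the project name, tag, and metric paths are hypothetical:

```python
import neptune

# Open the project in read-only mode to query logged runs
project = neptune.init_project(project="paretos/customer-a-staging", mode="read-only")

# Fetch runs tagged as parameter sweeps into a pandas DataFrame;
# for series fields like "val/rmse", the table holds the last logged value
runs_df = project.fetch_runs_table(tag="parameter-sweep").to_pandas()

# Sort by validation RMSE and keep a small set of candidates for deeper analysis
top_runs = runs_df.sort_values("val/rmse").head(5)
print(top_runs[["sys/id", "val/rmse", "val/mse"]])

project.stop()
```

The same filtering and sorting is available interactively in the Neptune UI, which is where the team surfaces top-performing runs before moving them into downstream tools.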
Neptune serves as the central record for all experiments. It’s lightweight enough to integrate with their existing pipelines, flexible enough to accommodate dozens of projects, and reliable enough to stay out of the way.
The results
With Neptune, the Paretos team has been able to:
- Track hundreds to 2,000+ runs per use case;
- Structure experiments across staging and production with multiple Neptune projects per customer;
- Use filtering and grouping to narrow down the best-performing models quickly;
- Gain a stable tracking layer used across the ML team.
Thanks to Robert Haase for helping create this case study!