How Neptune Helped Zoined Scale Up to 100s of Runs Without Slowing Down
Zoined provides a cloud-based analytics service for retail and hospitality. It uses machine learning to analyze sales and operational data and delivers real-time, actionable insights through visual dashboards tailored to various business needs.

The challenge
Zoined faced significant challenges in managing and tracking a growing number of experiments, particularly with limited staff.
Kha Nguyen, the Senior Data Scientist at Zoined responsible for the forecasting pipeline, struggled to manually log and visualize data from extensive experiments and their intermediate results, which hampered efficiency and reproducibility. The process became increasingly cumbersome as data frames expanded and the demand for real-time result assessment intensified.
It became clear that Zoined needed a good experiment-tracking solution. At first, they experimented with Splunk and MLflow but faced significant hurdles.
Splunk was initially intimidating and complex to set up for visualizing data, and it became prohibitively expensive as experiment volume increased. MLflow, while a potential solution, proved challenging in terms of reliability and management. The self-hosted option in particular was too cumbersome for one person to maintain effectively.
Switching to a scalable managed solution
Faced with the challenges of managing a self-hosted MLflow solution, Kha decided to explore more sustainable alternatives that would allow him to focus on his core role rather than infrastructure management. The self-hosted approach proved to be too resource-intensive, often resulting in crashes and sluggish performance when handling large datasets.
Neptune offered a fully managed service that simplified the operational aspects of experiment tracking. It eliminated the need for manual scaling and constant monitoring of infrastructure, which were significant pain points with MLflow.
This switch not only freed Kha from the technical burdens of maintaining a logging system but also gave him a tracker that scales dynamically to accommodate hundreds of experiments and large datasets without performance degradation.
Highly responsive charts and visualizations
With MLflow, Kha had difficulties visualizing experiment results effectively. The system became unresponsive, especially when dealing with large files or complex charts.
Neptune addressed these issues by offering cleaner, more responsive charts and visualizations, which were crucial for Kha to analyze data efficiently and make informed decisions quickly. Neptune’s powerful visualization tools allowed for immediate access to well-rendered charts and files, enhancing the user experience and boosting productivity.
"Or when I log a CSV file that’s about 10,000 rows, MLflow just stops working. I click on the CSV file, it may take maybe three minutes before it shows up, and even when it starts, it doesn’t work smoothly anymore. It’s totally unusable, but that’s not a problem with Neptune." (Kha Nguyen, Senior Data Scientist at Zoined)
Unexpected win: ability to monitor hardware metrics
Before switching to Neptune, Kha had no easy way to monitor hardware performance metrics, which are crucial for optimizing resource usage during experiments. MLflow lacked the capability to log such metrics effectively, leaving Kha without insights into how much computing power his experiments were consuming.
Neptune offers built-in support for logging hardware metrics, such as CPU and memory usage, which Kha found immensely beneficial. This feature enabled Kha to adjust resource allocation proactively, optimize costs, and ensure that the experiments were running efficiently on their cloud infrastructure.
Results
- Significantly reduced operational overhead by transitioning from cumbersome self-hosting to Neptune’s scalable managed solution.
- Enabled real-time, responsive charts, enhancing analytical clarity and speed.
- Integrated hardware metrics tracking into the process and optimized resource utilization.
Thanks to Kha Nguyen for his help in creating this case study!