
Case Study

How Neptune Helped Zoined Scale Up to 100s of Runs Without Slowing Down

The more I used Neptune, the more I felt that I would rather pay for a hosted solution than have to maintain the infrastructure myself.
Kha Nguyen
Senior Data Scientist at Zoined
Before
    Wasted time on operational tasks (maintaining open-source solution)
    Struggled with unresponsive dashboards and visualizations
After
    Have a scalable, managed tracking tool that doesn't distract the team from building models
    Have real-time, responsive charts, enhancing analytical clarity and speed

Zoined provides a cloud-based analytics service for retail and hospitality. It uses machine learning to analyze sales and operational data and delivers real-time, actionable insights through visual dashboards tailored to different business needs.

Zoined dashboard | Source: Zoined

The challenge

Zoined faced significant challenges in managing and tracking a growing number of experiments, particularly with limited staff.

Kha Nguyen, the Senior Data Scientist at Zoined responsible for the forecasting pipeline, struggled with manually logging and visualizing data from extensive experiments and their intermediate results, which hampered efficiency and reproducibility. The process became increasingly cumbersome as data frames grew and the demand for real-time result assessment intensified.

It became clear that Zoined needed a good experiment-tracking solution. At first, they experimented with Splunk and MLflow but faced significant hurdles.

Splunk was intimidating and complex to set up for visualizing data, and it became prohibitively expensive as experiment volume increased. MLflow, while a potential solution, proved hard to run reliably. The self-hosted option in particular was too cumbersome for one person to maintain effectively.

When I joined this company, we were doing quite many different experiments and it’s really hard to keep track of them all so I needed something to just view the result or sometimes or also it’s intermediate results of some experiments like what [does] the data frame look like? What [does] the CSV look like? Is it reasonable? Is there something that went wrong between the process that resulted in an undesirable result? So we were doing it manually first but just writing… some log value to some log server like a Splunk.
Kha Nguyen, Senior Data Scientist at Zoined

Switching to a scalable managed solution

Faced with the challenges of managing a self-hosted MLflow solution, Kha decided to explore more sustainable alternatives that would allow him to focus on his core role rather than infrastructure management. The self-hosted approach proved too resource-intensive, often resulting in crashes and sluggish performance when handling large datasets.

Neptune offered a fully managed service that simplified the operational aspects of experiment tracking. It eliminated the need for manual scaling and constant monitoring of infrastructure, which were significant pain points with MLflow.

The switch freed Kha from the technical burden of maintaining a logging system, and Neptune scaled to accommodate hundreds of experiments and large datasets without performance degradation.

I started using Neptune, and then the more I used it, the more I felt like “okay, I would rather pay than maintain hold of this infrastructure myself.”
Kha Nguyen, Senior Data Scientist at Zoined

Highly responsive charts and visualizations

With MLflow, Kha had difficulties visualizing experiment results effectively. The system became unresponsive, especially when dealing with large files or complex charts.

Neptune addressed these issues by offering cleaner, more responsive charts and visualizations, which were crucial for Kha to analyze data efficiently and make informed decisions quickly. Neptune’s powerful visualization tools allowed for immediate access to well-rendered charts and files, enhancing the user experience and boosting productivity.

The real headache came when we ran like 100 experiments, 100 forecasts at the same time and all of that started streaming data into MLflow. That’s when we see MLflow is not responding.

Or when I log a CSV file that’s about 10,000 rows, MLflow just stops working. I click on the CSV file, it may take maybe three minutes before it shows up, and even when it starts, it doesn’t work smoothly anymore. It’s totally unusable but that’s not a problem with Neptune.
Kha Nguyen, Senior Data Scientist at Zoined

Unexpected win: ability to monitor hardware metrics

Before switching to Neptune, Kha had no easy way to monitor hardware performance metrics, which are crucial for optimizing resource usage during experiments. MLflow lacked the capability to log such metrics effectively, leaving Kha without insights into how much computing power his experiments were consuming.

Neptune offers built-in support for logging hardware metrics, such as CPU and memory usage, which Kha found immensely beneficial. This feature enabled Kha to adjust resource allocation proactively, optimize costs, and ensure that the experiments were running efficiently on their cloud infrastructure.

I didn’t think about logging something like CPU metrics or memory metrics and it turned out to be pretty important. We can gauge how much resources we need. When I look at it, I can see that we are using too much RAM, or do we not have enough or we need to use more CPU, for example.
Kha Nguyen, Senior Data Scientist at Zoined

Results

  • Significantly reduced operational overhead by transitioning from cumbersome self-hosting to Neptune’s scalable managed solution.
  • Enabled real-time, responsive charts, enhancing analytical clarity and speed.
  • Integrated hardware metrics tracking into the process and optimized resource utilization.

Thanks to Kha Nguyen for his help in creating this case study!

I felt like “why do I have to do all these manually?” And then I came across Neptune and it seems like, okay, this is a managed solution and it seems to be equivalent to MLflow.
Kha Nguyen, Senior Data Scientist at Zoined
