Zoined offers Retail and Hospitality Analytics as a cloud-based service for roles ranging from top management to store managers. The service collects sales data from stores, venues, and webstores, including inventories, time and attendance, and visitor tracking systems. The data is analyzed and presented in an accessible, visual format so business owners can get real-time, actionable insights and select the time frames they want to report on. The product also lets businesses filter and group their data easily, create custom views, and grasp trends quickly with charts and graphs.
With Zoined®, businesses get a fully managed, off-the-shelf solution with ready-made dashboards and analytics for retail and wholesale, especially the needs of fashion, food retail, coffee shops, and restaurants.
Running lots of experiments, especially in a start-up where a handful of scientists and engineers solve problems, can get daunting. Tracking the experiments, versioning the datasets as they inevitably grow, and generally following procedures that keep results reproducible can be very tricky to navigate. This was the problem Kha faced when he first joined Zoined.
In addition, he was the only one responsible for the forecasting pipeline at Zoined, which made tracking experiments manually all the more tedious.
Kha was also working with large data frames of forecasts (predictions) that needed to be logged alongside their experiments, and he needed a way to visualize results for both completed and intermediate experiments so he could be more efficient during the experimentation process.
Problems with Splunk for experiment tracking
The first solution the team tried was manually logging experiment values to Splunk. For one, such a tool can be intimidating to get started with.
Another problem is that visualizing logged experiment values is quite difficult and may require expert help to set up.
Finally, Splunk can get expensive pretty fast — especially for a company that runs a lot of experiments and will need to send a large volume of data to the log server.
Problems with maintaining MLflow
Reliability and speed of MLflow
The next solution Kha tried was MLflow. One issue he had with MLflow was the hosting options available: as he mentioned, the only hosted MLflow option is Databricks. He started with a self-hosted MLflow deployment, but it quickly became difficult for one person to manage.
As he found out, running MLflow can be compute-intensive, consuming a lot of RAM and responding slowly.
Hosting MLflow on a local server also raised the problem of autoscaling for Kha. In most cases, MLflow couldn't handle a large stream of logs: it would either crash or its UI would stop responding, slowing down his experimentation workflow.
To get MLflow to keep up with the stream of logs, he had to scale up the number of instances, which turned into complex operations work to handle.
MLflow would, in fact, have been a great tool for Kha to manage hundreds of experiments if it didn't have the issues listed above.
Problems with collaboration on MLflow
Collaboration on a self-hosted MLflow deployment was a problem for Kha because sharing experiments was difficult: he needed to create URL aliases for logs whenever he wanted to share them with collaborators.
Kha needed a solution like MLflow but without the hassle of self-hosting. He needed a solution that:
- Was completely managed,
- Didn’t take too long to set up and get started with,
- Could elastically scale to large volumes of experiment logs and forecast datasets,
- Was also completely automated and fast,
- Could be customized and integrated with existing technologies.
Kha decided to do some digging and came across Neptune, which met all the requirements he needed.
Kha decided to choose Neptune as Zoined’s solution for logging experiment metadata because:
1. It is fully managed, fast, and scalable
2. It offers a better price-to-value ratio and is more accessible
3. It has better charts and visualizations of his experiments
4. It can visualize all types of data regardless of size and structure
5. It has automated logging of hardware performance metrics
“I started using Neptune, and the more I used it, the more I felt like ‘okay, I would rather pay than maintain this infrastructure myself.’” – Kha Nguyen, Senior Data Scientist at Zoined
As Kha learned while using MLflow, fully managed infrastructure was the best bet for improving his experimentation process: it frees him from worrying about infrastructure and operations workloads (which are not his core strengths) so he can focus on improving his experiments.
Compared to MLflow, Neptune automatically scales to handle the artifacts and metadata logged for the hundreds of experiments he runs for each of the company's clients. The MLflow application would often crash whenever he tried to log a CSV file with more than 10,000 rows, halting his work and hurting his productivity.
As he explained:
“In MLflow, when I log a CSV file that’s about 10,000 rows, MLflow just stops working. I click on the CSV file, it may take maybe three minutes before it shows up, and even when it starts, it doesn’t work smoothly anymore. It’s totally unusable but that’s not a problem with Neptune.” – Kha Nguyen, Senior Data Scientist at Zoined
“The more I used Neptune, the more I felt that I would rather pay for a hosted solution than have to maintain the infrastructure myself… I talked to Salsa (Zoined’s CEO) who asked about the pricing, and I said 50 dollars per month and that’s how we got in.” – Kha Nguyen, Senior Data Scientist at Zoined
As he found out, Neptune is a great alternative to his previous solutions. For individuals, Neptune is free to use for work, research, and personal projects. For teams, pricing starts from $49 for the entire team, and they only pay extra when they exceed the free usage quota. Getting up and running with Neptune wasn't a complicated process for Kha.
“It (Neptune) has much nicer visualizations or charts because sometimes when I wanna log some kind of chart or graph, MLflow can do that, but it will become really slow to open a chart.” – Kha Nguyen, Senior Data Scientist at Zoined
One of Neptune's well-known features is the ability to customize charts and use automated visualization features that save users a lot of time. For Kha, Neptune's visualization features for experiments and other metrics are much nicer and more responsive than MLflow's.
With Neptune, Kha found that he could visualize data with Pandas data frames just as he normally would in his workspace. He could also log large volumes of data streamed during his experiments, and everything still worked smoothly.
Neptune scales to handle large volumes of data and is fully managed, so Kha only has to worry about his experiments, not the underlying logging server. He also found Neptune's ability to log data frames directly to the platform very useful.
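As a rough sketch of that workflow, logging a forecast data frame to Neptune's Python client might look like the following. The project name, column names, and values are placeholders, and the upload step is guarded so the snippet degrades gracefully when the Neptune client or credentials are unavailable; this is an illustration of the pattern, not Zoined's actual pipeline.

```python
import pandas as pd

# Hypothetical forecast output; stores, weeks, and values are illustrative only.
forecasts = pd.DataFrame({
    "store": ["A", "B", "C"],
    "week": [1, 1, 1],
    "forecast_sales": [120.5, 98.0, 143.2],
})

# Hedged sketch of logging the data frame with Neptune's Python client.
# The project name is a placeholder; the call is skipped if the client
# or credentials are missing in this environment.
try:
    import neptune
    from neptune.types import File

    run = neptune.init_run(project="workspace/forecasting")  # placeholder
    run["forecasts/preview"].upload(File.as_html(forecasts))
    run["forecasts/n_rows"] = len(forecasts)
    run.stop()
except Exception:
    pass  # no Neptune client or credentials available
```

Logging the data frame as a browsable artifact like this keeps the forecasts next to the experiment's metrics instead of scattered across local files.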
One feature Neptune provides that MLflow lacks is the option to log hardware metrics, giving users insight into how their experiments are doing and how many resources they consume. Kha finds this feature particularly useful because the insights help him optimize his experiments' resource usage.
As Kha explains:
“You also have automatic computing resource monitoring, where you can start monitoring CPU and memory usage out of the box. I think that's cool, so that we can gauge how many resources we need. When I look at it, I can see whether we are using too much RAM or not enough. Do we need to use more CPU, for example?” – Kha Nguyen, Senior Data Scientist at Zoined
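Neptune records this telemetry automatically, but a standard-library-only sketch (Unix-only, and purely illustrative of the kind of numbers being tracked, not Neptune's implementation) of taking such a snapshot manually might be:

```python
import os
import resource

# Illustrative, Unix-only snapshot of the resource figures that an
# experiment tracker's hardware monitoring surfaces: CPU availability,
# CPU time consumed, and peak memory usage of the current process.
def resource_snapshot():
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "cpu_count": os.cpu_count(),
        "user_cpu_seconds": usage.ru_utime,
        # ru_maxrss is reported in kilobytes on Linux, bytes on macOS
        "peak_rss": usage.ru_maxrss,
    }

snapshot = resource_snapshot()
```

Watching numbers like `peak_rss` over an experiment's lifetime is what lets you answer Kha's questions above: whether a job is over-provisioned on RAM or starved for CPU.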
After a few months of using Neptune, how has it improved Kha’s experimentation workflow?
Overall, Neptune was able to meet Kha's requirements as the sole data scientist on his team. It proved to be a useful solution because:
After struggling with the self-hosted MLflow setup, Kha found that a fully managed solution let him focus on improving his experiments rather than configuring and maintaining logging infrastructure, regardless of scale.
“I can pretty much log everything in Neptune and more…” – Kha Nguyen, Senior Data Scientist at Zoined
Neptune gives Kha the option to customize what he logs and also includes out-of-the-box options for the metadata he can log. Being able to log large volumes of data also improves Kha's experimentation workflow, making it easy to keep all his experiment optimization tools in one central place.
“I didn’t think about logging something like CPU metrics or memory metrics and it turned out to be pretty important when debugging something running in parallel with big data, for example. I didn’t think about that when I was using MLflow, so this is something that I find extremely helpful.” – Kha Nguyen, Senior Data Scientist at Zoined
Neptune’s hardware performance monitoring feature helped Kha to estimate the memory usage for his experiments and optimize accordingly, saving him money on the jobs he runs on Amazon Web Services.
“The more I used Neptune, the more I felt that I would rather pay for a hosted solution than have to maintain the infrastructure myself.” – Kha Nguyen, Senior Data Scientist at Zoined
Kha found Neptune to be a more economical option than the alternatives: not only did it cost less than the time he spent maintaining MLflow, but the fully managed solution also reduced what Zoined spent on hosting logging software on its own infrastructure.
For Kha, Neptune proved to be a better alternative to MLflow not just economically but also in terms of his productivity in running numerous experiments.
“For now, I’m not using MLflow anymore ever since I switched to Neptune because I feel like Neptune is a superset of what MLflow has to offer.” – Kha Nguyen, Senior Data Scientist at Zoined
Thanks to Kha Nguyen for his help in creating this case study!