Case Study

At a certain stage of machine learning maturity the need for a tool like this one rises naturally. And then Neptune is a solid choice because of low entry threshold, many useful features, and good documentation and support.
Patryk Miziuła
Senior Data Scientist at
  • No experiment tracking solution in place
  • No way to track, visualize, and compare models
  • The team now has an out-of-the-box solution that can handle tracking and analyzing over 120k experiments

The company is an AI-focused software services provider, delivering ML-based end-to-end solutions to companies in retail, manufacturing, financial, and other sectors. They have years of experience supporting enterprises in building AI capabilities.

We spoke to Patryk Miziuła, who led an interesting project delivered for a leading Central and Eastern European (CEE) food company.

The task was to use ML to analyze the impact of promotional campaigns on sales increase.

Sounds fairly basic? Wait until you understand the problem statement in detail.

When it comes to Artificial Intelligence and Machine Learning, the team focuses on meta-learning research with standard ML frameworks such as TensorFlow and PyTorch.

What is the project about?

As Patryk walked us through the problem: in brief, the project was to analyze promotional campaigns run on food items like juices, jams, and pickles for their sales effectiveness.

To elaborate, the food company has the following supply chain structure:

Supply chain structure. Campaigns are run by the company – some of them are for multiple products, some of them are for a single product, but for all the contractors/clients, etc.

The food company runs campaigns of the sort "x% discount" for different products like jams and juices. Some of the campaigns are dedicated to the main contractors, while others target the contractors’ clients. There are also campaigns aimed directly at the consumer, e.g. “Buy 3, pay for 2”.

They wanted to create a model that predicts the number of sales per day for a given product under a promotional campaign, thus quantifying the impact of that campaign on the product’s sales.

So you have features related to:

  • 1 Client: size of the client in terms of store capacity, revenue, number of locations, number of contractors, etc.
  • 2 Contractor
  • 3 Product: price, type of product
  • 4 Promotion/campaign: TV ads vs. online ads vs. influencer marketing, the amount paid for that promotion, etc.

And you collate it all to create rows, one row per day, each including the sales number for that particular day.
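To make the row structure concrete, here is a minimal sketch of how such a per-day training row could be collated. The field names (client size, product price, promotion type, and so on) are illustrative assumptions, not the project’s actual schema:

```python
# Sketch: collate the four feature groups plus the day's sales into one flat row.
# All field names below are hypothetical examples.

def build_daily_row(date, client, contractor, product, promotion, sales):
    """Flatten feature groups into a single row for one day, prefixing keys by group."""
    row = {"date": date, "sales": sales}
    row.update({f"client_{k}": v for k, v in client.items()})
    row.update({f"contractor_{k}": v for k, v in contractor.items()})
    row.update({f"product_{k}": v for k, v in product.items()})
    row.update({f"promo_{k}": v for k, v in promotion.items()})
    return row

row = build_daily_row(
    "2021-03-01",
    client={"size": "large", "n_locations": 40},
    contractor={"region": "CEE"},
    product={"type": "juice", "price": 2.5},
    promotion={"type": "tv_ad", "discount_pct": 15},
    sales=1200,
)
```

One such row per product per day, stacked over the promotion period, forms the training table for a model.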

Patryk Miziuła was pretty confident that his ML team, with its expertise in handling such non-trivial cases, would be able to find a way to solve the problem.


This problem is particularly non-trivial due to:

  • the complexity of the data involving a large corpus of data sources
  • hundreds of different products
  • hundreds of contractors
  • thousands of contractors’ clients
  • different promotion types aimed at various stages of the product journey, from factory to household
  • different promotion parameters for each contractor or contractor’s client 
  • various promotion periods
  • overlapping promotions
  • competitors’ actions, etc.

Adding to the difficulties, it was also hard to decide whether a sales increase was caused by any one of the dozens of promotions applied, by the synergy between them, or whether it took place regardless of any campaign.

To solve such a complex problem, Patryk’s team had to iterate over the logic many times to structure the problem well. For example, promotions for different products and contractors were managed by different people and set manually, so there were no “general” promotion patterns that would hold for all contractors.

Therefore, the team decided to use a separate model for each product, contractor, and sometimes client type. This led them to more than 7000 separate cases to model.
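The splitting step can be pictured as grouping the daily rows by the (product, contractor, client type) combination, with each group becoming one modeling case. The keys and sample data below are illustrative, not the project’s real schema:

```python
# Sketch: partition the dataset into per-(product, contractor, client-type)
# subproblems; each group of rows gets its own model.

from collections import defaultdict

def split_into_subproblems(rows, keys=("product", "contractor", "client_type")):
    """Group rows by the combination of key columns; each group is one case."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return groups

rows = [
    {"product": "juice", "contractor": "A", "client_type": "retail", "sales": 10},
    {"product": "juice", "contractor": "A", "client_type": "retail", "sales": 12},
    {"product": "jam", "contractor": "B", "client_type": "wholesale", "sales": 7},
]
cases = split_into_subproblems(rows)
# two distinct subproblems in this toy sample; in the real project there were 7000+
```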

Now, to model 7000 sub-problems, they had to train more than 120,000 models, which is a big problem in itself. You see, when a team works on an ML problem, there has to be an efficient solution in place for:

  • Tracking experiments: ~120k experiments need an efficient tracking system to execute the project within the stipulated time and with good results. To find the best set of configurations for every subproblem, with promotional, sales, and client data along with hyperparameters and model configs, many experiments have to be executed per subproblem.

    So, each experiment has to be tracked to find the best configuration to make informed decisions. With no experiment tracking in place, this would quickly turn into chaos, eventually resulting in missed deadlines and huge technical delays.
  • Visualizations and dashboarding: For this many different model training runs they needed:
    – A robust solution to compare runs and experiments
    – Since every ML project is executed by a team, the solution has to be collaborative

    Looking through metric plots one by one in a static environment is super tedious compared to a dynamic dashboard that compares multiple runs on a single plot. The latter saves time and is much more efficient and collaborative.
  • Saving metadata: A single ML experiment generates tons of metadata, including metrics (training/validation/testing) and results (graphs, charts, plots, and numeric data). Multiply that by thousands of runs and you have a real metadata management problem.

Patryk and his team quickly realized that these issues had to be solved first in order to move forward.

Clearly, handling the training of more than 7000 separate machine learning models without any specialized tool is practically impossible. We definitely needed a framework able to group and manage the experiments.
Patryk Miziuła Senior Data Scientist at


We needed a tool able to store and compare results of plenty of experiments, divided into subproblems. Also, simplicity of plugging the tool to the code was a criterion.
Patryk Miziuła Senior Data Scientist at

We agree with Patryk: using a dedicated tool is a wise choice, because you need to focus on the problem at hand. For Patryk, that problem was “analyzing the impact of promotional campaigns on sales increase”, not “how to manage 120k models efficiently”.

Fortunately, Patryk and his team members were already familiar with Neptune, so the decision was prompt. According to Patryk, the reasons for this choice were:

  • 1 Familiarity with Neptune
  • 2 The simplicity of using Neptune and the convenient API to download the runs table and interesting experiments
  • 3 Fast and accurate support

To put things in perspective, we asked Patryk what would have happened if his team had gone for the polar opposite solution: using directories and spreadsheets to store and track everything.

For each of the 200 product types, we created a separate filter tree, and the depth of the filter depended on the amount of data available at the current level.

That is a lot of models. If there were no experiment tracker on the market, I think we would have to try to store a huge amount of different models, and related metadata and results in different directories and excel sheets.

This would be tedious and time-consuming already. What would add to the misery, is that we needed to change the feature generating filter tree due to changing project requirements.

With a tool like Neptune, you can change things and it just works.
Patryk Miziuła Senior Data Scientist at

Getting started

Integrating Neptune into the project’s codebase went smoothly due to the familiarity of Patryk and his team with Neptune’s API. As he explains:

Adding Neptune to our code was a breeze. The only problem we experienced was that the number of experiments we created was so big that the standard API could not handle it. Creating batch versions of functions for downloading runs table and experiments was a solution.
Patryk Miziuła Senior Data Scientist at
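The batching workaround Patryk describes can be sketched as paging through the runs in fixed-size chunks instead of requesting all ~120k at once. `fetch_page` below is a hypothetical stand-in for the real API call, not Neptune’s actual client:

```python
# Sketch of a batch-download wrapper: page through runs in fixed-size chunks
# and concatenate the results. Function names here are illustrative.

def fetch_runs_batched(fetch_page, total, batch_size=5000):
    """Download runs in batches; `fetch_page(offset, limit)` returns a list of rows."""
    rows = []
    for offset in range(0, total, batch_size):
        rows.extend(fetch_page(offset, min(batch_size, total - offset)))
    return rows

# Example with an in-memory stand-in for the remote runs table:
fake_store = [{"run_id": i} for i in range(120_000)]

def fake_fetch_page(offset, limit):
    return fake_store[offset:offset + limit]

all_runs = fetch_runs_batched(fake_fetch_page, total=len(fake_store))
```

The same chunking idea applies whether the page size is bounded by the API, by memory, or by request timeouts.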

Let’s talk about what they used the platform for and how it helped them achieve their goals.

Logging and saving metadata

After the smooth integration of Neptune into the codebase, it was time to utilize its functionality. The team used Neptune for:

  • Since the number of experiments exceeded 100k, far too many to monitor and track manually, Neptune’s platform provided a way to organize and track all 120k experiments in the format of a leaderboard table. One workaround they had to apply was viewing experiments in batches, as the platform wasn’t able to list >100k experiments at once.

    Patryk’s team utilized Neptune+Optuna to optimize and monitor hyperparameters of those 120k experiments. They particularly liked the seamless integration of Optuna with Neptune.

    “Neptune turned out to be working well with Optuna. We were running 100 Optuna tries per model, the optimal hyperparameters found and the search history were stored in Neptune as easy-to-access interactive charts. In short: we liked it.” – Patryk Miziuła, Senior Data Scientist at

  • They logged pickled models from each experiment run directly to the experiment metadata, resulting in easy access. They also logged CSV files containing feature sets from Optuna.

  • A large part of their workflow involved running and comparing experiments with plots and graphs, which they chose to do on Neptune’s dashboard due to the plots’ modularity and interactivity.

    “Neptune is aesthetic. Therefore we could simply use the visualizations it was generating in our reports.

    “We trained more than 120 000 models in total, for more than 7000 subproblems identified by various combinations of features. Due to Neptune, we were able to filter experiments for given subproblems and compare them to find the best one. Also, we stored a lot of metadata, visualizations of hyperparameters’ tuning, predictions, pickled models, etc. In short, we were saving everything we needed in Neptune.” – Patryk Miziuła, Senior Data Scientist at
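To illustrate the shape of the per-model tuning loop (100 tries per model, with the search history stored in the tracker), here is a minimal stand-in in which a seeded random search plays Optuna’s role and a plain list plays Neptune’s. None of these names come from the real libraries:

```python
# Sketch of a tuning loop: run N trials per model, record each trial's
# parameters and score (as a tracker would), then pick the best.

import random

def tune(objective, n_trials=100, seed=0):
    """Random-search stand-in for an Optuna study; minimizes `objective`."""
    rng = random.Random(seed)
    history = []  # one entry per trial, what would be logged to the tracker
    for trial in range(n_trials):
        params = {"lr": rng.uniform(1e-4, 1e-1), "depth": rng.randint(2, 10)}
        history.append({"trial": trial, "params": params, "score": objective(params)})
    best = min(history, key=lambda h: h["score"])
    return best, history

# Toy objective standing in for validation error: prefers depth near 6 and a small lr.
best, history = tune(lambda p: abs(p["depth"] - 6) + p["lr"])
```

With a real tracker, each `history` entry would be a logged trial, and the search history would be browsable as interactive charts, as Patryk describes.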

When a machine learning project scales, it requires constant nurturing and monitoring, just like a human baby. To avoid descending into chaos at scale, you need a tool that can organize and track all of this for you.

At a certain stage of machine learning maturity the need for a tool like this one rises naturally. And then Neptune is a solid choice because of low entry threshold, many useful features, and good documentation and support.
Patryk Miziuła Senior Data Scientist at


Neptune was the primary choice for them owing to the team’s familiarity with it. Choosing Neptune as part of their MLOps stack let them:

  • Store model metadata without worrying about synchronization issues with the particular experiment.
  • Save weeks trying to do the same thing with directories and sheets.
  • Run 120k+ experiments without worrying about storage deficits and disk failures.
  • Compare multiple promotions’ results with different filters to get the best results.
Thanks to Neptune, we were able to run our scripts on 5 bare-metal machines simultaneously and store results without worrying about any potential synchronization problems. This let us work efficiently.
Patryk Miziuła Senior Data Scientist at

And for a team and project like this one, you simply have to have an experiment tracking tool. As Patryk explains:

If there were no experiment tracker then we would end up emulating some. So we would end up writing our own, poor version of a thing like that. It would probably take us a month or more. On the other hand, adding Neptune API to our workflow took us two days or something.
Patryk Miziuła Senior Data Scientist at

Neptune’s inclusion in the MLOps workflow proved productive for Patryk and his team for all those reasons. Opting for a tool like Neptune to do the heavy lifting while you focus on the problem at hand pays off not only in the quality of results but also in how quickly those results are achieved.

Thanks to Patryk Miziuła for his help in creating this case study!

Forget about manual experiment management as soon as possible. Switch to a specialized managing tool immediately. And definitely consider Neptune for it.
Patryk Miziuła Senior Data Scientist at

Running thousands of experiments? Tired of worrying about storage deficits and disk failures?