ReSpo.Vision uses computer vision and machine learning in sports data analysis to extract 3D data from single-view camera sports broadcast videos. They provide players, scouts, managers, clubs, and federations with an unmatched depth of knowledge. The company concentrates on football (soccer) and targets sports clubs, leagues, bookmakers, and media.
Tracking and managing training pipelines at scale
The ML team works on all aspects of machine learning. They collect raw data, label it, and add new datasets to training and evaluation pipelines. If there’s an improvement during the iteration, they push the new models into their production system.
Kedro is a crucial part of their tech stack because it manages the entire workflow. The team defines separate pipelines for specific models and data processing algorithms. This lets them easily set parameters for multiple jobs, keeping the workflow manageable and reproducible.
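The core idea of named, parameterized pipelines can be sketched in plain Python. This is a simplified illustration, not actual Kedro code (Kedro's real API builds `Pipeline` objects out of `node`s and reads parameters from config files); the step names and parameters here are assumptions:

```python
# Simplified sketch of named, parameterized pipelines (illustration only,
# not the actual Kedro API; real Kedro uses Pipeline/node objects and
# YAML-based parameter files).

def preprocess(data, params):
    # e.g., rescale inputs according to a configured factor
    return [x * params["scale"] for x in data]

def predict(data, params):
    # e.g., keep only outputs above a configured confidence threshold
    return [x for x in data if x >= params["threshold"]]

# A "pipeline" here is just an ordered list of steps under a name
PIPELINES = {
    "detection": [preprocess, predict],
}

def run_pipeline(name, data, params):
    """Run each step of the named pipeline in order, threading data through."""
    for step in PIPELINES[name]:
        data = step(data, params)
    return data

result = run_pipeline("detection", [1, 2, 3], {"scale": 2, "threshold": 4})
print(result)  # [4, 6]
```

Because each job is just a pipeline name plus a parameter dictionary, the same pipeline can be rerun reproducibly with different settings.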
The ReSpo.Vision team constantly works on their core pipeline models to improve the quality of the output data they send to customers. As a result, they run a lot of Kedro pipelines.
It worked well at the start. But as they scaled up the number of matches processed, the number of pipelines run to build different models grew with it. Managing this workflow and debugging pipeline failures at scale became extremely hard.
“One of the biggest challenges was managing the pipelines and the process itself because we had 40 to 50 different pipelines. Depending on the exact use case or what kind of data we’d like to output, we could have different combinations for running them to get different outputs. So basically, the entire system isn’t so simple,” says Wojtek Rosiński, Chief Technology Officer at ReSpo.Vision.
The team noticed problems on many levels:
- Debugging issues with the experiment results was difficult because they ran many pipelines at scale, and most pipeline results depended on the output from upstream pipelines.
- They had a hard time figuring out if their pipelines all finished successfully and how the results of each run compared to the previous runs.
- And on the reproducibility side, it was hard to know which dataset or parameters had been used for each experiment run. Whenever one of the experiments produced great results, it took additional effort to figure out which combination of Optuna parameters had produced them.
They needed a better way to manage their pipeline runs and make the best use of their resources, so the team started to look for a solution to those problems. Above all, they cared about:
- Easy integration with Kedro since that was the framework they used already;
- A highly readable way to log large amounts of pipeline metadata in real time;
- An accessible and intuitive way to compare pipeline experiment runs.
The team used Neptune in a project they previously worked on and found that it met those requirements.
A nice-to-have was the 25+ other Neptune integrations with ML libraries, some of which were already part of ReSpo.Vision’s stack. Łukasz Grad, Chief Data Scientist, adds, “I was surprised by how many frameworks and ML tools Neptune integrates with. For example, we use PyTorch Lightning, and Neptune integrates with this framework, so it was also effortless to add logging outside of Kedro.”
But integrating Neptune into ReSpo.Vision’s workflow was just the first step. Once done, the team started to actually use the tool.
With Neptune, they gained control over their process.
All the information needed to debug any experiment is in one place. “When we run ten parallel pipelines, and some fail, if we run them as detached processes or delegate them, it would become quite challenging to track them on the machine itself. Neptune is very helpful in these situations because it lets me sort through those runs and easily catch runs executed with an error. Then, I can triage the error to see what happened and debug the pipeline,” says Wojtek Rosiński.
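The kind of triage described here can be sketched in plain Python. This is a hypothetical illustration of filtering run metadata by status, not the Neptune API; the record fields, run IDs, and error messages are invented for the example:

```python
# Hypothetical run records, mimicking the kind of metadata an experiment
# tracker stores per pipeline run (fields and values are illustrative).
runs = [
    {"id": "RUN-101", "pipeline": "detection", "status": "succeeded"},
    {"id": "RUN-102", "pipeline": "tracking", "status": "failed",
     "error": "CUDA out of memory"},
    {"id": "RUN-103", "pipeline": "pose", "status": "failed",
     "error": "missing input dataset"},
]

def failed_runs(runs):
    """Return only the runs that ended with an error, for triage."""
    return [r for r in runs if r["status"] == "failed"]

for r in failed_runs(runs):
    print(r["id"], "-", r["error"])
```

Instead of inspecting each machine, the team can sort and filter a central table like this and jump straight to the runs that need debugging.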
The team can easily compare pipelines, assess the outputs’ performance and quality, and confidently decide which models are the best. As Wojtek adds, “When we use Neptune with Kedro, we can easily track the progress of pipelines being run on many machines because often we run many pipelines concurrently, so comfortably tracking each of them becomes almost impossible. With Neptune, we can also easily run several pipelines using different parameters and then compare the results via UI.”
And when it’s not easy to see what a good result looks like during an inference process (since they have a couple of pipelines that depend on upstream pipelines to give a valuable output), they leverage the summary statistics provided by Neptune.
Because it’s tough to look at every football match and its outputs (bounding boxes, etc.) and tell whether they’re good or bad, the team uses Neptune to compute statistics and aggregate runs. This tells them which pipelines to run next, based on the outputs of previous runs and whether the last run had issues.
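A minimal sketch of that aggregation idea in plain Python: instead of eyeballing every frame of a match, reduce the raw outputs to a few summary statistics and a simple pass/fail rule. The match names, detection counts, and thresholds below are assumptions for illustration, not ReSpo.Vision's actual checks:

```python
from statistics import mean

# Hypothetical per-match outputs: detected bounding boxes per sampled frame.
match_outputs = {
    "match_A": [22, 21, 23, 22],  # ~22 boxes per frame: looks plausible
    "match_B": [22, 3, 2, 23],    # stretches with almost no detections
}

def summarize(boxes_per_frame, expected=22, tolerance=5):
    """Aggregate raw outputs into summary statistics a human (or a
    simple rule) can check instead of inspecting every frame."""
    avg = mean(boxes_per_frame)
    return {"mean_boxes": avg, "ok": abs(avg - expected) <= tolerance}

for match, boxes in match_outputs.items():
    print(match, summarize(boxes))
```

A statistic like this makes it obvious that `match_B` needs a rerun or a closer look, without anyone watching the footage.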
All of that has a huge impact on their business. As Wojtek, the CTO, says, “If we can choose the best-performing model, then we can save time because we would need fewer integrations to ensure high data quality. Customers are much happier because they receive higher quality data, enabling them to perform more detailed match analytics.”
Reporting the quality of pipeline results to clients and non-technical stakeholders
Speaking of customers… ReSpo.Vision’s business is directly affected when valuable outputs (processed data) for analytics are sent to client applications and downstream pipelines. And as the process scaled, communicating pipeline results to clients and other non-technical stakeholders became a challenge of its own.
As Łukasz Grad says, the team wanted “a friendly method for even a non-technical person to look at a couple of plots, scores, or similar, and decide if we wish to send the processed data (to the client), or maybe someone else with more knowledge should investigate it.”
For this, they leveraged Neptune’s intuitive UI. They tailor the runs table to the team’s needs by adding custom columns and saving views, so they can easily see the insights from pipeline runs through charts and other visualizations. But what proved most valuable, in this case, were the dashboards.
They were able to quickly identify what piqued their interest or drew their attention, thanks to the visualizations. Both technical and non-technical users found the reports and dashboards to be interactive and intuitive. “I like those dashboards because we need several metrics, so you code the dashboard once, have those styles, and easily see it on one screen. Then, any other person can view the same thing, so that’s pretty nice,” adds Łukasz.
Monitoring compute usage for their pipelines
Aside from the scale of the pipelines the team runs, another aspect of their use case that stood out was the sheer complexity of the pipelines they ran and the amount of compute resources they consumed.
Some model pipelines use datasets with hundreds of thousands of images to train. Because most models will also need high-resolution datasets, the amount of computation required to train them is massive. Most of the time, they run Kedro pipelines in the cloud. They usually set up big machines with, say, 100 GPUs, and then divide pipeline tasks among the GPUs.
Optimizing this process is key to minimizing cost. They monitor the parameters in each workflow stage across data preprocessing, prediction, and data post-processing. For example, they can define the number of workers handling each of those stages to optimize performance as much as possible and ensure the GPUs have the highest possible throughput.
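One simple way to think about splitting a GPU pool across stages is proportional allocation by relative compute cost. The sketch below is a hypothetical illustration (the stage names match the text, but the weights and allocation rule are assumptions, not ReSpo.Vision's scheduler):

```python
# Illustrative sketch: split a pool of GPUs across pipeline stages so that
# no stage starves the others (weights and allocation rule are assumptions).
def allocate_gpus(total_gpus, weights):
    """Split GPUs proportionally to each stage's relative compute cost,
    guaranteeing every stage at least one GPU."""
    total_weight = sum(weights.values())
    return {stage: max(1, round(total_gpus * w / total_weight))
            for stage, w in weights.items()}

# Prediction is assumed to be the heaviest stage, so it gets most workers.
weights = {"preprocessing": 1, "prediction": 3, "postprocessing": 1}
print(allocate_gpus(100, weights))
```

Monitoring per-stage utilization then tells you whether the chosen weights actually keep every GPU busy, or whether one stage is a bottleneck.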
Neptune’s compute monitoring capability gives the team the information they need to make the most of the compute used by their pipelines. “For some of the pipelines, Neptune was helpful for us to see the utilization of the GPUs. The utilization graphs in the dashboard are a perfect proxy for finding some bottlenecks in the performance, especially if we are running many pipelines of those (football) matches,” says Wojtek Rosiński, CTO.
By logging their functional (model evaluation) and operational (system and resource usage) metrics, the team can keep track of the results and the GPU consumption rate for each pipeline in real time. They can then use what they’ve learned to improve their experiments and make sure running jobs fully use the available GPUs.
“The major point was that during the machine learning experiment, you want the GPU to be at 100% throughout the whole experiment, and it’s not so obvious when you look at it at one point in time. So that Neptune feature was very nice because the compute utilization is tracked by default, so that’s very handy,” adds Łukasz Grad, Chief Data Scientist.
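The point about a single point-in-time reading being misleading can be made concrete with a small sketch. The utilization samples and the 90% target below are hypothetical, purely to illustrate why a time series beats a spot check:

```python
from statistics import mean

# Hypothetical GPU utilization samples (%) over an experiment. The last
# sample alone looks healthy, but the average reveals a bottleneck.
util = [95, 97, 30, 25, 96, 98, 28, 96]

def is_bottlenecked(samples, target=90):
    """Flag the run if average utilization over time falls short of
    the target, even when individual spot checks look fine."""
    return mean(samples) < target

print(mean(util), is_bottlenecked(util))
```

A spot check at the final sample (96%) would suggest the GPU is saturated; the averaged series shows it idles for long stretches, which is exactly what utilization tracked by default over the whole run surfaces.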
Thanks to Wojtek Rosiński and Łukasz Grad for working with us to create this case study!