Case Study

How Elevatus Uses Neptune to Check Experiment Results in 1 Minute

With Neptune, I have a mature observability layer to access and gain all the information. I can check any model's performance very quickly. It would take me around 1 minute to figure out this information.
Yanal Kashou
Chief Innovation Officer at Elevatus
Before
    No observability layer in place
    Hours wasted on redoing work to find model evidence
After
    Reduced model metadata retrieval time to under one minute
    Can run a new training iteration in 15 minutes

Elevatus is a talent management platform that covers the entire pre-HR cycle of recruitment, hiring, and onboarding. Through their Innovation department, they create intelligent systems that seamlessly cater to their clients’ needs by reducing the complexity of the hiring processes.

Elevatus has several products, but this case study will focus on the Video Assessment product that enables the applicant to apply through a one-way interview.

The challenge

The team began developing its products before adopting MLOps practices, with a few models already running in production.

“I remember training the first models without having an observability layer for the language models we built. Whenever colleagues asked for evidence, I would have to reanalyze the data and reimplement the workflow. It was not the correct way to do so,” says Yanal Kashou, Chief Innovation Officer at Elevatus.

They realized they needed a mature observability layer in place before moving on to large-scale training jobs and involving stakeholders in auditing the AI systems.

Getting observability and traceability when training large models

The team started to look for a model tracking solution with a few criteria in mind. As Yanal says, “I wanted to track our models’ performance, go back over time, and say this model did XYZ, especially from a history-based perspective.”

On top of that, they extensively use PyTorch Lightning and Optuna for all their trials, so the solution needed to connect well with these tools.

They chose Neptune, and it’s now integrated into their whole workflow.
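
For illustration, connecting a PyTorch Lightning training script to Neptune can look roughly like the sketch below. The project name is a placeholder rather than Elevatus’ actual setup, and depending on the Lightning version the logger may be imported from pytorch_lightning.loggers instead:

```python
# Sketch only: attach Neptune's Lightning logger to a Trainer.
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(
    project="my-workspace/video-assessment",  # hypothetical workspace/project name
    api_key=None,  # None -> taken from the NEPTUNE_API_TOKEN environment variable
    log_model_checkpoints=False,
)

trainer = Trainer(logger=neptune_logger, max_epochs=20)
# trainer.fit(model, datamodule)  # model and datamodule are defined elsewhere
```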

The Elevatus team runs its cloud infrastructure on Kubernetes, with training jobs configured through a YAML manifest file into which the Neptune configuration is hardcoded. Once the training script executes, Neptune monitors all the functional and operational processes. The team says that with Neptune in place, they can evaluate models better, optimize compute utilization, and iterate quickly across experiments.
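
In practice, a setup like this usually means the manifest injects the Neptune settings as environment variables, and the training script picks them up without any hardcoded credentials. A minimal sketch, assuming the documented NEPTUNE_PROJECT and NEPTUNE_API_TOKEN variables and a hypothetical MODEL_VARIANT variable set in the manifest:

```python
# Sketch only: the script reads Neptune settings injected by the Kubernetes manifest.
import os
import neptune

# The Neptune client resolves NEPTUNE_PROJECT and NEPTUNE_API_TOKEN from the
# environment, so nothing Neptune-specific needs to live in the script itself.
run = neptune.init_run(
    tags=["video-assessment", os.environ.get("MODEL_VARIANT", "baseline")],
)

# Log basic operational context about the node running the job.
run["env/node"] = os.environ.get("HOSTNAME", "unknown")
```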

“With Neptune and Optuna in place, everything happens automatically. But let’s say we want to change something. Having Neptune completely integrated into our workflow, it takes us around 15 minutes to do something different and iterate on a completely different level,” says Yanal.
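
A hyperparameter search wired up this way might look roughly like the following sketch, assuming the neptune-optuna integration; the project name and the train_and_evaluate helper are placeholders:

```python
# Sketch only: report an Optuna study into a Neptune run.
import neptune
import neptune.integrations.optuna as npt_utils
import optuna

run = neptune.init_run(project="my-workspace/video-assessment")  # hypothetical project
neptune_callback = npt_utils.NeptuneCallback(run)

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    hidden_size = trial.suggest_int("hidden_size", 64, 512)
    # train_and_evaluate is a hypothetical helper standing in for the real training code
    return train_and_evaluate(lr=lr, hidden_size=hidden_size)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50, callbacks=[neptune_callback])
run.stop()
```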

Without Neptune, the team wouldn’t have a clear view of the direction they should be going in. That’s where they see the value of the observability layer. It’s helping them to understand where they should look, where they should go, and how they should configure their training.

Once a model is ready, the team pushes it to a designated folder in Google Cloud Storage (GCS), using a naming structure that aligns with the corresponding experiments in Neptune (for this part, they plan to adopt Neptune’s model versioning functionality soon).
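
A simple way to keep the stored model and the experiment linked is to fold the Neptune run ID into the object name, roughly as in the sketch below; the bucket and paths are placeholders:

```python
# Sketch only: upload a trained checkpoint to GCS under a name derived from the Neptune run ID.
import neptune
from google.cloud import storage

run = neptune.init_run(project="my-workspace/video-assessment")  # hypothetical project
run_id = run["sys/id"].fetch()  # e.g. "VID-42"

client = storage.Client()
bucket = client.bucket("my-trained-models")  # hypothetical bucket
blob = bucket.blob(f"video-assessment/{run_id}/model.ckpt")
blob.upload_from_filename("checkpoints/model.ckpt")

# Store the artifact location back in the run so the link works in both directions.
run["artifacts/gcs_path"] = f"gs://my-trained-models/video-assessment/{run_id}/model.ckpt"
run.stop()
```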

With Neptune, I have a mature observability layer to access and gain all the information. I can check any model’s performance very quickly. It would take me around a minute to figure out this information. I don’t have to go deeper and waste a lot of time. I have the results right in front of me. The time we have gained back played a significant part.
Yanal Kashou Chief Innovation Officer at Elevatus

Anyone on the team can do the same because Neptune gives them this shared, centralized view. “When they tell me there’s an issue with our training, we need help with a bug, or we don’t know why something is happening, our immediate fallback is to look at Neptune.”

And, what’s crucial when working with large models, the team could scale the experiment monitoring component as dataset sizes grew to terabytes without computational bottlenecks across each node. “With Neptune, there was no change in our ability to observe and monitor our trials from small to large datasets. We were able to train successfully on tera-scale data with minimal effort.”—adds Yanal.

Customizable user interface for visualizing training

One of the things the team at Elevatus struggled with prior to using Neptune was visualizing their training. They evaluated multiple tools.

As Yanal says, “MLflow was great—simple and to the point. But it wasn’t giving me the observability dashboards that I wanted. We had to build everything from scratch. The deployment was not easy to manage. I liked it as a framework. It would be great for many use cases, just not ours.”

They also checked TensorBoard but encountered several problems with it. Unfortunately, TensorBoard works best in notebook environments, while the team usually ran training jobs using scripts instead of notebooks. On top of that, it lacked the flexibility and ease of use that Neptune offers.

As Yanal says, “In TensorBoard, by default, there are many metrics that I do not want to monitor. I wanted to design my own metrics to track, and this was where Neptune came into play.”

With Neptune, they can build customized views and dashboards to monitor exactly the metrics they want.
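
In practice that usually comes down to logging only the metrics of interest under namespaces that can be pinned to a dashboard, roughly as in this sketch (the metric names and the training_loop helper are illustrative):

```python
# Sketch only: log hand-picked metrics under namespaces used by a custom dashboard.
import neptune

run = neptune.init_run(project="my-workspace/video-assessment")  # hypothetical project

run["params"] = {"lr": 1e-4, "batch_size": 32, "modality": "tone_of_voice"}

# training_loop is a hypothetical generator yielding per-step metrics.
for train_mse, val_mse, val_mae in training_loop():
    run["train/mse"].append(train_mse)
    run["val/mse"].append(val_mse)
    run["val/mae"].append(val_mae)

run.stop()
```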

Yanal says he sees Neptune as an SQL-based system that empowers him to:

  • generate charts,
  • perform advanced analytics,
  • conduct in-depth investigations,
  • establish a thorough training history,
  • and seamlessly customize dashboards to suit the specific requirements of the project at hand.
Example dashboard combining different metrics and metadata

With this flexibility, the team at Elevatus achieves significantly better results.

“We attained an average MSE of ~0.043 and an average MAE of ~0.16 across ~178 trials (4 modalities of predictions, including text, tone of voice, eye movement, and facial expressions) to predict personality traits for a single snapshot with 60 seconds of video and audio. The precision (and accuracy) of our inference increased with video length.

The reason we can do that is because the monitoring layer is available. At the end of the day, we can go to the study, look at every parameter, and make detailed comparisons to know which trial performed better,” says Yanal.
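
Comparisons like these can also be pulled programmatically rather than only through the UI, for example by fetching the runs table into a dataframe, as in this sketch (the project name and metric columns are assumptions):

```python
# Sketch only: fetch logged trial results and rank them by validation MSE.
import neptune

project = neptune.init_project(project="my-workspace/video-assessment", mode="read-only")
runs_df = project.fetch_runs_table(columns=["sys/id", "val/mse", "val/mae"]).to_pandas()

print(runs_df.sort_values("val/mse").head(10))  # best-performing trials first
```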

The results

  • Reduced model metadata retrieval time to under one minute.
  • Enabled 15-minute iteration cycles for rapid model experimentation.
  • Achieved MSE of ~0.043 and MAE of ~0.16 across trials.

Thanks to Yanal Kashou and the team at Elevatus for working with us to create this case study!


Is your workflow missing an observability layer, too? Give Neptune a try.