Case Study

Continuum Industries

“Gone are the days of writing stuff down in Google Docs and trying to remember which run was executed with which parameters and for what reasons. Having everything in Neptune allows us to focus on the results and better algorithms.”

Andreas Malekos

Chief Scientist @Continuum Industries


Continuum Industries is a company in the infrastructure industry that wants to automate and optimize the design of linear infrastructure assets like water pipelines, overhead transmission lines, subsea power lines, or telecommunication cables.  

Its core product Optioneer lets customers input the engineering design assumptions and the geospatial data and uses evolutionary optimization algorithms to find possible solutions to connect point A to B given the constraints. 

As Chief Scientist Andreas Malekos, who works on the Optioneer AI-powered engine, explains:

“Building something like a power line is a huge project, so you have to get the design right before you start. The more reasonable designs you see, the better decision you can make. Optioneer can get you design assets in minutes at a fraction of the cost of traditional design methods.”

Andreas Malekos

Chief Scientist @Continuum Industries

But creating and operating the Optioneer engine is more challenging than it seems:

  • The objective function does not represent reality
  • There are a lot of assumptions that civil engineers don’t know in advance
  • Different customers feed it completely different problems, and the algorithm needs to be robust enough to handle those

Instead of trying to build the one perfect solution, it’s better to present engineers with a list of interesting design options so that they can make informed decisions.

The engine team leverages a diverse skillset from mechanical engineering, electrical engineering, computational physics, applied mathematics, and software engineering to pull this off.

Problem

A side effect of building a successful software product, whether it uses AI or not, is that people rely on it working. And when people rely on your optimization engine for million-dollar infrastructure design decisions, you need a robust quality assurance (QA) process in place.

As Andreas pointed out, they have to be able to say that the solutions they return to the users are:

  • Good, meaning a result that a civil engineer can look at and agree with
  • Correct, meaning that all the different engineering quantities calculated and returned to the end user are as accurate as possible

On top of that, the team is constantly working on improving the optimization engine. But to do that, you have to make sure that the changes:

  • Don’t break the algorithm in some way or another
  • Actually improve the results, not just on one infrastructure problem but across the board

Basically, you need to set up proper validation and testing, but the nature of the problem the team is trying to solve presents additional challenges:

  • You cannot automatically tell whether an algorithm output is correct or not. It is not like in ML where you have labeled data to compute accuracy or recall on your evaluation set. 
  • You need a set of example problems that is representative of the kind of problem that the algorithm will be asked to solve in production. Furthermore, these problems need to be versioned so that repeatability is as easily achievable as possible.

“How do we do all of the above in a seamless and automated way, ideally as part of our CI pipeline?”

Andreas Malekos

Chief Scientist @Continuum Industries

Initially, the team developed a relatively simple, completely custom solution to these problems:

  • They implemented a database of “baseline problems”
  • The algorithm would run on these problems, and quality metrics would be recorded and written to the database
  • A developer could then make some changes to the algorithm, run the code against the “baseline problems”, and compare the metrics generated with the database
  • They created some visualization tools that worked by downloading all the metrics for a run

This proved to be an extremely clunky system for the following reasons:

  • The metrics stored in the database would go out of date as soon as someone made a change to the algorithm, which meant they had to run an “update” job very often.
  • This “update job” was not unit tested properly, so it often broke. This meant that every time a developer tried to update the baseline metric, they would also have to fix the system itself. This would turn into a tedious and painful process.
  • The system was pretty complex, which turned it into a “product within a product” that they did not have time to maintain or fix when it broke.

“This would turn into a tedious and painful process. The system was fairly complex, which kind of turned it into a “product within a product” that we did not have time to maintain or fix when it broke.”

Andreas Malekos

Chief Scientist @Continuum Industries

According to Andreas, it took them a while to realize that even though they do not use ML in the product, they face many of the same challenges that ML-in-production faces. That’s when they decided to properly investigate the MLOps solutions that were already out there and see which one could fit their use case best.

Solution

As Andreas explains, with experience from trying to build a similar solution themselves, they knew that:

  • they wanted a tool that could easily track and visualize different types of data
  • they wanted to track both local and cloud runs in the same way
  • they didn’t want to self-host or maintain the solution

“We’re still a fairly small team (10 devs or so), so we’d rather avoid having to manage this system ourselves, so we can focus on building our product and improving the AI. We had to do that with our previous system, and it was a huge time sink.”

Andreas Malekos

Chief Scientist @Continuum Industries

After reading many blogs comparing different experiment trackers and then spending most of their evaluation time going through the documentation of each of the tools, they decided to go with Neptune.

The Optioneer engine team chose Neptune because: 

  1. Getting started is really easy
  2. Comparing, monitoring, and debugging work great
  3. They have total flexibility in the metadata structure
  4. They love the support
  5. It is easy to access Neptune from anywhere, including CI/CD pipelines

From zero to Hello World was very quick; adding Neptune calls to the existing metrics-logging classes took a week or so, but that was primarily because of the complexities of the codebase. For example, they had trouble logging multidimensional NumPy arrays, which was eventually solved by uploading them as files.
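The workaround mentioned above, persisting an array to disk so it can be attached to a run as a file, can be sketched roughly like this. The function and path names are illustrative, and the Neptune upload call is shown only as a comment since it needs an active run:

```python
# Sketch of the workaround described above: persist a multidimensional
# NumPy array as an .npy file so it can be uploaded to a Neptune run.
# The run["..."].upload(...) call is commented out because it needs an
# active Neptune connection; all names here are illustrative.
import tempfile
from pathlib import Path

import numpy as np

def array_to_npy(arr: np.ndarray, directory: str, name: str) -> Path:
    """Serialize an array to an .npy file and return its path."""
    path = Path(directory) / f"{name}.npy"
    np.save(path, arr)
    return path

with tempfile.TemporaryDirectory() as tmp:
    arr = np.random.rand(4, 3, 2)          # arbitrary multidimensional array
    path = array_to_npy(arr, tmp, "design_variables")

    # With an active run, the file could then be attached, e.g.:
    # run["artifacts/design_variables"].upload(str(path))

    restored = np.load(path)               # verify the round trip
    assert np.array_equal(arr, restored)
```

The round-trip check at the end is the important part: serializing through a file should reproduce the array exactly, so nothing is lost by logging it this way instead of as a plain metric.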

As Andreas explains:

“We also liked the conceptual simplicity of Neptune: unlike some of its competitors, it’s just a metadata store and doesn’t try to solve a million different problems, so it was easy to add it to our existing code.”

After integrating Neptune into their codebase, it became much easier to track experiment runs and compare plots of different metrics across runs, as well as to monitor and debug production runs.

“The ability to compare runs on the same graph was the killer feature, and being able to monitor production runs was an unexpected win that has proved invaluable.” – Andreas Malekos, Chief Scientist @Continuum Industries

As Andreas told us, they record quality metrics (objective value, constraint violation), the final value of the design variables, and all the input parameters for production runs. 

Keeping track of all that helps them debug production failures easily: 

  • If the optimization crashes, recording the input parameters and the code version makes it fairly easy to replicate the error and find out why things crashed.
  • If the result looks very bad, recording the objective and constraint violation and the final design variables allows the team to re-create the final result locally. They can then inspect it and figure out why their algorithm thinks this is a good result and why it was preferred.

“We are huge fans of the way data is structured in Neptune runs. The fact that we can basically design our own file structure effortlessly gives us enormous flexibility.” – Andreas Malekos, Chief Scientist @Continuum Industries

This flexibility makes it easy to use Neptune for pretty much anything:

  • When researching algorithm improvements, they record results in their custom structure and compare them easily
  • They monitor and record production runs in a way that is convenient for debugging
  • They use Neptune as part of their CI pipeline, setting up many batch jobs that all write metric data to the same Neptune run, and then comparing against the current version of master to make sure nothing is broken

For example, in the case of the engine tests, where multiple jobs write to a single run, the structure would look like this:

  • metrics: top-level folder where all the metrics are stored
    • {TEST_CASE_NAME} 
      • {INDEX}: each case is run multiple times with different seeds
        • {STAGE_NAME}: there are multiple stages during the optimization
          • metric0
          • metric1
          • metric2
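The nested structure above maps onto slash-separated namespace paths inside a run. A minimal sketch of building such paths, with all test-case and stage names hypothetical and the actual logging call shown only as a comment:

```python
# Minimal sketch of building the slash-separated metric namespaces
# described above. The run[...] logging call is commented out because
# it needs a live Neptune run; all names are illustrative.
def metric_namespace(test_case: str, index: int, stage: str, metric: str) -> str:
    """Build a path like 'metrics/{TEST_CASE_NAME}/{INDEX}/{STAGE_NAME}/{metric}'."""
    return f"metrics/{test_case}/{index}/{stage}/{metric}"

ns = metric_namespace("river_crossing", 0, "refinement", "metric0")
print(ns)  # metrics/river_crossing/0/refinement/metric0

# With a live run, a batch job would then log to that namespace, e.g.:
# run[ns].log(value)
```

Because each batch job writes under its own test case and seed index, many jobs can log to the same run without their metrics colliding.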

Continuum Industries metadata structure

“We had some issues with using the new Neptune API at first, and we’ve received an incredible amount of support from the team since then. Talk to the Neptune team if you run into any issues as they are incredibly helpful.” – Andreas Malekos, Chief Scientist @Continuum Industries

As Andreas shared with us, as you adopt Neptune (or any tool for that matter), there may be some bumps along the way. What is unique about Neptune is that you can really count on the team to help you through it, share your feedback and improvement ideas, and see them implemented. You get to be a part of the journey.

It was important for the Optioneer engine team to be able to use Neptune both locally and in the cloud, to log metadata for offline jobs, and to set it up for debugging with the various connection modes.

They needed Neptune to play nicely with their CI/CD pipelines too. 

“When someone submits a Pull Request, it triggers a CI/CD pipeline via GitHub Actions. Each step evaluates the newest version of the algorithm on a baseline problem in a separate process.
We were afraid that organising all of those results for each CI/CD pipeline execution would be a nightmare, but thanks to Neptune’s custom run ID functionality we can log all of the evaluations to the same run and keep it nice and clean.” – Andreas Malekos, Chief Scientist @Continuum Industries
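The custom-run-ID approach can be sketched as a small shell fragment run at the start of the pipeline. The `NEPTUNE_CUSTOM_RUN_ID` environment variable follows Neptune’s documentation; the job commands and ID format are purely illustrative:

```shell
#!/bin/sh
# Sketch: give every job in one CI pipeline execution the same custom
# run ID so they all log to a single Neptune run. The variable name
# NEPTUNE_CUSTOM_RUN_ID follows Neptune's documentation; everything
# else here is illustrative.
RUN_ID="ci-${GITHUB_SHA:-$(date +%s)}"
export NEPTUNE_CUSTOM_RUN_ID="$RUN_ID"

# Each evaluation job launched below would pick up the shared ID and
# append its metrics to the same run, e.g.:
# python evaluate.py --problem baseline_01 &
# python evaluate.py --problem baseline_02 &
echo "All jobs will log to custom run ID: $NEPTUNE_CUSTOM_RUN_ID"
```

Deriving the ID from the commit SHA keeps one run per pipeline execution, so re-running the pipeline on a new commit produces a fresh run while parallel jobs within one execution stay grouped.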

Results

As the team shared with us, Neptune improved their entire workflow.

“We benefited a lot from using Neptune for experimentation and then got even more value out of it when we connected it to production!”

Andreas Malekos

Chief Scientist @Continuum Industries

Andreas explained that, when working on optimization engine improvements, they start with one of the test problems, run the modified version of the algorithm, and have all parameters and results tracked in Neptune. This lets the team quickly look back at what they have tried so far and plan the next steps with relative ease.

“Gone are the days of writing stuff down in Google Docs and trying to remember which run was executed with which parameters and for what reasons. Having everything in Neptune allows us to focus on the results and better algorithms.”

Andreas Malekos

Chief Scientist @Continuum Industries

In addition to using Neptune in the experimentation phase, the team also has it at the core of their production MLOps pipeline, executed through GitHub Actions. To assure model quality with proper CI/CD jobs, they:

  • Deploy a set of cloud instances on AWS EC2
  • In each instance, clone the repository and install the requirements
  • Run one of many test problems
  • For each running instance of a test problem, collect metrics and write them all to the same Neptune run
  • Calculate aggregate metrics across all tests
  • Compare these aggregate metrics to a previous point in time and decide whether the quality of the algorithm improved with statistical significance
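The final comparison step can be sketched as follows. The case study does not specify which statistical test the team uses; a simple, dependency-free possibility is a permutation test on the difference in means, assuming higher metric values are better. All data and names here are made up for illustration:

```python
# Sketch of the final comparison step: decide whether candidate metrics
# beat the baseline with statistical significance. The case study does
# not specify the test used; this permutation test on the difference in
# means is one simple, dependency-free possibility.
import random

def permutation_p_value(baseline, candidate, n_resamples=10_000, seed=0):
    """One-sided p-value for 'candidate mean > baseline mean'."""
    rng = random.Random(seed)
    observed = sum(candidate) / len(candidate) - sum(baseline) / len(baseline)
    pooled = list(baseline) + list(candidate)
    k = len(candidate)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)                       # random relabeling of runs
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            hits += 1
    return hits / n_resamples

# Illustrative aggregate quality scores from baseline vs. candidate runs
baseline = [0.70, 0.72, 0.69, 0.71, 0.70, 0.73]
candidate = [0.78, 0.80, 0.79, 0.77, 0.81, 0.79]
p = permutation_p_value(baseline, candidate)
print(f"p = {p:.4f}")   # a small p-value suggests a real improvement
assert p < 0.05
```

A permutation test is attractive here because it makes no distributional assumptions about the metrics, which matters when each data point is itself an aggregate over stochastic optimization runs.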
Continuum Industries MLOps pipeline

With Neptune, the Optioneer team can:

  • Easily keep track of and share the results of their experiments
  • Monitor production runs, and track down and reproduce errors much faster than before when something goes wrong
  • Have much more confidence in the results they generate and in how new versions of the Optioneer engine are built
  • Understand the performance of their algorithm at any given time, with all the engine-related metadata recorded to Neptune through their weekly quality assurance CI/CD pipelines

Before Neptune, getting all that functionality required an order of magnitude more time.

Now, they have more trust in their algorithm and more time to work on the core features rather than tedious and manual updates.

“I’d recommend Neptune to any ML team in the AI industry. The tool is great and the company behind it is even better.”

Andreas Malekos

Chief Scientist @Continuum Industries


Thanks to the whole team behind the Optioneer engine for their help in creating this case study!

Andreas Malekos
Miles Gould
Daniel Toth
Ivan Chan

Want your team to stop losing time on manual tasks and focus on what’s important?

Get started with Neptune
  • Industry Infrastructure
  • Location Edinburgh, United Kingdom
  • Team size 16
  • Frameworks DVC, CML, Hypothesis, Platypus, Ray, GitHub Actions, SciPy, NumPy, pandas, Matplotlib
  • Neptune use cases monitoring CI/CD pipelines, experiment tracking, debugging, collaboration