Continuum Industries is a company in the infrastructure industry that wants to automate and optimize the design of linear infrastructure assets like water pipelines, overhead transmission lines, subsea power lines, or telecommunication cables.
Its core product, Optioneer, lets customers input engineering design assumptions and geospatial data, and uses evolutionary optimization algorithms to find possible routes connecting point A to point B under the given constraints.
As Chief Scientist Andreas Malekos, who works on Optioneer’s AI-powered engine, explains, creating and operating the engine is more challenging than it seems:
- The objective function does not represent reality
- It involves many assumptions whose values civil engineers don’t know in advance
- Different customers feed it completely different problems, and the algorithm needs to be robust enough to handle those
Instead of trying to build the one perfect solution, it’s better to present engineers with a list of interesting design options so that they can make informed decisions.
The engine team draws on a diverse skill set spanning mechanical engineering, electrical engineering, computational physics, applied mathematics, and software engineering to pull this off.
A side effect of building a successful software product, whether it uses AI or not, is that people rely on it working. And when people rely on your optimization engine for million-dollar infrastructure design decisions, you need to have a robust quality assurance (QA) process in place.
As Andreas pointed out, they have to be able to say that the solutions they return to the users are:
- Good, meaning results that a civil engineer can look at and agree with
- Correct, meaning that all the different engineering quantities that are calculated and returned to the end-user are as accurate as possible
On top of that, the team is constantly working on improving the optimization engine. But to do that, you have to make sure that the changes:
- Don’t break the algorithm in some way or another
- Actually improve the results, not just on one infrastructure problem but across the board
Basically, you need to set up proper validation and testing, but the nature of the problem the team is trying to solve presents additional challenges:
- You cannot automatically tell whether an algorithm output is correct or not. It is not like in ML where you have labeled data to compute accuracy or recall on your evaluation set.
- You need a set of example problems that is representative of the kind of problem the algorithm will be asked to solve in production. Furthermore, these problems need to be versioned so that runs are easily reproducible.
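One lightweight way to version example problems, shown here purely as a sketch (the problem fields are hypothetical, not Continuum's actual data model), is to derive a deterministic version ID from the problem definition itself:

```python
import hashlib
import json

def problem_version(problem: dict) -> str:
    """Return a short, deterministic version ID for a problem definition.

    Serializing with sorted keys makes the hash independent of dict
    ordering, so identical problems always map to the same version.
    """
    canonical = json.dumps(problem, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]

# Hypothetical problem definition: any change to the assumptions
# yields a new version ID, while re-running is fully reproducible.
baseline = {"start": [55.95, -3.19], "end": [55.86, -4.25], "max_gradient": 0.08}
tweaked = {**baseline, "max_gradient": 0.10}

v1, v2 = problem_version(baseline), problem_version(tweaked)
assert v1 != v2                          # changed assumptions, new version
assert problem_version(baseline) == v1   # deterministic
```

Keying stored metrics by such a version ID ties every recorded result to the exact problem definition it was computed on.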
Initially, the team developed a relatively simple and completely custom solution to those problems:
- They implemented a database of “baseline problems”
- The algorithm would run on these problems, and quality metrics would be recorded and written to the database
- A developer could then make some changes to the algorithm, run the code against the “baseline problems”, and compare the metrics generated with the database
- They created some visualization tools that worked by downloading all the metrics for a run
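That workflow can be illustrated with a minimal stand-in (the schema, problem name, and metric names below are hypothetical, not Continuum's actual code):

```python
import sqlite3

# In-memory database standing in for the team's "baseline problems" DB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE baseline_metrics (problem TEXT, metric TEXT, value REAL, "
    "PRIMARY KEY (problem, metric))"
)

def record_baseline(problem: str, metrics: dict) -> None:
    """Write (or overwrite) the stored baseline metrics for a problem."""
    conn.executemany(
        "INSERT OR REPLACE INTO baseline_metrics VALUES (?, ?, ?)",
        [(problem, name, value) for name, value in metrics.items()],
    )
    conn.commit()

def compare_to_baseline(problem: str, metrics: dict) -> dict:
    """Return per-metric deltas of a new run versus the stored baseline."""
    rows = conn.execute(
        "SELECT metric, value FROM baseline_metrics WHERE problem = ?",
        (problem,),
    ).fetchall()
    baseline = dict(rows)
    return {name: value - baseline[name]
            for name, value in metrics.items() if name in baseline}

record_baseline("pipeline_A_to_B",
                {"route_cost": 1.95e6, "constraint_violations": 0})
deltas = compare_to_baseline("pipeline_A_to_B",
                             {"route_cost": 1.87e6, "constraint_violations": 0})
# A negative route_cost delta means the change reduced cost on this problem.
```

Even this toy version hints at the maintenance burden: every algorithm change invalidates the stored rows, so the baselines must be regenerated constantly.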
This proved to be an extremely clunky system for the following reasons:
- The metrics stored in the database would go out of date as soon as someone made a change to the algorithm, which meant they had to run an “update” job very often.
- This “update” job was not properly unit tested, so it often broke. Every time a developer tried to update the baseline metrics, they would also have to fix the system itself, which turned into a tedious and painful process.
- The system was pretty complex, which turned it into a “product within a product” that they did not have time to maintain or fix when it broke.
According to Andreas, it took them a while to realize that even though they do not use ML in the product, they face many of the same challenges that production ML systems face. That’s when they decided to properly investigate the MLOps solutions that were already out there and see which one could fit their use case best.
As Andreas explains, with experience from trying to build a similar solution themselves, they knew that:
- They wanted a tool that could easily track and visualize different types of data
- They wanted to track both local and cloud runs in the same way
- They didn’t want to self-host or maintain the solution
After reading many blogs comparing different experiment trackers and then spending most of their evaluation time going through the documentation of each of the tools, they decided to go with Neptune.
The Optioneer engine team chose Neptune because:
1. Getting started is really easy
2. Comparing, monitoring, and debugging works great
3. They have total flexibility in the metadata structure
4. They love the support
5. It is easy to access Neptune from anywhere, including CI/CD pipelines
As the team shared with us, Neptune improved their entire workflow.
Andreas explained that, when working on optimization engine improvements, they start with one of the test problems, run the modified version of the algorithm, and have all parameters and results tracked in Neptune. This lets the team quickly look back at what they have tried so far and plan the next steps.
In addition to using Neptune in the experimentation phase, it also sits at the core of their version of a production MLOps pipeline, executed through GitHub Actions. To assure model quality with proper CI/CD jobs, they:
- Deploy a set of cloud instances on AWS EC2
- In each instance, clone the repository and install the requirements
- Run one of many test problems
- For each running instance of a test problem, collect metrics and write them all to the same Neptune run
- Calculate aggregate metrics across all tests
- Compare these aggregate metrics to a previous point in time and decide whether the quality of the algorithm has improved with statistical significance
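The last step can be sketched as a paired comparison across test problems. The scores, metric meaning, and critical value below are illustrative assumptions, not Continuum's actual criteria:

```python
import math
import statistics

def improved(old_scores, new_scores, t_crit=1.833):
    """Paired t-test sketch: did per-problem quality improve significantly?

    old_scores/new_scores: one aggregate quality score per test problem
    (higher = better), before and after an algorithm change.
    t_crit: one-sided critical value; 1.833 corresponds to alpha = 0.05
    with 9 degrees of freedom (10 problems) -- an assumption of this sketch.
    """
    diffs = [new - old for new, old in zip(new_scores, old_scores)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    t_stat = mean / (sd / math.sqrt(len(diffs)))
    return t_stat > t_crit

# Hypothetical aggregate quality score per baseline problem.
before = [0.71, 0.64, 0.80, 0.55, 0.69, 0.73, 0.60, 0.77, 0.66, 0.70]
after  = [0.74, 0.66, 0.83, 0.58, 0.71, 0.76, 0.63, 0.79, 0.69, 0.72]

print(improved(before, after))  # consistent gains across all problems
```

Pairing the scores per problem controls for the fact that some infrastructure problems are intrinsically harder than others, so only the change's effect is tested.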
With Neptune, the Optioneer team can:
- Easily keep track of and share the results of their experiments
- Monitor production runs and track down and reproduce errors much faster than before when something goes wrong
- Have much more confidence in the results they generate and in how new versions of the Optioneer engine are built
- Understand the performance of their algorithm at any given time, with all the engine-related metadata recorded to Neptune through their weekly quality assurance CI/CD pipelines
Before Neptune, getting all that functionality required an order of magnitude more time.
Now, they have more trust in their algorithm and more time to work on the core features rather than tedious and manual updates.
Thanks to the whole team behind the Optioneer engine (Andreas Malekos, Miles Gould, Daniel Toth, Ivan Chan) for their help in creating this case study!