
Case Study

How Neptune Helps Artera Bring AI Solutions to Market Faster

For me, Neptune is really the central place for results. If Neptune is down, I don't know how my sweep is doing.
Hans Pinckaers
ML Scientist at Artera
Before
    Difficulty managing large-scale hyperparameter sweeps and training runs
    Limited real-time insights slowing down model optimization
After
    Can monitor training in real time and spot any issues
    Can control the sweeps process with Neptune's UI capabilities
    Delivers AI models quickly and within deadlines

Artera is a leading precision medicine company developing AI tests to personalize cancer therapy. After diagnosis, cancer patients and their clinicians work together to evaluate which treatment will result in the best outcome. Often, the right choice is not obvious. Artera’s AI tests assess a patient’s level of risk and help determine which treatment is likely to serve the patient best.

Artera initially commercialized a test for prostate cancer, which is now considered standard of care. The company is rapidly expanding into other cancer types to provide similar assistance to more patients. AI/ML is a core part of Artera’s tests: the company primarily relies on computer vision models that interpret a patient’s histopathology slides to assess their risk.

Source: artera.ai

Developing new AI tests involves numerous training runs and extensive hyperparameter optimization sweeps. While model training forms the foundation of product development, its success depends on collaboration with teams beyond AI—validating the models, defining acceptance criteria, ensuring their usefulness for clinicians, releasing the product, and continuing to monitor its safety and efficacy.

The challenge

For Artera’s AI team, success is measured by two critical factors:

  • Whether their models validate on external datasets.
  • Whether they can deliver models quickly, since tight deadlines are often tied to external events such as conferences or regulatory filings.

From the beginning, Artera sought tools that could support these objectives. Experiment tracking was one of the priorities.

The team explored various solutions before selecting Neptune. For Hans Pinckaers, an ML Scientist at Artera, Neptune had already stood out during his PhD: the tool felt faster and snappier than the alternatives.

Wouter Zwerink, another ML Scientist on the team, had used TensorBoard during his thesis and found it very limited. For Hans, TensorBoard was simply not easy to use: even basic tasks were unintuitive and felt hacky.

Artera needed a tool that could:

  • Monitor training in real time and track all metrics for reproducibility and validation.
  • Handle large-scale hyperparameter sweeps efficiently, allowing for quick analysis and comparisons.
  • Just be intuitive and easy to use for all team members.

After a thorough evaluation, Artera chose Neptune in early 2022.

When working with projects that had thousands of runs, loading the interface and sorting through data was super slow in Weights & Biases. Neptune is better at handling large scale. We’re happy with this choice.
Hans Pinckaers, ML Scientist at Artera

Real-time insights for optimized training

The initial phase of model development at Artera involves manually training a few models. During this stage, the team monitors progress in Neptune, focusing on loss and accuracy plots to detect overfitting or underfitting. Seeing these results in Neptune in real time allows them to react immediately to any divergence.
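The kind of divergence check described above can be sketched in plain Python. This is purely illustrative (not part of Neptune's API): the helper, its window size, and its tolerance are all assumptions for the sake of the example.

```python
def is_overfitting(train_loss, val_loss, window=3, tol=1e-3):
    """Flag likely overfitting: training loss still falling while
    validation loss has risen over the last `window` epochs."""
    if len(train_loss) < window + 1 or len(val_loss) < window + 1:
        return False  # not enough history to judge a trend yet
    train_trend = train_loss[-1] - train_loss[-1 - window]
    val_trend = val_loss[-1] - val_loss[-1 - window]
    # Training loss decreasing while validation loss increases beyond tolerance
    return train_trend < -tol and val_trend > tol

# Diverging curves: training keeps improving, validation degrades
train = [1.0, 0.8, 0.6, 0.5, 0.4]
val = [1.1, 0.9, 0.85, 0.9, 1.0]
print(is_overfitting(train, val))  # → True
```

In practice, the team reads this pattern directly off grouped train/validation loss charts in Neptune rather than computing it in code.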

Neptune’s dashboards allow the team to group training and validation loss curves for easy comparison, providing a comprehensive view of model behavior. Grouping is also handy when they compare different architectures and feature-extraction backbones, as it gives them a clear overview of performance across all configurations in one place.

For the self-supervised learning runs, which involve larger datasets and several hours of training, hardware monitoring is also extremely useful. With Neptune, the team can monitor the hardware efficiency and spot problems right away. 

Some time ago, Neptune changed its pricing model, removing limits on monitoring hours. This allows us to do more logging and use more of the hardware monitoring as well. It’s really nice, because if the cluster crashes, Neptune serves as a backup for error codes or other insights.
Hans Pinckaers, ML Scientist at Artera

Large hyperparameter sweeps made manageable

Once manual training shows promising results, the team initiates hyperparameter optimization sweeps, which often involve thousands of trials. Artera combines Neptune with Optuna to streamline this process.

All runs, together with their metrics and hyperparameters, are recorded in Neptune. Analyzing results is straightforward when training only a few models, but for sweeps with thousands of experiments it becomes much more challenging without the right tools.

Luckily, Neptune makes it easy. For large sweeps, Artera researchers rely mostly on the runs table and its filtering and grouping functionalities. For example, they name each hyperparameter run and group by those names. Then, they filter to look at specific runs or group by sweep to rank runs by their metrics. This gives them a clear view of both sweep-level and run-level performance and enables them to confidently pick the best-performing runs.
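The group-then-rank workflow above can be mimicked with a small stdlib sketch. To be clear, this is not Neptune's runs table or API — the run records, sweep names, and metric below are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical sweep records: run name, sweep name, validation metric
runs = [
    {"name": "lr-sweep/run-01", "sweep": "lr-sweep", "val_auc": 0.81},
    {"name": "lr-sweep/run-02", "sweep": "lr-sweep", "val_auc": 0.86},
    {"name": "aug-sweep/run-01", "sweep": "aug-sweep", "val_auc": 0.84},
    {"name": "aug-sweep/run-02", "sweep": "aug-sweep", "val_auc": 0.79},
]

# Group runs by sweep, mirroring the "group by name" step in the runs table
by_sweep = defaultdict(list)
for r in runs:
    by_sweep[r["sweep"]].append(r)

# Rank each group by the metric and keep the best run per sweep
best_per_sweep = {
    sweep: max(group, key=lambda r: r["val_auc"])["name"]
    for sweep, group in by_sweep.items()
}
print(best_per_sweep)
# → {'lr-sweep': 'lr-sweep/run-02', 'aug-sweep': 'aug-sweep/run-01'}
```

In Neptune this happens interactively: filtering and grouping the runs table gives the same sweep-level and run-level view without writing any analysis code.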

For me, Neptune is really the central place for results. If Neptune is down, I don’t know how my sweep is doing.
Hans Pinckaers, ML Scientist at Artera

The results

Neptune proved stable and reliable enough that the team fully depends on it when developing models. The key outcomes include:

  • The ability to monitor training in real time and track hardware usage ensures quick identification of promising results or issues, reducing downtime and speeding up development cycles.
  • Grouping, filtering, and ranking thousands of experiments in Neptune allows for clear insights and confident decision-making during hyperparameter optimization.
  • Overall, Neptune helps the team stay on track to deliver models on time and meet their quarterly deadlines.
We have all the metrics in our shared file storage as a backup, but we don’t really have a nice way to access them, to sort them, etc. We don’t have a setup for it because Neptune has been stable enough for us not to need it.
Wouter Zwerink, ML Scientist at Artera

Thanks to Hans Pinckaers, Wouter Zwerink, and Nathan Silberman for helping create this case study!


Looking for an experiment tracker that can handle the large scale of your model training?