How Neptune Helps Artera Bring AI Solutions to Market Faster
Artera is a leading precision medicine company developing AI tests to personalize cancer therapy. After diagnosis, cancer patients and their clinicians work together to evaluate which treatment will result in the best outcome. Often, the right choice is not obvious. Artera’s AI tests support this decision by assessing a patient’s level of risk and identifying which treatment is likely to serve that patient best.
Artera initially commercialized a test for prostate cancer, which is now considered standard of care. The company is rapidly expanding into other cancer types to provide similar assistance to more patients. AI/ML is a core part of Artera’s tests: in particular, the company relies primarily on computer vision models that interpret a patient’s histopathology slides to assess their risk.

Developing new AI tests involves numerous training runs and extensive hyperparameter optimization sweeps. While model training forms the foundation of product development, its success depends on collaboration with teams beyond AI—validating the models, defining acceptance criteria, ensuring their usefulness for clinicians, releasing the product, and continuing to monitor its safety and efficacy.
The challenge
For Artera’s AI team, success is measured by two critical factors:
- Whether their models validate on external datasets.
- Whether they can deliver a model quickly, since timelines are tight and often tied to external events such as conference deadlines or regulatory filings.
From the beginning, Artera sought tools that could support these objectives. Experiment tracking was one of the priorities.
The team explored various solutions before selecting Neptune. For Hans Pinckaers, an ML Scientist at Artera, Neptune had already stood out during his PhD: the tool seemed faster and snappier than the alternatives.
Wouter Zwerink, another ML Scientist on the team, had used TensorBoard during his thesis and found it very limited. Hans agreed: even basic tasks in TensorBoard were not intuitive and felt hacky.
Artera needed a tool that could:
- Monitor training in real time and track all metrics for reproducibility and validation.
- Handle large-scale hyperparameter sweeps efficiently, allowing for quick analysis and comparisons.
- Be intuitive and easy to use for all team members.
After a thorough evaluation, Artera chose Neptune in early 2022.
Real-time insights for optimized training
The initial phase of model development at Artera involves manually training a few models. During this stage, the team monitors progress in real time using Neptune, focusing on loss and accuracy plots to detect overfitting or underfitting. Seeing these results in Neptune as they happen allows the team to react immediately to any divergence.
Neptune’s dashboards allow the team to group training and validation loss curves for easy comparison, providing a comprehensive view of model behavior. Grouping is also handy when comparing different architectures and feature-extraction backbones, as it gives an overview of performance across all configurations in one place.
For the self-supervised learning runs, which involve larger datasets and several hours of training, hardware monitoring is also extremely useful. With Neptune, the team can track hardware utilization and spot problems right away.
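For illustration, here is a minimal sketch of what such a monitored training loop could look like with the Neptune Python client. The project and run names are hypothetical and the losses are simulated stand-ins for real training and validation steps; this is not Artera’s actual code. The hardware metrics mentioned above are captured by the client automatically, with no extra code.

```python
import math
import random

import neptune

# Hypothetical project and run names, for illustration only. The Neptune
# client also records hardware metrics (CPU, GPU, memory) in the background.
run = neptune.init_run(project="artera/histopathology", name="baseline-resnet50")

run["parameters"] = {"lr": 1e-4, "batch_size": 32, "backbone": "resnet50"}

for epoch in range(20):
    # Simulated stand-ins for real training and validation steps.
    train_loss = math.exp(-0.20 * epoch) + random.uniform(0.00, 0.05)
    val_loss = math.exp(-0.15 * epoch) + random.uniform(0.00, 0.08)

    # Namespaced series such as train/loss and val/loss can be grouped on a
    # single dashboard chart, making over- and underfitting easy to spot live.
    run["train/loss"].append(train_loss)
    run["val/loss"].append(val_loss)

run.stop()
```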
Large hyperparameter sweeps made manageable
Once manual training shows promising results, the team initiates hyperparameter optimization sweeps, which often involve thousands of trials. Artera combines Neptune with Optuna to streamline this process.
All runs, together with their metrics and hyperparameters, are recorded in Neptune. Analyzing the results of a handful of manually trained models is rather straightforward, but for sweeps with thousands of experiments, it’s much more challenging without the right tools.
Luckily, Neptune makes it easy. For large sweeps, Artera researchers rely mostly on the runs table and its filtering and grouping functionalities. For example, they name each hyperparameter run and group by those names. Then, they filter to look at specific runs or group by sweep to rank runs by their metrics. This gives them a clear view of both sweep-level and run-level performance and enables them to confidently pick the best-performing runs.
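A minimal sketch of how such a sweep could be wired up, assuming the neptune-optuna integration package; the project name, sweep name, and toy objective are hypothetical, not Artera’s setup. The callback logs each trial’s parameters and scores to Neptune, where the sweep can then be filtered, grouped, and ranked in the runs table as described above.

```python
import optuna

import neptune
import neptune.integrations.optuna as npt_utils

# Hypothetical project and sweep names, for illustration only.
run = neptune.init_run(project="artera/histopathology", name="lr-wd-sweep")

def objective(trial: optuna.Trial) -> float:
    # Toy objective; a real study would train and validate a model here.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-3, log=True)
    return (lr - 1e-3) ** 2 + weight_decay

# The callback records every trial's parameters and scores to the Neptune run.
neptune_callback = npt_utils.NeptuneCallback(run)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50, callbacks=[neptune_callback])

run.stop()
```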
The results
Neptune proved stable and reliable, and the team now fully depends on it when developing models. The key outcomes include:
- The ability to monitor training in real time and track hardware usage ensures quick identification of promising results or issues, reducing downtime and speeding up development cycles.
- Grouping, filtering, and ranking thousands of experiments in Neptune allows for clear insights and confident decision-making during hyperparameter optimization.
- Overall, Neptune helps the team deliver models on time and meet their quarterly deadlines.
Thanks to Hans Pinckaers, Wouter Zwerink, and Nathan Silberman for helping create this case study!