ailslab is a small bioinformatics research group on a mission to make humanity healthier by building models that might someday save your heart from illness. In practice, that means applying machine learning to predict the development of cardiovascular disease from clinical, imaging, and genetic data.
The research is intense enough that it required custom infrastructure (which took about a year to build) to extract features from different types of data:
- Electronic health records (EHR)
- Time-to-event data (survival regression methods)
- Images (convolutional neural networks)
- Structured data and ECG signals
By fusing these features, the team can build precise machine learning models for complex problems. In this case, the problem is risk stratification for primary cardiovascular prevention. Essentially, it's about predicting which patients are most likely to develop cardiovascular disease.
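As an illustrative sketch of the fusion idea, the per-modality feature vectors can be concatenated into one input for a downstream risk model. The modality names, dimensions, and values below are made-up assumptions, not ailslab's actual pipeline:

```python
# Illustrative late-fusion sketch: concatenate per-modality feature
# vectors into one fused input for a downstream risk model.
# All names, dimensions, and values are hypothetical.

def fuse_features(modalities: dict) -> list:
    """Concatenate feature vectors in a fixed modality order, so every
    patient ends up with an identically shaped fused vector."""
    order = ["ehr", "time_to_event", "image", "ecg"]
    fused = []
    for name in order:
        fused.extend(modalities[name])
    return fused

patient = {
    "ehr": [0.3, 1.0],          # e.g., encoded diagnoses
    "time_to_event": [0.8],     # e.g., a survival-model risk score
    "image": [0.1, 0.5, 0.2],   # e.g., a CNN embedding
    "ecg": [0.7],               # e.g., an ECG-derived feature
}
print(fuse_features(patient))  # → [0.3, 1.0, 0.8, 0.1, 0.5, 0.2, 0.7]
```

Fixing the modality order matters: the fused vector feeds a single model, so every patient's features must line up dimension by dimension.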
1. Define the task to be solved (e.g., build a risk model of cardiovascular disease).
2. Define the task objective (e.g., the expected experiment results).
3. Prepare the dataset.
4. Work on the dataset interactively in Jupyter notebooks: quick experiments, figuring out the best features for both the task and the dataset, coding in R or Python.
5. Once the project scales up, use a workflow management system such as Snakemake or Prefect to turn the work into a manageable, reproducible pipeline. Without one, reproducing the workflow or comparing different models would be costly.
6. Create machine learning models using PyTorch Lightning integrated with Neptune, where some initial evaluations are applied, and log the experiment data.
7. Finally, evaluate model performance and inspect the effect of using different sets of features and hyperparameters.
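The evaluation step above often starts with a discrimination metric such as AUROC: the probability that a randomly chosen patient who develops the disease is scored higher than one who does not. A minimal plain-Python sketch (the labels and scores are toy values, not ailslab's evaluation code):

```python
def auroc(labels, scores):
    """AUROC as a rank statistic: the fraction of (positive, negative)
    pairs where the positive gets the higher score (ties count half).
    O(n_pos * n_neg) — fine for a toy example, not for large datasets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy risk scores: 1 = developed cardiovascular disease, 0 = did not.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, y_score))  # → 0.75
```

In practice a library routine (e.g., scikit-learn's `roc_auc_score`) would be used instead, and the resulting metric logged alongside the feature set and hyperparameters so runs can be compared later.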
5 problems of scaling up machine learning research
ailslab started as a small group of developers. As new team members joined, collaboration became more challenging and new problems began to appear. The team noticed them quickly, and Neptune helped solve them.
Why ailslab chose Neptune
In short – because it saves time. If you’re a researcher, you know that managing multiple experiments is challenging. With such complex objectives and workflows, the ailslab team has to do a lot of tedious work to stay on the right track.
Neptune saves time by removing a lot of that tedious work, and time is a luxury that the ailslab team doesn’t have a lot of!
- Compared to using a custom logger, Neptune takes care of everything, and the team has more time to do research tasks.
- ailslab researchers now use one platform with their results presented in the same way. It leaves less room for mistakes.
- Comparing and managing experiments takes less time. Researchers can move back and forth through the experiment history, make changes, and see how those changes affect the results.
- Building complex models (like deep learning models for images) and exploring how they work is a bit easier. Neptune stores data about the environment setup, the underlying code, and the model architecture.
- Neptune helps organize things. In ailslab, they add experiment URLs from Neptune to cards in their Kanban board in Notion. This easy access to experiment information helps keep everything organized. The whole team has a better idea about things like the effect of hyperparameters on the model.
All in all, machine learning is hard. Building ML models to detect heart disease before it happens adds another very thick layer of difficulty. We’re glad that Neptune takes away the tedious parts of ailslab projects, and we wish them all the best in their research.
To learn more about ailslab, check out their full story.
Thanks to Jakob Steinfeldt and Thore Bürgel for their help in creating this case study!