The Best Amazon SageMaker Alternatives [for Experiment Tracking and Model Management]

Machine learning projects are far more complex than typical software projects. A software project primarily involves writing and testing code, whereas an ML project follows an iterative process and requires intense experimentation. ML projects fail less often because of code errors than because of concept drift, poorly tuned hyperparameters, or a faulty model architecture. Tracking machine learning models helps data scientists and ML engineers deal with such issues, and a number of tools are available to help professionals track and manage their model experiments.

In this article:

  • We will take a look at one of the popular tools in this category, AWS SageMaker, and its capabilities.
  • We will also discuss a few situations where AWS SageMaker falls short and the alternatives we can turn to.

AWS SageMaker overview

Amazon SageMaker is a fully managed service that gives every developer and data scientist the ability to prepare, build, train, and deploy machine learning models. SageMaker takes care of the complex underlying processes and makes it easier for its users to develop high-quality models. To make models production-ready and create value from them, data scientists and ML engineers routinely train thousands of model versions to find the one with the highest accuracy.

Creating models and finding the best one can be tedious when ML teams have to deal with thousands of jobs, keep track of different metrics, and compare results across experiments. Amazon SageMaker brings several capabilities together under one roof to support the end-to-end machine learning process.

Amazon SageMaker Experiments and Model Monitor are two capabilities that are integrated with Amazon SageMaker Studio. ML engineers can easily run queries to visualize models, performance metrics, and any concept drift that occurs. SageMaker also provides Model Registry functionality, where users can look up model versions along with their metadata, training metrics, hyperparameters, etc.

How to manage experiments in Amazon SageMaker

SageMaker Experiments automatically keeps track of the inputs, parameters, configurations, and results of each experiment iteration. It is integrated with SageMaker Studio and provides a visual interface to view experiments, compare them on key metrics, and decide which model performs best. It tracks all the artifacts that were used to create a model, so users can easily revisit a model to troubleshoot or audit it.

AWS SageMaker: experiments | Source
  • Organize experiments – It offers a structured way for users to organize their model experiments. One experiment can have multiple trials, and a trial is a collection of the input assets required to run an ML model. With SageMaker Experiments, users can group their machine learning iterations and later determine which trial produced the best model.
  • Track experiments – There are two ways to track experiments: automatically and manually. SageMaker can automatically record and track independent training and batch transform jobs along with all the experiment artifacts, including datasets, algorithms, hyperparameters, and metrics. For manual tracking, SageMaker provides tracking APIs to record and track ML experiments locally in notebooks.
  • Compare and evaluate experiments – Using SageMaker Studio, users can compare multiple experiments using metric charts and graphs. These visualizations are updated in real time as the experiment progresses.
  • SageMaker Autopilot – Autopilot automatically builds, trains, and tunes ML models. It eliminates the heavy lifting and automatically explores different experiments to find the best model.
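
The experiment and trial hierarchy described above can be illustrated with a small, library-free sketch. The `Experiment` and `Trial` classes and the `best_trial` helper are hypothetical names invented for this illustration; they are not the SageMaker SDK, only a model of how trials are grouped under an experiment and compared on one metric.

```python
from dataclasses import dataclass, field


@dataclass
class Trial:
    """One training iteration: its inputs (hyperparameters) and results (metrics)."""
    name: str
    hyperparameters: dict
    metrics: dict = field(default_factory=dict)


@dataclass
class Experiment:
    """A named group of trials that all address the same problem."""
    name: str
    trials: list = field(default_factory=list)

    def add_trial(self, trial: Trial) -> None:
        self.trials.append(trial)

    def best_trial(self, metric: str, higher_is_better: bool = True) -> Trial:
        """Return the trial with the best value of `metric`, ignoring trials that lack it."""
        scored = [t for t in self.trials if metric in t.metrics]
        if not scored:
            raise ValueError(f"no trial logged the metric {metric!r}")
        pick = max if higher_is_better else min
        return pick(scored, key=lambda t: t.metrics[metric])


# Hypothetical trials with made-up accuracy values.
experiment = Experiment("churn-classifier")
experiment.add_trial(Trial("lr-0.1", {"learning_rate": 0.1}, {"accuracy": 0.81}))
experiment.add_trial(Trial("lr-0.01", {"learning_rate": 0.01}, {"accuracy": 0.86}))
experiment.add_trial(Trial("lr-0.001", {"learning_rate": 0.001}, {"accuracy": 0.84}))

best = experiment.best_trial("accuracy")
print(best.name, best.metrics["accuracy"])
```

A managed service adds persistence, automatic capture of job inputs, and a UI on top of this idea, but the underlying grouping is the same: many trials per experiment, ranked by a chosen metric.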

How to monitor your model in Amazon SageMaker

SageMaker Model Monitor automatically monitors machine learning models in production and alerts ML engineers when inaccurate predictions or data drift appear.

AWS SageMaker: model monitoring | Source

A model trained on data from 2010 is likely to fail when run in 2021, because the statistical nature of the input data shifts gradually and eventually degrades the model’s accuracy. Model monitoring also helps to identify potential bias in machine learning models. Users can use the prebuilt monitoring capabilities or adapt them to their requirements. SageMaker provides the following capabilities for model monitoring:

  • Data Quality – Users can create baseline jobs that analyze input datasets and compute baseline schema constraints and statistics. Monitoring data quality against this baseline helps to see whether the model has begun to lose accuracy.
  • Model Quality – Model quality can be monitored using performance metrics: users compare the model’s predictions with the actual ground truth.
  • Bias Drift – With SageMaker, users can monitor models for bias on a regular basis. Bias can be introduced when the training data differs from the live data.
  • Feature Attribution Drift – Similar to bias drift, feature attribution drift lets users look at individual features and compare how they rank on training data versus live data.
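
To make the baseline idea concrete, here is a minimal sketch of a data quality check, assuming a single numeric feature. The function names, the example values, and the threshold of one standard deviation are assumptions chosen for illustration; SageMaker computes its own, richer set of statistics and constraints.

```python
import statistics


def compute_baseline(values):
    """Summarize a training feature: the statistics later batches are compared against."""
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def check_batch(baseline, live_values, max_shift_in_stdevs=1.0):
    """Return a list of human-readable violations for one batch of live data."""
    violations = []
    scale = baseline["stdev"] or 1.0  # avoid dividing by zero for a constant feature
    shift = abs(statistics.fmean(live_values) - baseline["mean"]) / scale
    if shift > max_shift_in_stdevs:
        violations.append(f"mean shifted by {shift:.2f} baseline standard deviations")
    out_of_range = [v for v in live_values if not baseline["min"] <= v <= baseline["max"]]
    if out_of_range:
        violations.append(f"{len(out_of_range)} value(s) outside the baseline range")
    return violations


# Made-up customer ages seen at training time.
training_ages = [23, 31, 35, 38, 41, 44, 47, 52, 58, 61]
baseline = compute_baseline(training_ages)

print(check_batch(baseline, [29, 36, 40, 45, 55]))  # resembles the training data
print(check_batch(baseline, [62, 66, 70, 71, 75]))  # a much older population
```

The first batch passes both checks, while the second one trips both the mean-shift and the range constraint, which is the kind of signal a monitoring schedule would turn into an alert.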

Amazon SageMaker Model Registry

Amazon SageMaker Model Registry helps to catalog different versions of a model. Users can follow a few steps to register a model version and later deploy it to production. The following things can be done using the model registry:

  • Creating a model catalogue that packages different versions of a model together.
  • Managing model versions.
  • Associating metadata with a model.
  • Tracking all the models in a group.
AWS SageMaker: model groups and registry | Source

When is AWS SageMaker not the best choice?

AWS SageMaker gives its users full control over their machine learning models and their maintenance, which helps customers focus on creating value instead of constantly checking on their models. Even though AWS is a really popular choice, it falls short in a number of areas and circumstances:

  • Flexibility – AWS SageMaker has multiple capabilities that help users track and compare model experiments, but they are all part of one ecosystem, AWS SageMaker, and can’t be used separately. This is where it falls short: it offers little flexibility to users with different levels of expertise who want only a few functionalities, such as keeping track of experiments and finding out which model is the best.
  • Cost – AWS SageMaker is free to use within the AWS free tier. The free tier comes with restrictions: users can use some baseline services up to a limited capacity, and once they cross that baseline, AWS charges accordingly. Compared to other available solutions, AWS is pretty expensive, and the cost can grow quickly as users consume more services. AWS SageMaker instances are 40% more expensive than their equivalent AWS EC2 instances.
  • Model Comparison – With AWS SageMaker, users can compare multiple ML jobs, but it supports only a limited number of visuals and data types. It doesn’t provide table comparisons of different experiments, and users can’t compare more than 3 experiments at once. Users also can’t log custom comparisons using notebooks or code.
  • Forced Workflow – AWS started as a cloud service provider and has since added multiple ML capabilities. AWS SageMaker is not just a service but an end-to-end platform for creating ML pipelines. It is something of a closed box: users can work only within AWS capabilities. For example, ML models and related assets are stored on S3, which makes it difficult to share results with others.
  • Documentation & Community Support – SageMaker provides extensive documentation for setup, but it is confusing, and it takes time to find the right guide or tutorial. Even though SageMaker has been around for a while, it is hard to find solutions on platforms like StackOverflow. Searching for an answer online usually leads only to AWS tutorial pages and blogs, which might not be relevant. The best a user can do is look at the Python SDK code or contact AWS Support. Users can also post their queries on the AWS forum, but it might take a while to get answers.

In general, AWS SageMaker is a great tool when your enterprise or team is already on AWS. There are other advanced tools on the market that offer a better experience along with an extensive integration lineup. These tools are also much simpler and more portable, and they can still be used alongside SageMaker.

Amazon SageMaker alternatives

1. Neptune.ai

Neptune.ai is an ML metadata store that provides a single place to log, store, display, organize, compare, and query all model-building metadata. This ML metadata includes metrics, hyperparameters, learning curves, training code, configuration files, console logs, diagnostic charts, model versions, dataset versions, and more.

The recorded ML metadata is used for experiment tracking and model registry. It enables ML engineers and data scientists to monitor the experiments as they are running and keep track of metrics, parameters, etc. Neptune.ai provides a great user-friendly experience and allows its users to search, group, and compare the experiments. Users can easily share the results with their team members.

Advantages of Neptune.ai

  • It offers easy and seamless integration with 25+ different tools
  • Teams can easily collaborate, share reports and insights etc.
  • It has a very intuitive and flexible UI which allows users to visualize and arrange the data as per their choice
  • Neptune.ai stores most of the metadata along with its versions, which helps users reproduce their models
  • Users have the choice of searching for experiments and data using different filters 

Main features for model tracking and management

Experiment tracking 

With Neptune.ai’s experiment tracking, users can easily log any model object in one place and display it for monitoring purposes. Users can run experiments anywhere (on personal laptops, in the cloud, in notebooks, etc.) and in various programming languages, and still consolidate the results in one place, using either the hosted version or an on-premise deployment.

Neptune.ai: experiment tracking | Source

While comparing the experiments, the platform will automatically show the differences in a table. Users can also make use of interactive performance charts to get a clear understanding of each experiment. The comparison view can be saved as a dashboard for later or to explore and share the results with other stakeholders.

Neptune.ai: comparing experiments | Source
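
The idea of showing only the differences between experiments in a table can be sketched in a few lines. This is a simplified illustration and not Neptune.ai’s implementation; `diff_runs`, `render_table`, and the logged values are hypothetical.

```python
def diff_runs(runs):
    """Keep only the fields whose values differ between runs.

    `runs` maps a run name to a flat dict of logged parameters and metrics.
    Returns {field: {run_name: value}}; a field missing from a run shows up as None.
    """
    all_fields = sorted({key for logged in runs.values() for key in logged})
    differences = {}
    for field_name in all_fields:
        values = {name: logged.get(field_name) for name, logged in runs.items()}
        # Compare via repr so unhashable values such as lists are handled too.
        if len(set(map(repr, values.values()))) > 1:
            differences[field_name] = values
    return differences


def render_table(differences, run_names):
    """Format the differences as a plain-text table, one row per differing field."""
    header = ["field", *run_names]
    rows = [[f, *(str(vals[name]) for name in run_names)] for f, vals in differences.items()]
    widths = [max(len(row[i]) for row in [header, *rows]) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths))
        for row in [header, *rows]
    )


# Two made-up runs that share the optimizer and batch size.
runs = {
    "run-1": {"optimizer": "adam", "learning_rate": 0.01, "batch_size": 64, "val_accuracy": 0.88},
    "run-2": {"optimizer": "adam", "learning_rate": 0.001, "batch_size": 64, "val_accuracy": 0.91},
}
differences = diff_runs(runs)
print(render_table(differences, list(runs)))
```

Hiding the fields that are identical across runs keeps the table short, so the change that explains a metric difference (here, the learning rate) is easy to spot.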

Furthermore, all the logged objects and comparison views can be extracted to a local machine. Users can go back to the experiments even after months and will be able to access the experiments whenever needed. For more details please check out Neptune.ai’s overview.

Model registry 

The model registry in Neptune.ai helps users store different versions of machine learning models together with model-building metadata. This allows users to organize their models in a central model registry. It stores every training version of a model along with datasets, code, and parameters, which helps to reproduce, re-run, and deploy the models.

Neptune.ai: model registry | Source

2. Comet

Comet helps users to manage and optimize the entire ML lifecycle, from experiment tracking to model production monitoring. It provides easy and fast integration: by adding a few lines to their existing code, users can start tracking experiments and comparing different versions of models. Users can monitor models in real time and check whether a model is performing as expected across all segments.

It allows enterprises to visualize the experiments and all the processes. Users can easily consolidate, manage and collaborate on all the reports and even keep the stakeholders informed on performance. 

Advantages over SageMaker

  • Seamless integration with other tools. 
  • Provides user management capability and projects or workspaces can have restricted visibility.
  • Provides interactive visualization for experiment tracking and comparison.
  • The experiment table is fully customizable within the web-based UI.
  • Segmented performance tracking helps to monitor data drifts.

Main features for model tracking and management

Experiment management

Through experiment management, Comet helps to build better models using all the logged data, along with improved productivity, collaboration, and explainability. Users can compare experiments by code, hyperparameters, metrics, predictions, dependencies, and system metrics. Users can record, transform, compare, and visualize any artifact.

Comet also lets users view, analyze, and gain insights from model predictions, such as detecting overfitting and concept drift. After logging a model in an experiment, users can register the model on the Comet platform and group different versions of the same model, each with different artifacts or changes.

Comet: track and compare | Source

Comet has two views for experiments: the Experiment table and the Experiment tab. In the table view, users can see their experiments with status, run time, visibility settings, etc., and can add additional columns. In the Experiment tab view, users can look at each artifact (metrics, hyperparameters, chart view, output, data, etc.) in different tabs.

Model production monitoring

In Comet, users can monitor production models in real time, including key production metrics, to check whether models are performing as expected. After a model is trained and deployed, the data and environment tend to change, which can degrade the model’s performance. With the help of Comet, users can monitor beyond accuracy metrics and learn about ground truth.

Comet: model production monitoring | Source

Comet also provides segmented performance tracking which ensures visibility across all the key segments. Users can integrate production monitoring with experiment management to explore how models are performing in different environments. 

Artifacts

Users can log and store different versions of data and models and update them without worrying about getting back to a previous version. The purpose of Comet artifacts is to allow tracking of different assets beyond any single experiment. Users can create, manage, and use these assets anywhere in the ML pipeline. Stored artifacts are accessible through the workspace and can be re-purposed for any other experiment.

Comet: dataset and model versioning | Source

3. Weights & Biases

Weights & Biases (W&B, also written WandB) is a platform for keeping a record of model training and for comparing and visualizing different model experiments with the help of interactive graphs and tables. WandB is primarily focused on deep learning projects.

It is one of the most popular platforms because of functionality that many tools lack: for example, it tells you which metrics are important, and it can stop an experiment early when it is not performing as well as expected, saving processing hours. Using W&B, users can track every part of the training process.

Advantages over SageMaker

  • De-duplication of logged datasets.
  • Early stopping to avoid wasting expensive resources. 
  • User-friendly and interactive dashboarding, a central place to keep track of all the experiments. 
  • Visualize end to end ML pipeline flow and keep checkpoints for best performing models
  • Visualize features that are important and will have an impact on model metrics and performance
  • Regardless of the system or cloud service provider W&B provides flexible tracking and hosting of artifacts.

Main features for model tracking and management

Experiments

WandB provides a lightweight integration that works with any Python script. Tracking, comparing, and visualizing models and their performance is as easy as writing 5 lines of code. Users can see model metrics live in interactive tables and graphs and compare experiments regardless of the environment or location where the model is being trained. Users can save code commits, hyperparameters, datasets, weights, etc. directly on W&B or link to their own storage.

Weights & Biases: visualize and compare experiments | Source

Users can monitor GPU and CPU usage to identify any bottlenecks in resource allocation and identify the problem areas.

Artifacts

With WandB, users do not have to worry about keeping a record of changes to models and datasets. The platform supports saving versions of every step in the machine learning pipeline and, in addition, automatically de-duplicates datasets, i.e. it saves only the latest changes or new data.

Users can also trace the flow of data with the help of incremental tracking, where WandB will preserve checkpoints in the flow for the best-performing models. It is easier to handle sensitive data with W&B and control the accessibility within a limited group.

Weights & Biases: dataset and model versioning | Source
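
One common way to store only new or changed data is content addressing: each file is keyed by a hash of its contents, so identical files are saved once no matter how many versions refer to them. The sketch below illustrates that general technique under this assumption. It is not W&B’s implementation, and `ArtifactStore` and the file names are hypothetical.

```python
import hashlib


class ArtifactStore:
    """Toy content-addressed store: identical file contents are stored only once."""

    def __init__(self):
        self.blobs = {}     # content hash -> bytes (the actual storage)
        self.versions = []  # one manifest per version: {file name: content hash}

    def log_version(self, files):
        """Record a new dataset version given {file name: bytes}.

        Returns the number of blobs that actually had to be stored.
        """
        manifest, newly_stored = {}, 0
        for name, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            if digest not in self.blobs:
                self.blobs[digest] = content
                newly_stored += 1
            manifest[name] = digest
        self.versions.append(manifest)
        return newly_stored

    def load_version(self, index):
        """Rebuild the full {file name: bytes} mapping for a stored version."""
        return {name: self.blobs[digest] for name, digest in self.versions[index].items()}


store = ArtifactStore()
v0 = {"train.csv": b"a,b\n1,2\n", "test.csv": b"a,b\n3,4\n"}
v1 = {**v0, "extra.csv": b"a,b\n5,6\n"}  # the same two files plus one new file

print(store.log_version(v0))  # both files are new
print(store.log_version(v1))  # only extra.csv needs to be stored
```

Every version can still be restored in full with `load_version`, because each manifest lists all of its files even when their contents were stored by an earlier version.
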
Sweep

Sweeps are WandB’s functionality for hyperparameter search and model optimization. They give a clear picture of the important hyperparameters that affect model metrics or performance.

Because WandB also tracks GPU usage, and to avoid wasting expensive resources, the tool implements the Hyperband algorithm with customizable early stopping. This feature keeps the best-performing models running and kills off the rest. Users can customize sweeps and provide their own distributions for inputs, their own logic, and even the timing of early stopping.

Weights & Biases: parameter importance and logic for sweeps | Source
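
The principle of keeping the best-performing runs and stopping the rest can be shown with successive halving, the building block that Hyperband repeats with different budget settings. The sketch below implements plain successive halving only, not full Hyperband and not W&B’s code. The `simulated_score` function is a made-up stand-in for real training.

```python
def successive_halving(configs, evaluate, min_budget=1, keep_fraction=0.5):
    """Repeatedly evaluate all surviving configs, then stop the worst-performing ones.

    configs:  list of hyperparameter dicts
    evaluate: function (config, budget) -> score, higher is better
    Returns the surviving config and a history of (budget, number of configs evaluated).
    """
    survivors, budget, history = list(configs), min_budget, []
    while len(survivors) > 1:
        history.append((budget, len(survivors)))
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:keep]  # everything else is stopped early
        budget *= 2                # survivors earn a larger training budget
    return survivors[0], history


def simulated_score(config, budget):
    """Stand-in for training: scores peak at learning_rate == 0.01 and grow with budget."""
    distance = abs(config["learning_rate"] - 0.01)
    return (1.0 - 10 * distance) * (1 - 0.5 ** budget)


candidates = [
    {"learning_rate": lr}
    for lr in (0.1, 0.05, 0.03, 0.01, 0.005, 0.003, 0.001, 0.0005)
]
best, history = successive_halving(candidates, simulated_score)
print(best, history)
```

With eight candidates, only two configurations ever receive the largest budget, which is where the savings in GPU hours come from.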

4. MLflow

MLflow is an open-source, library-agnostic platform. Users can use MLflow on any system to enable experiment tracking, monitor models, and create a central repository of their models. MLflow works with any machine learning library in any programming language, as all its functions are accessible through a REST API. These functionalities are also available through Python, R, and Java APIs.

Unlike the other tools, MLflow does not come with a hosted UI: to visualize results, users open the MLflow UI served from a local server, which is not ideal for collaboration. Fortunately, thanks to its architecture, it is easy to set up a remote MLflow tracking server on a third-party platform such as AWS.

A real-world example of this is Databricks which has a managed version of MLflow on which ML teams work and collaborate on ML projects efficiently. 

Managed MLflow on Databricks is a fully managed version of MLflow providing practitioners with reproducibility and experiment management across Databricks Notebooks, Jobs, and data stores, with the reliability, security, and scalability of the Unified Data Analytics Platform.

Databricks documentation

Advantages over SageMaker

  • An open-source platform that helps to unify ML workflows.
  • MLflow can work with any cloud service provider.
  • Strong and easy integration with a number of open-source ML frameworks such as TensorFlow, Apache Spark etc
  • Real-time experiment tracking, which means while the code is running users can track the performance of models 

Main features for model tracking and management

MLflow tracking

With the MLflow Tracking component, users can easily log parameters, code versions, metrics, and output files. Users can log and query experiments using the Python, REST, R, and Java APIs from anywhere. Experiments can be logged locally or on a remote server. MLflow uses two types of storage: a backend store and an artifact store.

The backend store keeps MLflow entities such as runs, parameters, metrics, tags, notes, and metadata, while the artifact store contains files, models, images, in-memory objects, model summaries, etc.

MLflow: comparing the models | Source
MLflow model registry

The MLflow Model Registry is a model store that helps to manage the full lifecycle of an MLflow model. It keeps all the information about a model, including lineage, versioning, stage transitions, and annotations. Users who plan to run their own MLflow server need a backend database to store and access the model registry. A model must first be logged using the corresponding model flavor; only then can it be added, modified, updated, transitioned, or deleted in the model registry using the MLflow UI or API.

MLflow: registering model and model stage transitioning | Source
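
Versioning and stage transitions can be modeled with a small in-memory registry. This is a conceptual sketch and not the MLflow API: `ModelRegistry`, its methods, and the artifact paths are hypothetical, and archiving the previous production version on promotion is a design choice made for this example. The stage names follow the ones used by the MLflow Model Registry.

```python
from dataclasses import dataclass, field

STAGES = ("None", "Staging", "Production", "Archived")


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str
    stage: str = "None"
    description: str = ""


@dataclass
class ModelRegistry:
    """In-memory stand-in for a registry: each model name maps to its versions."""
    models: dict = field(default_factory=dict)

    def register(self, name, artifact_uri, description=""):
        """Add a new version under `name`; version numbers start at 1 and increase."""
        versions = self.models.setdefault(name, [])
        entry = ModelVersion(len(versions) + 1, artifact_uri, description=description)
        versions.append(entry)
        return entry

    def transition(self, name, version, stage):
        """Move one version to `stage`; a previous Production version is archived."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage {stage!r}")
        target = self.models[name][version - 1]
        if stage == "Production":
            for other in self.models[name]:
                if other.stage == "Production":
                    other.stage = "Archived"
        target.stage = stage
        return target

    def latest(self, name, stage):
        """Return the newest version currently in `stage`, or None."""
        matches = [v for v in self.models.get(name, []) if v.stage == stage]
        return matches[-1] if matches else None


registry = ModelRegistry()
registry.register("fraud-detector", "runs/abc/model", "baseline")       # hypothetical paths
registry.register("fraud-detector", "runs/def/model", "more features")
registry.transition("fraud-detector", 1, "Production")
registry.transition("fraud-detector", 2, "Production")

print(registry.latest("fraud-detector", "Production").version)
print(registry.models["fraud-detector"][0].stage)
```

Because every version keeps its own artifact location and description, rolling back amounts to transitioning an older version back to `Production`.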

5. Kubeflow

Kubeflow is used to create machine learning workflows on Kubernetes. The idea is to let users run these workflows on any system where Kubernetes is available. Kubernetes (K8s) is a container orchestration tool; with the help of containers and Docker images, anyone can recreate the same environment and run the model experiments.

It has a collection of capabilities that support experiment tracking, creating and managing Jupyter notebooks, and an open-source serverless framework to track ML models in real time.

Advantages over SageMaker

  • An open-source tool for building ML applications on Kubernetes that helps to standardize the ML lifecycle.
  • Great user interface for managing and tracking model experiments, jobs, and runs.
  • While using the Kubeflow framework, users can take advantage of other tools such as SageMaker and, if needed, can migrate to another platform easily.
  • Built-in notebook server services, where users can create and manage Jupyter notebooks easily.

Main features for model tracking and management

Kubeflow pipelines

Kubeflow pipelines support the end-to-end orchestration of machine learning projects. Users can run a number of experiments and easily manage the trials or experiments on machine learning models. Components of one solution can easily be reused to create another. In the Kubeflow UI, users can check the runtime execution graph of a pipeline and add or update the pipeline’s inputs and outputs, such as prediction results and accuracy metrics.

Kubeflow: runtime execution graph | Source
Katib

Katib is a Kubernetes-native project for AutoML that supports hyperparameter tuning and early stopping of experiments. With Katib, users do not need to run training jobs and adjust hyperparameters by hand to find optimal values. It also reports the validation accuracy for various combinations of hyperparameter values. Tuning and early-stopping experiments have to be defined in a YAML configuration file for Katib to pick up and run.

Kubeflow: hyperparameters and tuning | Source
Central dashboard

The Kubeflow central dashboard provides quick access to the other Kubeflow components deployed in the cluster, such as notebooks, pipelines, and metrics. Users can easily manage notebook servers, TensorBoards, KFServing models, and experiments. Admins can integrate third-party applications if needed.

Kubeflow: central dashboard | Source

6. Valohai

Valohai is an MLOps platform that helps enterprises automate end-to-end ML pipelines. It is compatible with any programming language or framework. It lets users run their experiments from anywhere and automatically tracks each of them. It can be easily set up on any cloud platform or on-premises.

Valohai also provides complete version control, experiment comparison, and traceability. Users can deploy different model versions, monitor their performance metrics, and use the logged hyperparameters and metadata to reproduce experiments if needed.

Advantages over SageMaker

  • Standardize workflow with complete transparency within the team.
  • One central hub for notebooks with asynchronous hyperparameter sweeps.
  • Visualize experiment comparison in graph or table format.
  • Compatible with any programming language and ML framework.

Version Control – Valohai automatically versions everything users run, such as hyperparameters, library versions, and hardware or resource settings. Complete version control is the only way to achieve reproducibility and regulatory compliance. Users do not need to maintain separate model registries or metadata stores.

Valohai: model version control | Source

Model Monitoring – It collects and stores everything the process internally prints out, whether it is an error stack trace, model metric, or health information. When the metrics are parsed, they can be visualised using the Valohai deployment monitoring UI, where users can analyse the logs and metrics for any chosen time range.

Valohai: model monitoring | Source
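
Collecting everything a process prints and then parsing metrics out of it can be sketched as follows. The sketch assumes that metrics are printed as one JSON object per line and that every other line is free-form log text. That convention, the `parse_metrics` helper, and the sample output are assumptions made for illustration rather than a description of Valohai’s internals.

```python
import json


def parse_metrics(log_text):
    """Split raw process output into metric records and ordinary log lines.

    Lines that parse as JSON objects are treated as metrics; everything else
    (plain messages, stack traces) is kept separately as free-form logs.
    """
    metrics, other_lines = [], []
    for line in log_text.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            other_lines.append(line)
            continue
        if isinstance(record, dict):
            metrics.append(record)
        else:
            other_lines.append(line)
    return metrics, other_lines


# Made-up output captured from a deployed model process.
captured_output = "\n".join([
    "loading model...",
    json.dumps({"step": 1, "latency_ms": 41.0, "accuracy": 0.90}),
    json.dumps({"step": 2, "latency_ms": 39.5, "accuracy": 0.92}),
    "Traceback (most recent call last): ...",
])

metrics, logs = parse_metrics(captured_output)
print([m["accuracy"] for m in metrics])
print(logs)
```

Once the metrics are structured records, filtering them by time range or plotting them next to the raw logs is a straightforward next step.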

Experiment Comparison – From the execution table on the platform, users can select multiple experiments or executions and compare their metadata using interactive plots and tables. Users can easily visualise hyperparameter tuning in real time, even in a large distributed training run. It is also possible to download this metadata and compare experiments offline.

Valohai: compare executions | Source

Data Dictionary – Valohai automatically tracks every data asset that is used in a machine learning model. Its tree representation of the data from the end-result all the way through every step into the original data allows you to dive into any intermediary data asset and check it out.

Valohai: auditable data lineage | Source
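
A tree representation that leads from an end result back to the original data can be produced by walking a mapping of "which asset was built from which". The sketch below is a generic illustration with hypothetical file names, not Valohai’s data model, and it assumes the lineage graph contains no cycles.

```python
def trace_lineage(asset, produced_from, depth=0):
    """Return indented lines that walk from `asset` back to its original inputs.

    `produced_from` maps each derived asset to the list of assets it was built from.
    Assets that do not appear as keys are treated as original data.
    """
    lines = ["  " * depth + asset]
    for parent in produced_from.get(asset, []):
        lines.extend(trace_lineage(parent, produced_from, depth + 1))
    return lines


# Hypothetical pipeline: raw files -> cleaned table -> features -> trained model.
produced_from = {
    "model.pkl": ["features.parquet", "train_config.yaml"],
    "features.parquet": ["clean.csv"],
    "clean.csv": ["raw_2021.csv", "raw_2020.csv"],
}

print("\n".join(trace_lineage("model.pkl", produced_from)))
```

Each level of indentation is one step further back in the pipeline, so any intermediate asset can be inspected by following its branch.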

Table comparison: SageMaker vs alternatives

So far in this article, we have discussed the alternatives to AWS SageMaker and why and when one should opt for them. Let’s have a quick look at how these tools fare against each other in different aspects so that you can make the right choice for your use case.

Overview
  • Price – SageMaker is free within the AWS free tier and otherwise billed per computing service used. Some alternatives are free or open source, while others are free for individuals (plus any usage above the free quota) and for academia, with paid team plans.
  • Hosting – SageMaker is only available on AWS, and its on-premise solutions are limited to AWS-managed hardware. The alternatives can be cloud hosted or self hosted, including on-premise deployment on your own infrastructure or private cloud.

Experiment tracking
  • Metrics
  • Hyperparameters
  • Input/output artifacts
  • Resource settings
  • Comparing experiments using charts and tables

Model management
  • Data quality
  • Model quality
  • Model registry
  • Dataset version
  • Environment configuration
  • Jupyter Notebooks versioning
  • Code version

Support for data quality, model quality, dataset versioning, and environment configuration is marked as limited for some of the tools.

Final thoughts

When it comes to experiment tracking and model monitoring, there are plenty of tools to choose from. AWS SageMaker is one of the most popular tools among ML engineers, but it has its limitations, which is why enterprises are looking for more open tools that are easier to integrate.

Most of these alternative tools provide a great user interface and are not restricted to one particular ML framework or cloud platform. Choosing a tool depends entirely on one’s needs: a user who only wants to track the performance of experiments might prefer a standalone tool, while a user who needs to see through each and every step of the model pipeline can go for an end-to-end platform.

Over the last few years, the market for experiment tracking and model monitoring tools has grown rapidly, and the tools currently available have been refined further. The list is not limited to the tools mentioned above; others, such as Sacred, Guild AI, TensorBoard, and Verta.ai, might suit your needs even better.

So keep on learning and happy experimenting!

