Experiments are units of progress in machine learning. However, keeping them organized and updated can be a real pain for data scientists.
Although spreadsheets and docs are adaptable, they quickly get chaotic, especially with teams.
MLflow and Neptune are two robust experiment tracking tools that save you from manually tracking countless variables and artifacts, such as:
- Parameters: hyperparameters, model architectures, training algorithms
- Artifacts: training scripts, dependencies, datasets, checkpoints, trained models
- Metrics: training and evaluation accuracy, loss
- Debug data: weights, biases, gradients, losses, optimizer state
- Metadata: experiment, trial and job names, job parameters (CPU, GPU and instance type), artifact locations (e.g. S3 bucket)
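To make the list above concrete, here is a minimal sketch of what tracking a single run by hand might look like in plain Python. The `RunRecord` class, its field names, and all values are hypothetical and are not part of either tool's API. They only illustrate how quickly the bookkeeping grows.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RunRecord:
    """Hypothetical hand-rolled record for one training run."""
    name: str
    parameters: dict = field(default_factory=dict)  # hyperparameters, architecture
    metrics: dict = field(default_factory=dict)     # metric name -> list of values
    artifacts: list = field(default_factory=list)   # paths to scripts, checkpoints
    metadata: dict = field(default_factory=dict)    # instance type, artifact location

    def log_metric(self, key: str, value: float) -> None:
        # Append so the full history of each metric is kept.
        self.metrics.setdefault(key, []).append(value)


run = RunRecord(name="baseline", parameters={"learning_rate": 0.01, "batch_size": 32})
for loss in (0.9, 0.6, 0.4):  # made-up loss values
    run.log_metric("train_loss", loss)
run.artifacts.append("checkpoints/epoch_3.pt")  # illustrative path
run.metadata["instance_type"] = "gpu"           # illustrative value

# Serialize the record; by hand you would still need to store, version, and share it.
serialized = json.dumps(asdict(run), indent=2)
print(serialized)
```

Even this toy version leaves out storage, versioning, and sharing, which is the work the dedicated tools take over.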
As you can see, switching to dedicated experiment tracking tools is inevitable in the long run. This article will compare both tools so you can find the best fit for you. Keep reading to find:
- A quick overview of Neptune and MLflow and how they are useful
- A detailed chart comparing the features of Neptune and MLflow
- Comparison of experiment tracking features in Neptune and MLflow
- Comparison of UI features in Neptune and MLflow
- Comparison of product features in Neptune and MLflow
- Comparison of integrations in Neptune and MLflow
- Neptune integration with MLflow
Quick overview of Neptune and MLflow
Although both Neptune and MLflow solve similar problems, the differences can be really significant depending on your use case.
In Neptune, you can track machine learning experiments; log metrics, performance charts, video, audio, and text; record data exploration; and organize teamwork. Neptune is fast, lets you customize the UI, and lets you manage users in an on-prem environment or in the cloud. Managing user permissions and access to projects is a breeze. It also monitors hardware resource consumption, so you can optimize your code to use hardware efficiently.
Neptune has a wide range of framework integrations, so you won’t have a problem integrating your ML models, codebases, and workflows. It’s built to scale, so you won’t have any issues with your experiments getting too big.
MLflow is an open-source platform for managing your ML lifecycle by tracking experiments, providing a packaging format for reproducible runs on any platform, and sending models to your deployment tools of choice. You can record runs, organize them into experiments, and log additional data using the MLflow tracking API and UI.
Detailed chart comparing the features of Neptune and MLflow
Neptune’s flexibility in experiment tracking, framework integrations, and team collaboration places it above MLflow.
|Pricing||Free for individuals, non-profit and educational research; Team: from $49; Enterprise: from $499|
|Free plan limitations||Free: 1 user; unlimited private and public projects|
|Open-source||Client libraries||Client libraries and server|
|Experiment Tracking Features|
|Notebook Auto Snapshots||–|
|Log Audio, Video, HTML||Limited|
|Saving Experiment Views||–|
|Scales to Millions of Runs||–|
|Dedicated User Support||–|
Detailed comparison of experiment tracking features in Neptune and MLflow
Which tool allows you to track an exploratory analysis?
In Neptune, you can version your exploratory data analysis or results exploration. After saving it, you can name, share, and download your notebook checkpoints, or see the differences between them.
With Neptune, you can automatically log images and charts to multiple image channels, browse through them to view the progress of your model as it trains, and get a better understanding of what’s happening in the training and validation loops.
Which tool allows you to fetch your experiment dashboard directly to a pandas DataFrame?
The mlflow.search_runs() API returns your MLflow runs in a pandas DataFrame.
Neptune allows you to fetch whatever information you or your teammates tracked and explored. Exploratory features such as HiPlot integration will help you do that.
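import neptune
# Plotting helper from the neptune-contrib package
from neptunecontrib.viz.parallel_coordinates_plot import make_parallel_coordinates_plot

neptune.init('USERNAME/example-project')
make_parallel_coordinates_plot(
    metrics=['eval_accuracy', 'eval_loss',...],
    params=['activation', 'batch_size',...])
Which tool automatically snapshots your Jupyter notebooks?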
Neptune integrates with Jupyter notebooks, so you can automatically snapshot whenever you run a cell containing
Regardless of whether you submit your experiment, everything will be safely versioned and ready to be explored.
Which tool allows you to easily get hardware metrics?
Neptune lets you monitor hardware and resource consumption (CPU, GPU, memory) live and persistently while you train your models. With this data, you can optimize your code to make full use of your hardware.
This data is generated automatically, and you can find it in the monitoring section of the UI:
Which tool allows you to easily browse through hundreds of images and charts?
You can now browse through your images in the “predictions” tab of the “logs” section of the UI.
You can even log interactive charts that will be rendered interactively in the UI through
Detailed comparison of UI features in Neptune and MLflow
One of Neptune’s major advantages is its beautiful and intuitive UI; it is built for team collaboration, whereas MLflow is rather limited.
Which tool’s visualization dashboard is easier to set up for your entire team?
With Neptune, you can save experiment data either on a hosted server or in an on-prem installation. You can easily share experiments with no overhead.
MLflow, however, stores and tracks experiments locally by default, which limits user management and team setup capabilities.
Which tool has better user management?
Neptune gives you full control over user and access permissions. You can limit or grant viewing and editing capabilities by assigning different roles such as owner, contributor, or viewer.
Inviting team members is as simple as an email invitation:
MLflow, however, does not offer any features for user management.
Which tool has better experiment organization?
After logging your experiments, you can easily organize them in the Neptune dashboard.
You can view everything that was logged:
- Click on the experiment link or one of the rows in the experiment table in the UI
- Go to the Logs section to see your metrics
- Go to Source code to see that your code was logged
- Go to Artifacts to see that the model was saved
You can filter experiments by tag from the experiment space view, using the simple search button:
You can select parameter and metric columns using the manage columns button:
You can save the view of experiment tables for later use:
You can group experiments by feature:
This can be especially useful when systematically inspecting relations between model parameters and score.
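The idea behind grouping can be sketched in a few lines of plain Python. This is not how either tool implements it. The `mean_score_by` helper and the run data are hypothetical, and they only show how grouping by one parameter exposes its relation to a score.

```python
from collections import defaultdict
from statistics import mean

# Made-up runs; each dict stands in for one row of an experiment table.
runs = [
    {"activation": "relu", "batch_size": 32, "eval_accuracy": 0.91},
    {"activation": "relu", "batch_size": 64, "eval_accuracy": 0.89},
    {"activation": "tanh", "batch_size": 32, "eval_accuracy": 0.85},
    {"activation": "tanh", "batch_size": 64, "eval_accuracy": 0.87},
]


def mean_score_by(runs, parameter, metric):
    """Group runs by one parameter and average a metric within each group."""
    groups = defaultdict(list)
    for run in runs:
        groups[run[parameter]].append(run[metric])
    return {value: mean(scores) for value, scores in groups.items()}


by_activation = mean_score_by(runs, "activation", "eval_accuracy")
for value, score in sorted(by_activation.items()):
    print(f"activation={value}: mean eval_accuracy={score:.2f}")
```

Swapping `"activation"` for `"batch_size"` regroups the same runs along a different parameter, which is the kind of inspection a grouped experiment table makes routine.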
And you can also share view links with teammates.
The only way to access this rich UI and easy collaboration with MLflow is through integrating it with Neptune.
Detailed comparison of product features in Neptune and MLflow
Which tool supports thousands of runs?
Neptune was built to scale in order to support millions of experiment runs, both on the frontend and backend.
MLflow, as an open-source tool, isn’t the fastest tool out there; with hundreds or thousands of runs especially, the UI can get laggy.
Detailed comparison of integrations in Neptune and MLflow
Both Neptune and MLflow provide a variety of integrations. Neptune integrates with many popular data science tools such as Scikit-Learn, TensorBoard, Sacred, Catalyst, Scikit-Optimize, Ray, HiPlot, etc. MLflow integrates with TensorFlow, PyTorch, Keras, Apache Spark, Scikit-Learn, RAPIDS, etc.
Which tool allows integration with TensorBoard?
You can integrate Neptune with TensorBoard to have your TensorBoard visualization hosted in Neptune, convert your TensorBoard logs directly into Neptune experiments, and instantly log major metrics.
First, install the library:
pip install neptune-tensorboard
After creating a simple training script with TensorBoard logging and initializing Neptune, you can integrate with two simple lines:
import neptune_tensorboard as neptune_tb
neptune_tb.integrate_with_tensorflow()
Make sure to create the experiment!
Now, your experiments will be logged to Neptune, and you can also enjoy the features of team collaboration.
Neptune integration with MLflow
As we mentioned before, one of the disadvantages of MLflow is that you can’t easily share experiments, nor collaborate on them.
In order to add organization and collaboration, you need to host the MLflow server, confirm that the right people have access, store backups, and jump through other hoops.
The experiment comparison interface is a little lacking, especially for team projects.
But you can integrate it with Neptune. This way, you can use the MLflow interface in order to track experiments, sync your runs folder with Neptune, and then enjoy the flexible UI from Neptune.
You don’t need to back up the mlruns folder or fire up the MLflow UI dashboard on a dedicated server. Your MLflow experiments will automatically be hosted, backed up, organized, and enabled for teamwork thanks to Neptune.
Change the workflow from:
You can do everything else as you normally would.
Learning more about Neptune…
As you can see, these tools aren’t necessarily mutually exclusive. You can benefit from your favorite features of MLflow, while using Neptune as a central place for managing your experiments and collaborating on them with your team. If you want to learn the details about Neptune, check out the official documentation. If you want to try it out, you can create a free account and start tracking your machine learning experiments with Neptune.