If you use MLflow, you’re in for a treat: in this article, you’ll see how to make your MLflow projects much easier to share and how to collaborate seamlessly with your teammates.
Creating a seamless workflow for your machine learning projects can be extremely challenging.
A typical machine learning lifecycle includes:
- Data collection + preprocessing
- Training the model on data
- Deploying the model to production
- Testing + improving the model with new data
These four steps seem fairly straightforward, but each layer comes with new obstacles. You might need a different tool for each step – Kafka for data prep, TensorFlow as a model training framework, Kubernetes as a deployment environment, and so on.
Every time you adopt a new tool, you must repeat the entire process, perhaps rebuilding the same pipeline in Scikit-learn and deploying it to Amazon SageMaker. This is obviously not sustainable as APIs and organizations grow.
Plus, tuning hyperparameters is vital for creating an extraordinary model, so you need a thorough record of hyperparameter history, source code, performance metrics, dates, people, and more. The machine learning lifecycle can be a formidable platform development challenge: you should be able to reproduce, revisit, and deploy your workflow to production easily, and you also need a platform that standardizes the lifecycle.
Luckily, there’s MLflow, a great open-source solution built around three pillars: tracking, projects, and models.
- MLflow Tracking
Create an extensive logging framework around your model and assign specific metrics to compare runs.
- MLflow Projects
Create an MLflow pipeline to determine how the model runs in the cloud.
- MLflow Models
Package your machine learning models in a standard format for use in various downstream tools. For example, real-time serving with a REST API, or batch inference with Apache Spark.
MLflow enables reproducibility and scalability for large organizations. The same model can execute in the cloud, locally, or in a notebook. You can work with any ML library, algorithm, deployment tool or language, and you can also add and share previous code.
But, there’s something that MLflow doesn’t have: an easy way to organize work and collaborate.
You would need to host an MLflow server, painstakingly manage team member access, store backups, and more. Plus, MLflow’s UI – the MLflow Tracking module that lets you compare experiments – is not easy to use, especially for large teams.
Not to worry! We can use Neptune AI to solve this problem.
Neptune’s intuitive UI lets you track experiments and collaborate with teammates, while also keeping your favorite parts from MLflow.
Introducing Neptune & MLflow integration
Neptune is a lightweight ML experiment management tool. It’s flexible and easy to integrate with all types of workflows. Your teammates can use different ML libraries and platforms, share results and collaborate on a single dashboard with Neptune. You can even use their web platform, so you don’t have to deploy it on your own hardware.
Neptune’s main features are:
- Experiment Management: keep track of all your team’s experiments, also tag, filter, group, sort, and compare them
- Notebook versioning and diffing: compare two notebooks or checkpoints in the same notebook; similarly to source code, you can do a side-by-side comparison
- Team Collaboration: add comments, mention teammates, and compare experiment results
Neptune and MLflow can be integrated with one simple command:
Now, you can push the key objects from your MLflow runs to a Neptune experiment:
- Experiment id + name
- Run id + name
Organization and collaboration with Neptune
Now let’s walk through how to share and collaborate on MLflow experiments through Neptune’s beautiful and intuitive UI.
Neptune setup (skip if you already have a Neptune account)
1. Sign up for a Neptune account first. It’s free for individuals and non-commercial organizations, and you get a generous 100 GB of storage.
2. Get your API token by clicking the menu in the top-right corner.
3. Create a NEPTUNE_API_TOKEN environment variable and export it in your console.
4. Create a project. In your Projects dashboard, click “New Project” and fill in the required information. Pay attention to the privacy settings!
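The API token step above might look like this in a Unix shell (the token value here is a placeholder – paste the one from your account):

```shell
# Make the API token available to Neptune tools in this shell session.
export NEPTUNE_API_TOKEN="YOUR_API_TOKEN"
```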
Sync Neptune and MLflow
First install Neptune-MLflow:
pip install neptune-mlflow
Next, set your NEPTUNE_PROJECT variable to USER_NAME/PROJECT_NAME:
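For example, in a Unix shell (replace the placeholder with your own workspace and project names):

```shell
# Tell Neptune which project to sync runs into.
export NEPTUNE_PROJECT="USER_NAME/PROJECT_NAME"
```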
Finally, sync your mlruns directory with Neptune:
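Assuming the CLI entry point that the neptune-mlflow package provides, the sync is a single command run from the directory containing `mlruns` (it needs NEPTUNE_API_TOKEN and NEPTUNE_PROJECT set, so this is a sketch rather than something to run blindly):

```shell
# Push local MLflow runs to the configured Neptune project.
neptune mlflow
```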
Collaborate with Neptune
Your experiment metadata should now be stored in Neptune, and you can view it in your experiment dashboard:
You can customize the dashboard by adding tags and grouping experiments with custom filters.
Neptune lets you share ML experiments simply by sending a link.
Neptune also comes with workspaces, a central hub where you can manage projects, users, and subscriptions; there are individual and team workspaces.
In the team workspace, team members can browse the content that’s related to their assigned role. You can assign various roles in projects and workspaces. In a team workspace, you can invite people either as admin or member, each with different privileges.
Workspace settings can be changed by clicking the workspace name on the top bar:
Under the Overview, Projects, People, and Subscription tabs, you can see workspace settings:
There are three roles in a project: owner, contributor, and viewer. Depending on the role, users can run experiments, create notebooks, modify previously stored data, etc.
For more details, see the User Management documentation.
Learning more about Neptune
As you can see, MLflow and Neptune aren’t mutually exclusive. You can keep your favorite features from MLflow while using Neptune as a central place for managing your experiments and collaborating on them with your team.
Zoined Case Study: Open-source or Paid Hosted Solution?
6 mins read | Updated October 25th, 2021
Zoined offers Retail and Hospitality Analytics as a cloud-based service for different roles, from top management to manager level. The service collects sales data from stores and venues, including inventories, time and attendance, and visitor tracking systems, as well as webstores. The data is analyzed and presented in a very accessible, visual format so business owners can get real-time, actionable insights and select the time frames they want to report on. The product also lets businesses filter and group their data easily, create custom views, and grasp trends quickly with charts and graphs.
With Zoined®, businesses have access to a fully managed, off-the-shelf solution with ready-made dashboards and analytics for retail and wholesale, tailored especially to the needs of fashion, food retail, coffee shops, and restaurants.
Running lots of experiments, especially in a start-up with only a few scientists and engineers, can get daunting. Tracking the experiments, versioning the datasets as they inevitably grow, and generally putting procedures in place for reproducible results can be very tricky to navigate. This was the problem Kha faced when he first joined Zoined.
“When I joined this company, we were doing quite many different experiments and it’s really hard to keep track of them all so I needed something to just view the result or sometimes or also it’s intermediate results of some experiments like what [does] the data frame look like? What [does] the CSV look like? Is it reasonable? Is there something that went wrong between the process that resulted in an undesirable result? So we were doing it manually first but just writing… some log value to some log server like a Splunk.”