The Best Software for Collaborating on Machine Learning Projects

Collaborating on machine learning projects is challenging. It requires focus, attention to detail, and strong analytical skills. But it also requires tools.

When you’re working on a project solo, you have full flexibility in how you work. But what do you do when you’re collaborating with a team? You can keep that flexibility, but not without the right software.

There are many tools for collaboration on machine learning projects, but not all of them will improve your work. The difference lies in the features.

Here are nine of the best tools that will make life easier for people working on ML projects.

All these collaboration platforms are used by Machine Learning and Data Science practitioners and can be easily incorporated into the process of Agile project management. So if you want to improve the collaboration in your ML team, you’re in the right place. 

Note: If you’re interested in the topic of Data Science collaboration, check out the paper “How do Data Science Workers Collaborate? Roles, Workflows, and Tools”.

1. Neptune – collaboration for Machine Learning and Data Science

Neptune is a metadata store used by individuals and teams for experiment tracking and model registry. It gives them a central place to log, store, display, organize, compare, and query all metadata generated during the machine learning lifecycle. 

With Neptune on board, ML teams can work on their models collaboratively, keep track of their experiments and have the historical data stored in one app. The process is well-defined, well-structured, and most importantly, standardized.  

If you’re part of an ML team, the thing that probably interests you most is that Neptune is also very easy to set up. The API is infrastructure agnostic, so it fits any workflow that you already have (or plan to build). A few lines of code are enough, and 25+ integrations with ML libraries make that even easier.
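
To make the idea of a shared metadata store more concrete, here is a minimal, purely illustrative sketch. It is not Neptune's API: the `Run` and `MetadataStore` classes, the run names, and the owners are all hypothetical, and everything lives in memory. It only shows the pattern described above, where teammates using different frameworks log runs to one place and then query and compare them.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One experiment run: who ran it, with which parameters, and what it scored."""
    name: str
    owner: str
    params: dict
    metrics: dict = field(default_factory=dict)


class MetadataStore:
    """Hypothetical in-memory stand-in for a shared experiment tracker."""

    def __init__(self):
        self.runs = []

    def log_run(self, run):
        # Every teammate logs to the same place, whatever framework they train with.
        self.runs.append(run)
        return run

    def query(self, **param_filters):
        # Return runs whose parameters match all the given key/value pairs.
        return [
            r for r in self.runs
            if all(r.params.get(k) == v for k, v in param_filters.items())
        ]

    def best(self, metric):
        # Compare runs on one metric (higher is better) and return the top one.
        scored = [r for r in self.runs if metric in r.metrics]
        return max(scored, key=lambda r: r.metrics[metric])


store = MetadataStore()
store.log_run(Run("keras-baseline", "alice", {"framework": "keras", "lr": 0.01}, {"accuracy": 0.91}))
store.log_run(Run("fastai-resnet", "bob", {"framework": "fastai", "lr": 0.001}, {"accuracy": 0.94}))

print(store.best("accuracy").name)
print([r.name for r in store.query(framework="keras")])
```

A real tracker adds persistence, a UI, and access control on top of this pattern, which is what makes it useful for a team rather than a single script.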

“For me the most important thing about Neptune is its flexibility. Even if I’m training with Keras or Tensorflow on my local laptop, and my colleagues are using fast.ai on a virtual machine, we can share our results in a common environment.”

Víctor Peinado, Senior NLP/ML Engineer

Here are a few features of Neptune that especially enhance team collaboration: 

  • User management functionality, including role differentiation and access management
  • Share buttons that let you copy, email, or tweet a link to any page in the Neptune UI
  • Possibility to share UI links with both project members and external people
  • Usage-based pricing scheme that lets you add as many members to the team workspace as you want, without it affecting the base fee
  • Scalability with thousands of runs

➡️ Alternative tools: Comet, Weights & Biases.

2. GitHub – software development platform

GitHub is the most popular platform built for developers. It’s used by millions of teams around the globe as it allows for easy and painless collaboration. With GitHub, you can host and review code, manage projects, and build software.

It’s a great platform for teams collaborating on machine learning projects who want to simplify workflow and share ideas conveniently. GitHub lets teams manage ideas, coordinate work, and stay aligned with the entire team to seamlessly collaborate on machine learning projects.

Here are some of the main features you and your team will find helpful:

  • Build, test, deploy, and run CI/CD the way you want in the same place you manage code
  • Use Actions to automatically publish new package versions to GitHub Packages. Install packages and images hosted on GitHub Packages or your preferred registry of record in your CI/CD workflows
  • The software lets you secure your work with vulnerability alerts so you can remediate risks and learn how CVEs affect you
  • The built-in review tools make it easy and convenient to review code – a team can propose changes, compare versions, and give feedback
  • GitHub easily integrates with other tools for smooth work, or you can create your own tools with the GitHub GraphQL API
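
As a rough illustration of building your own tooling on the GraphQL API mentioned in the list above, the sketch below prepares a request that asks for the last five open pull requests of a repository. The repository owner, repository name, and token are placeholders, and the request is only built, not sent, so the example runs without network access or credentials.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# GraphQL query: titles and URLs of the last five open pull requests.
QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    pullRequests(last: 5, states: OPEN) {
      nodes { title url }
    }
  }
}
"""


def build_request(owner, name, token):
    """Build (but do not send) a POST request for the GraphQL endpoint."""
    body = json.dumps({"query": QUERY, "variables": {"owner": owner, "name": name}})
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "my-org", "my-ml-repo", and "YOUR_TOKEN" are placeholders, not real values.
request = build_request("my-org", "my-ml-repo", token="YOUR_TOKEN")
payload = json.loads(request.data)
print(request.get_method(), request.full_url)
print(payload["variables"])
```

With a real personal access token, passing the request to `urllib.request.urlopen` would return a JSON response that a team could feed into a dashboard or a daily review reminder.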

GitHub is a platform where all the documentation is easily accessible, and all the features make it a unified system for flexibly developing software.

“In my experience, I have found GitHub helps a lot in collaboration with team members for an ML project. It gives a common platform where we can do version controlling of our experiments, write documentations, create branches and forks to avoid accidents. Also, the availability of a large ML projects repository in GitHub helps in the quick implementation of code/debugging.”

Nishkam Shivam, Data Scientist at Bristlecone

➡️ Alternative tools: GitLab

3. Jira – project management for agile teams

Jira is a great tool for agile teams because it covers project management end to end. It’s an issue and project tracking tool, so teams can plan, track, and release their product or software as one well-developed ‘organism’.

Jira allows for flexible workflow automation. You can manage a project by assigning tasks to people and bugs to programmers, creating milestones, or planning tasks within a specific timeframe.

Products and apps built on top of the Jira platform help teams plan, assign, track, report, and manage work. Four products are built on the Jira platform: Jira Software, Jira Service Desk, Jira Ops, and Jira Core. Each product comes with built-in templates for different use cases and integrates seamlessly, so teams across organizations can work better together.

“We use GitHub and Jira. GitHub is the place where we maintain all our code/repos and since we follow an agile model, we use Jira to maintain our stories, and also created an API which can directly connect our GitHub code to Jira (in case anyone wants to see).”

Akshat Shreemali, Principal Data Scientist at Capital One

Jira is a great collaboration solution for programmers, analysts, software architects, and everyone else on a team developing software. It helps to simplify, organize, and structure workflow.

➡️ Alternative tools: Trello, ClickUp, Asana

4. Slack – online chat compatible with other apps

Slack is one of the most popular apps for communication. It can also enhance the work of people working on machine learning projects.

Slack replaces e-mail with messaging that keeps people, information, and tools in one place.

By combining people, applications, and data, it effectively replaces e-mail and long, messy threads. It also gives people the bigger picture, so everyone can see what’s happening within a company and stay in the loop.

Slack makes it simple to follow conversations or find important information in an easily searchable archive. So when a team works on an ML project, they always know what’s happening and can streamline processes.

Summary of Slack’s main functionalities:

  • Workspaces – you can create multiple workspaces for different projects or teams
  • Channels – private, public, or shared with clients, they let you quickly communicate with other people
  • Direct and group messages
  • Key company info – threads, mentions and reactions, saved items, and people are all at hand so you can navigate the app quickly
  • Apps – an inventory of numerous tools lets you integrate Slack with other popular tools
  • Files – a space where you, your team, and the entire company can store files, find them with search, or upload new ones
  • Saved drafts
  • History and search let you quickly find what you are looking for
  • Video calls with screen sharing
  • Create workflows that automate routine actions and communication
  • Messages and files can be encrypted and you can contact the sales team for more security options
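
One common way ML teams use the app and workflow features listed above is to post a message to a channel when a training run finishes. The sketch below assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. The webhook URL, run name, and metric are placeholders, and the request is built but not sent.

```python
import json
import urllib.request

# Placeholder: a real incoming-webhook URL is generated per Slack app and channel.
WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def training_finished_message(run_name, metric, value):
    """Format a short notification as a JSON-serializable dict with a "text" field."""
    return {"text": f"Run {run_name} finished: {metric} = {value:.3f}"}


def build_webhook_request(message):
    """Build (but do not send) the POST request carrying the message."""
    return urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical run name and metric, for illustration only.
message = training_finished_message("resnet50-aug", "val_accuracy", 0.9412)
request = build_webhook_request(message)
print(message["text"])
# To actually post, call urllib.request.urlopen(request) with a real webhook URL.
```

Calling this at the end of a training script keeps the whole team informed without anyone having to watch the logs.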

Slack is a powerful tool for communication and collaboration, and its many small features help teams work in an agile style.

5. Notion.so – all-in-one workspace

Notion is a collaboration tool that lets you write, plan, collaborate, and organize teamwork.

It consists of four modules, each with different functionalities:

  1. Notes, Docs – text editor which serves as a space for files, notes of different formats; you can add images, bookmarks, videos, code, and many more
  2. Knowledge Base – in this module, teams can store knowledge about projects, tools, best practices, and other aspects that are necessary for developing machine learning projects
  3. Tasks, Projects – tasks and projects can be organized in a Kanban board, calendar, and list views
  4. Databases – this module can effectively replace spreadsheets and keep records of important data and unique workflows in a convenient way

Additionally, every team member can use Notion for personal use to keep a record of work-related activities and information, for example, weekly agenda, goal, task list, or personal notes.

Other small features include Markdown support, /slash commands, drag-and-drop, comments and discussions, and integrations with 50+ popular apps such as Google Docs, GitHub Gist, CodePen, and more.

All modules create a coherent system that serves as a unified hub for work management and project planning. It’s a lightweight tool suitable for agile teams.

“Notion is part of the process of research and experimentation. Part of the experiments documentation is kept there. Jupyter Notebooks are the main IDE we use during the research process as it eases results visualisation and knowledge sharing.

Github, we use it mainly for product development. After there’s a research answer and we are confident that a development should be integrated into the main product, that’s when we use Github – from version control to git pipelines.

Jira we use it to define the both research roadmap and the product itself.

Slack and Zoom are chat and video support for any exchange of ideas or in-depth discussion.”

Fabiana Clemente, Founder and Chief Data Officer at YData

6. Confluence – product documentation workspace

Confluence is a collaborative workspace developed by Atlassian. It has been around for a long time (17 years) and has been improved over the years to fit the changing needs of the community. Today, Confluence is primarily used as a workspace for remote collaboration.

It supports the maintenance of a single source of truth and allows users to quickly find specific information with advanced search capabilities. Confluence has several collaborative features which make it a super useful tool for corporate collaboration. Some of them include real-time editing, chats, notifications, comments, notes, tags, page trees, project hierarchy, and access controls.

Confluence has over 75,000 customers and continues to grow. It fits well into the ecosystem of collaborative tools: integrations with widely used applications such as Slack, Jira, Dropbox, Google Drive, and Trello make it a popular choice.

“The teams I have worked with usually use Gitlab for hosting repositories. I have also used JIRA, Trello and Azure Board for project management, companies or teams usually select one of these tools. I’ve also used Confluence for documentation, but recently I’ve been using licensed versions of Dropbox or Box, depending on the company selection. For communication, I’ve used many tools, including Slack, Matterport and company-built chat applications.

My previous team has used Tableau and other similar dash boarding applications as well. We use Docker quite extensively to develop and deploy containerized applications, and commercial cloud platforms like Azure/AWS for compute.”

Tanmana Sadhu, AI Researcher @ LG Toronto AI Lab

7. Google Meet – real-time meetings

Google Meet is one of the most popular e-meeting platforms. It’s especially helpful for remote teams collaborating on machine learning projects. Google Meet lets teams meet for sprints, conferences, individual chats, or any other form that is preferred. 

Using your browser, you can share your video, desktop, and presentations with teammates and customers.

Google Meet integrates with G Suite’s Google Calendar and Gmail so you can see the complete list of participants and scheduled meetings.

You can quickly schedule a meeting and send it to team members well ahead of time so they can join later with an auto-generated link.

If your company uses G Suite, you can also join a meeting by dialing its phone number. Meet works for audio-only meetings as well as video conferencing: participants can turn off their cameras or mute their microphones so everyone can focus on the person who is speaking.

➡️ Alternative tool: Zoom

8. Jupyter Notebooks – collaboration for research

Even if you are only loosely connected to the AI/ML community, you have probably worked with, or at least heard of, Jupyter Notebooks. They are especially popular among beginners who are just getting started with machine learning, but experts use them just as much as a powerful collaboration tool.

A Jupyter notebook is a web application where the user can not just code but also maintain extremely detailed documentation with visuals, results, time details, sequences, logs, and much more. In a way, it promotes the best type of coding there is – the one that comes with documentation. 
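
Under the hood, a notebook is a JSON document in which documentation and code sit side by side as cells. The sketch below builds a minimal notebook structure by hand to show that layout. It assumes the version 4 notebook format, and the cell contents are made up for illustration. In practice you would create notebooks in the Jupyter UI rather than by hand.

```python
import json


def markdown_cell(text):
    """A documentation cell: rendered as formatted text in the notebook."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """A code cell that has not been executed yet (no outputs, no execution count)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code,
    }


# Explanation, code, and conclusion live together in one shareable document.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        markdown_cell("# Learning-rate sweep\nWhy we ran it and what we expected."),
        code_cell("results = {0.01: 0.91, 0.001: 0.94}\nbest_lr = max(results, key=results.get)"),
        markdown_cell("**Conclusion:** the smaller learning rate wins on this dataset."),
    ],
}

# Saving this string to a file with the .ipynb extension gives a notebook file.
serialized = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in notebook["cells"]])
```

Because the file is plain JSON, it can be committed to a repository and reviewed by teammates like any other project artifact.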

Jupyter Notebooks support over 40 languages and are most commonly used with Python. They also support ML-friendly languages such as R, Matlab, and Scala, and popular programming languages such as Java and C++.

Being easy to use and understand, Jupyter Notebooks are the number one choice for breaking down complex processes for the uninitiated, creating user-friendly tutorials for AI/ML platforms, or for documenting the experiments.

➡️ Alternative tools: Google Colab, Deepnote

9. Google Docs – collaboration in real-time across the organization

Google Docs is one of the best solutions for organizing files and collaborating on them in real-time, no matter how many users are working on the file.

I am even writing this article in Google Docs, so the other people involved in publishing it can easily collaborate on it.

Google Docs has useful features that facilitate the work of teams:

  • Create as many files as necessary, adjust them as you like
  • Use templates from the template gallery
  • Add-ons allow users to customize documents for more flexible and effective work
  • Smart editing and styling tools help easily format text and paragraphs – you can choose from hundreds of fonts, add links, images, and drawings
  • Team members can access, create, and edit documents from a phone, tablet, or computer
  • Everyone can work together in the same document at the same time
  • All changes are automatically saved – there is no need to save files locally on the computer or other device
  • You can use revision history to see old versions of the same document, sorted by date and who made the change

Google Docs is a must-have tool for collaborating on machine learning projects.

To wrap it up

When choosing software for collaboration on ML projects, make sure it matches your team’s preferences and style of work, and that it can be easily integrated with the apps you currently use.

It’s the best way to ensure consistency, a seamless workflow, and the security of your projects.

Happy collaborating!

