MLOps is not a piece of cake, especially in today’s changing environment. There are many challenges: building, integrating, testing, releasing, deploying, and managing infrastructure. You need to follow good practices and know how to adapt to these challenges.
And if you don’t learn and develop your knowledge, you’ll fall out of the loop. The right resources can help you follow the best practices, discover helpful tips, and learn about the latest trends.
You don’t have to look far. We’ve got you covered! Here’s your list of the best go-to resources about MLOps, divided into categories: books, articles, podcasts, and more. Let’s dive in!
This course is curated by AI pioneer Andrew Ng, one of the most influential computer scientists in the world, who co-founded Google Brain and Coursera and led AI research at Baidu. His Machine Learning specialization course is still considered to be the best in the business for beginners.
The Machine Learning Engineering for Production (MLOps) Specialization covers how to conceptualize, build, and maintain integrated systems that continuously operate in production. In striking contrast with standard machine learning modeling, production systems need to handle relentlessly evolving data. Moreover, the production system must run non-stop at the minimum cost while producing maximum performance.
In this specialization:
- You will learn how to use well-established tools and methodologies for doing all of this effectively and efficiently.
- You will become familiar with the capabilities, challenges, and consequences of machine learning engineering in production.
- By the end, you will be ready to employ your new production-ready skills to participate in the development of leading-edge AI technology to solve real-world problems.
If you are looking to build your MLOps expertise around a certain platform such as Google Cloud Platform (GCP), this might be the course for you. This course introduces participants to MLOps tools and best practices for deploying, evaluating, monitoring, and operating production ML systems on Google Cloud.
This course is primarily intended for:
- Data Scientists looking to quickly go from machine learning prototype to production to deliver business impact.
- Software Engineers looking to develop Machine Learning Engineering skills.
- ML Engineers who want to adopt Google Cloud for their ML production projects.
You will learn to:
- Identify and use core technologies required to support effective MLOps.
- Adopt the best CI/CD practices in the context of ML systems.
- Configure and provision Google Cloud architectures for reliable and effective MLOps environments.
- Implement reliable and repeatable training and inference workflows.
Additionally, you can check the official website of MLOps for more interesting information and resources to expand your knowledge and learn about the best practices.
The Stanford MLSys Seminar Series is, as the name suggests, a series of seminars focused on machine learning and ML systems—tools and all the technology used for programming machine learning models.
The course started in fall 2020. Every talk in the series is live-streamed on YouTube on Thursdays, 1-2 PT, and you can ask questions in the live chat. Videos of the talks are available on YouTube afterward as well. Give the channel a follow here, and tune in every week for an exciting discussion!
The goal of the course is to help curate a curriculum of awesome work in ML systems to help drive research focus to interesting questions.
1. Introducing MLOps from O’Reilly
Introducing MLOps: How to Scale Machine Learning in the Enterprise is a book written by Mark Treveil and the Dataiku Team (collective authors). It introduces the key concepts of MLOps, shows how to maintain and improve ML models over time, and tackles the challenges of MLOps.
The book was written specifically for analytics and IT operations team managers—the people directly facing the task of scaling machine learning (ML) in production. It’s a guide for creating a successful MLOps environment, from the organizational to the technical challenges involved.
The book is divided into three parts:
- The first part is an introduction to the topic of MLOps: how and why it has developed as a discipline, who needs to be involved to execute MLOps successfully, and what components are required.
- The second part follows the machine learning model life cycle, with chapters on developing models, preparing for production, deploying to production, monitoring, and governance.
- The third part provides tangible examples of how MLOps looks in companies today, so readers can understand the setup and implications in practice.
2. What Is MLOps? from O’Reilly
What Is MLOps? Generating Long-Term Value from Data Science & Machine Learning by Mark Treveil and Lynn Heidmann is a thorough report for business leaders who want to understand and learn about MLOps as a process for generating long-term value while reducing the risk associated with data science, ML, and AI projects.
Here’s what the report includes:
- Detailed components of ML model building, including how business insights can provide value to the technical team
- Monitoring and iteration steps in the AI project lifecycle, and the role business plays in both processes
- How components of a modern AI governance strategy are intertwined with MLOps
- Guidelines for aligning people, defining processes, and assembling the technology necessary to get started with MLOps.
3. Practical MLOps from O’Reilly
The book Practical MLOps: Operationalizing Machine Learning Models by Noah Gift and Alfredo Deza is an insightful guide that explains what MLOps is and how it differs from DevOps, and shows you how to put it into practice to operationalize your machine learning models.
This is what you’ll learn from the book:
- Apply DevOps best practices to machine learning
- Build production machine learning systems and maintain them
- Monitor, instrument, load-test, and operationalize machine learning systems
- Choose the correct MLOps tools for a given machine learning task
- Run machine learning models on a variety of platforms and devices, including mobile phones and specialized hardware.
Here’s the book’s public GitHub repo, where the code samples for Practical MLOps are stored and available for free.
4. Reliable Machine Learning from O’Reilly
Whether you are part of a small startup or a planet-spanning megacorp, this practical book shows data scientists, SREs, and business owners how to run ML reliably, effectively, and accountably within your organization. You’ll gain insight into everything from how to do model monitoring in production to how to run a well-tuned model development team in a product organization.
By applying an SRE mindset to machine learning, authors and engineering professionals Cathy Chen, Kranti Parisa, Niall Richard Murphy, D. Sculley, Todd Underwood, and featured guests show you how to run an efficient ML system. Whether you want to increase revenue, optimize decision-making, solve problems, or understand and influence customer behavior, you’ll learn how to perform day-to-day ML tasks while keeping the bigger picture in mind.
Specifically, you’ll examine:
- What ML is, how it functions, and what it relies on.
- Conceptual frameworks for understanding how ML “loops” work.
- Effective “productionization,” and how it can be made easily monitorable, deployable, and operable.
- Why ML systems make production troubleshooting more difficult, and how to get around those difficulties.
- How ML, product, and production teams can communicate effectively.
5. Designing Machine Learning Systems from O’Reilly
Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they’re data dependent, with data varying wildly from one use case to the next. In this book, you’ll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.
Author Chip Huyen, the co-founder of Claypot AI, considers each design decision (such as how to process and create training data, which features to use, how often to retrain models, and what to monitor) in the context of how it can help your system as a whole achieve its objectives. The iterative framework in this book uses actual case studies backed by ample references.
This book will allow you to tackle scenarios such as:
- Engineering data and choosing the right metrics to solve a business problem.
- Automating the process for continually developing, evaluating, deploying, and updating models.
- Developing a monitoring system to quickly detect and address issues your models might encounter in production.
- Architecting an ML platform that serves across use cases.
- Developing responsible ML systems
6. Machine Learning Engineering by Andriy Burkov
“If you intend to use machine learning to solve business problems at scale, I’m delighted you got your hands on this book.” —Cassie Kozyrkov, Chief Decision Scientist at Google
This book has received glowing reviews from several industry stalwarts for its focus on hugely important topics in practical machine learning. It does a great job of covering:
- The importance of monitoring
- How to approach model maintenance
- What to do when things go wrong
- How to think about fallback strategies for the kinds of mistakes you can’t anticipate
- How to deal with adversaries who try to exploit your system
- How to manage the expectations of your human users
It offers a comprehensive review of machine learning engineering best practices and design patterns, so if you are someone who is looking to bring more structure to their work and find solutions pertaining to reproducibility, scalability, and model version control, this book is for you.
The MLOps Community is definitely the best MLOps community out there. Almost 10k practitioners in one place, asking questions, sharing knowledge, and just talking to each other about all things MLOps.
“While MLOps shares a lot of ground with DevOps, the differences are as big as the similarities. We needed a community laser-focused on solving the unique challenges we deal with every day building production AI/ML pipelines. We’re in this together. Come learn with us in a community open to everyone. Share knowledge. Ask questions. Get answers.” – https://mlops.community/
Apart from the very active Slack channel, MLOps Community also runs a podcast (more about it below), organizes meetups and reading groups, and sends newsletters. Make sure to check all these resources.
To address the challenges of CI/CD in ML, the MLOps SIG managed under the CD Foundation has been formed with the following goals:
- MLOps Definition and Roadmap: Create a vision and roadmap for MLOps, what it means, and what its role is within the CI/CD ecosystem.
- Reference Architecture and Design Patterns: Create reference architecture, design patterns and implementations, and processes for MLOps.
- AI Governance and Risk Management: Define architecture and guidelines around lineage tracking, metadata collection, experiment tracking, data versioning, ETL operations, etc., which a typical data and ML pipeline should support to enable ethical AI.
You can join the #sig-mlops channel on their Slack community.
This is a show/podcast run by Neptune.ai. It’s a biweekly Q&A where practitioners doing ML at a reasonable scale answer questions from other ML practitioners. We started it because after talking to many ML teams, we realized that they all want to learn more about how teams like them solve their problems. And it wasn’t easy to find this kind of resource. So we figured we could try and change that.
But we wanted to do it in a really practical way. Presentations are great, but sometimes you almost want to skip to the Q&A section right away. There’s never enough time for that, right? And it’s usually the most interesting part of the talk.
So we’ve decided to make the whole thing a live Q&A. Every episode is focused on a different subject related to production ML and is full of juicy bits, the things you won’t find in a company blog post. Things that you only get when practitioners are asking practitioners.
You can register for these events here and if you want to catch up with previous episodes, you can do it on:
We recommend starting with this episode:
2. MLOps.community Podcast
MLOps.community is a podcast hosted by Demetrios Brinkmann. It has “weekly talks and fireside chats about everything that has to do with the new space emerging around DevOps for Machine Learning aka MLOps aka Machine Learning Operations.”
There are interviews, conversations with interesting people, tips, talks about challenges, trends, and more. Tune in and listen!
There are tons of great episodes, so it’s difficult to pick the best ones, but the one episode you definitely should catch up with is this one:
3. TWIML Podcast
This Week in Machine Learning & AI (TWIML) is a leading voice in the field, with over seven million downloads and a large and engaged community following. Through its podcast, it aims to bring the top minds and ideas from the world of ML and AI to a broad and influential community of ML/AI researchers, data scientists, engineers, and tech-savvy business and IT leaders.
The TWIML AI Podcast is hosted by Sam Charrington, a sought-after industry analyst, speaker, commentator, and thought leader. Sam’s research is focused on the business and consumer application of machine learning and AI, bringing AI-powered products to market, and AI-enabled and -enabling technology platforms.
You can check all the episodes here, and here is one that we recommend listening to first:
4. Pipeline Conversations ML podcast by ZenML
Pipeline Conversations is a fortnightly podcast bringing you interviews and discussions with industry leaders, top technology professionals, and others. It discusses the latest developments in machine learning, deep learning, and artificial intelligence, with a particular focus on MLOps, or how trained models are used in production.
It first aired in November 2021 and has 15 episodes so far on topics ranging from productionizing your ML model to monitoring it.
Listen to this one for starters:
5. The MLOps Podcast by DAGsHub
This podcast aims to bring machine learning into the mainstream. Each episode features a conversation with top data science and machine learning practitioners, who share their thoughts, best practices, and tips for promoting machine learning to production.
It has 9 episodes so far, primarily focused on good MLOps practices and building and deploying impactful models.
Check out this one, for example. It’s a really cool episode:
1. MLOps World
MLOps World is an international community group of practitioners trying to better understand the science of deploying ML models into live production environments. Currently, they have 5 chapters spanning from North America to Europe. These chapters organize annual conferences, events, and meetups where participants get an opportunity to take part in various workshops and listen to ML executives from leading tech companies such as Microsoft, Amazon, and Lyft.
Created by the Toronto Machine Learning Society (TMLS), this initiative is intended to unite and support the wider AI ecosystem: the companies, practitioners, academics, and contributors to open-source communities operating within it. Their activities include, but are not limited to:
- Holding regular social gatherings
- Knowledge sharing and career development
- Identification of opportunities and effective practices, methodologies, and principles around deploying models into live production environments
- Unique local and global partnerships
- Hiring and talent building
- Growth of diverse and inclusive teams
“We launched our apply() event series last year to meet the demand for more practical knowledge from MLOps teams that are deploying ML in production.” – Mike Del Balso, co-founder and CEO of Tecton.
apply() is a series of events planned by Tecton on data engineering for applied machine learning. The latest in this line of events was apply(conf), a practitioner-focused conference to discuss challenges faced by ML engineers when building and deploying machine learning models. The topics included:
- Best-practice development patterns
- Tooling and infrastructure of choice
- Managing labeling pipelines
- Transforming and serving features in real-time
- Serving at scale
You can access the conference’s session and video archive here.
Their events comprise workshops and in-person social events to maximize learning and networking opportunities. You can join their Slack community to keep track of their upcoming meetups and conferences.
Other MLOps resources
1. ML-Ops.org & Awesome MLOps
The ml-ops.org website serves as a guide to MLOps, explaining various concepts such as the principles behind it, the components that constitute the MLOps pipeline, various tools and frameworks for managing this pipeline, and much more. With high-quality graphics and carefully curated content, it is a valuable resource for anyone looking to get started with MLOps. The sheer number of references listed on this website makes it even more useful.
Awesome MLOps is a list of references for MLOps created by the people behind ml-ops.org.
It’s a list of links to numerous resources, from books and articles to communities and much more. In short, it has everything you could possibly read about MLOps. The table of contents includes, among others: MLOps Papers, Talks About MLOps, Existing ML Systems, Machine Learning, Software Engineering, Product Management for ML/AI, The Economics of ML/AI, Model Governance, Ethics, Responsible AI.
Be careful! It might be a tough and long read if you want to go through all the links, but if you want to learn all about MLOps, it’s one of the best resources.
2. Google Cloud
MLOps: Continuous delivery and automation pipelines in machine learning is a document from Google that “discusses techniques for implementing and automating continuous integration (CI), continuous delivery (CD), and continuous training (CT) for machine learning (ML) systems.”
If you’re new to MLOps, this document can be a great source of knowledge as it touches on some basic concepts. But if you’re an MLOps veteran, you’ll also find it helpful to refresh and solidify your knowledge. It can also help you reliably build and operate ML systems at scale.
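To make the continuous training (CT) idea a bit more concrete, here’s a minimal Python sketch of a retraining trigger: it compares recent evaluation accuracy against a baseline and flags the model for retraining when the drop is too large. Everything in it is an assumption made up for illustration (the `should_retrain` function, the `RetrainDecision` class, the thresholds, and the accuracy numbers); none of it comes from Google’s document, and a real pipeline would likely watch more signals than a single metric.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RetrainDecision:
    """Result of a retraining check (hypothetical helper type)."""
    retrain: bool
    reason: str


def should_retrain(recent_accuracy, baseline_accuracy, max_drop=0.05, min_windows=5):
    """Decide whether to trigger retraining.

    recent_accuracy: accuracy values from the latest evaluation windows.
    baseline_accuracy: accuracy measured when the model was deployed.
    max_drop: largest tolerated drop in mean accuracy (assumed value).
    min_windows: minimum number of windows needed before deciding (assumed value).
    """
    if len(recent_accuracy) < min_windows:
        return RetrainDecision(False, "not enough evaluation windows yet")

    drop = baseline_accuracy - mean(recent_accuracy)
    if drop > max_drop:
        return RetrainDecision(True, f"mean accuracy dropped by {drop:.3f}")
    return RetrainDecision(False, "accuracy within tolerance")


if __name__ == "__main__":
    # Illustrative numbers only: the model was deployed at 0.90 accuracy.
    decision = should_retrain([0.80, 0.81, 0.79, 0.80, 0.80], baseline_accuracy=0.90)
    print(decision)
```

In an automated setup, a scheduled job could run a check like this after each evaluation window and kick off the training pipeline whenever `retrain` comes back as `True`.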
This is a great resource if you are looking to get started with MLOps on the Azure stack. It is somewhat similar to the previous resource from Google Cloud; however, rather than a course, it is more of a do-it-yourself project. This project aims to:
- Introduce you to a number of asset management and orchestration services offered by Azure to help you manage the lifecycle of your model training & deployment workflows.
- Share best practices for training and operationalizing your model with Microsoft Azure.
- Offer real-world examples to help you get started.
If you are someone who likes to read about anything directly from the source, here is a very useful link for you. It contains all the scientific and industrial research papers concerning Machine Learning Operations since 2015, discussing everything from technical debt to engineering entire ML systems the right way.
To wrap it up
MLOps is important if you want to build high-quality models for your ML experiments. It can boost your production pipeline and improve the team’s performance. So keep learning to implement the best practices and always stay on top!
And don’t forget to follow our blog for the latest articles. We’re publishing regularly to bring you the best content. We also have a section dedicated to MLOps; you can find it here.
Are we missing something? Let us know if you’re not seeing your favorite resources on our list!