MLOps is not a piece of cake, especially in today’s changing environment. There are many challenges: building, integrating, testing, releasing, deploying, and managing infrastructure. You need to follow good practices and know how to adapt to those challenges.
And if you don’t learn and develop your knowledge, you’ll fall out of the loop. The right resources can help you follow the best practices, discover helpful tips, and learn about the latest trends.
You don’t have to look far; we’ve got you covered! Here’s your list of the best go-to resources about MLOps, divided into categories: books, articles, podcasts, and more. Let’s dive in!
1. Introducing MLOps from O’Reilly
Introducing MLOps: How to Scale Machine Learning in the Enterprise is a book written by Mark Treveil and the Dataiku Team (collective authors). It introduces the key concepts of MLOps, shows how to maintain and improve ML models over time, and tackles the challenges of MLOps.
The book was written specifically for analytics and IT operations team managers—the people directly facing the task of scaling machine learning (ML) in production. It’s a guide for creating a successful MLOps environment, from the organizational to the technical challenges involved.
The book is divided into three parts:
- The first part introduces MLOps: how and why it has developed as a discipline, who needs to be involved to execute MLOps successfully, and what components are required.
- The second part follows the machine learning model life cycle, with chapters on developing models, preparing for production, deploying to production, monitoring, and governance.
- The third part provides tangible examples of how MLOps looks in companies today, so readers can understand the setup and implications in practice.
2. What Is MLOps? from O’Reilly
What Is MLOps? Generating Long-Term Value from Data Science & Machine Learning by Mark Treveil and Lynn Heidmann is a thorough report for business leaders who want to understand and learn about MLOps as a process for generating long-term value while reducing the risk associated with data science, ML, and AI projects.
Here’s what the report includes:
- Detailed components of ML model building, including how business insights can provide value to the technical team
- Monitoring and iteration steps in the AI project lifecycle–and the role business plays in both processes
- How components of a modern AI governance strategy are intertwined with MLOps
- Guidelines for aligning people, defining processes, and assembling the technology necessary to get started with MLOps.
3. Practical MLOps from O’Reilly
The book Practical MLOps: Operationalizing Machine Learning Models by Noah Gift and Alfredo Deza is an insightful guide that explains what MLOps is and how it differs from DevOps, and shows you how to put it into practice to operationalize your machine learning models.
This is what you’ll learn from the book:
- Apply DevOps best practices to machine learning
- Build production machine learning systems and maintain them
- Monitor, instrument, load-test, and operationalize machine learning systems
- Choose the correct MLOps tools for a given machine learning task
- Run machine learning models on a variety of platforms and devices, including mobile phones and specialized hardware.
👉 Here’s the companion repository on GitHub. It’s a public repo that stores the code samples for the book Practical MLOps.
4. Reliable Machine Learning from O’Reilly
Whether you are part of a small startup or a planet-spanning megacorp, this practical book shows data scientists, SREs, and business owners how to run ML reliably, effectively, and accountably within your organization. You’ll gain insight into everything from how to do model monitoring in production to how to run a well-tuned model development team in a product organization.
By applying an SRE mindset to machine learning, authors and engineering professionals Cathy Chen, Kranti Parisa, Niall Richard Murphy, D. Sculley, Todd Underwood, and featured guests show you how to run an efficient ML system. Whether you want to increase revenue, optimize decision-making, solve problems, or understand and influence customer behavior, you’ll learn how to perform day-to-day ML tasks while keeping the bigger picture in mind.
Specifically, you’ll examine:
- What ML is: how it functions and what it relies on.
- Conceptual frameworks for understanding how ML “loops” work.
- Effective “productionization,” and how it can be made easily monitorable, deployable, and operable.
- Why ML systems make production troubleshooting more difficult, and how to work around those difficulties.
- How ML, product, and production teams can communicate effectively.
5. Machine Learning Engineering by Andriy Burkov
“If you intend to use machine learning to solve business problems at scale, I’m delighted you got your hands on this book.” —Cassie Kozyrkov, Chief Decision Scientist at Google
This book has received glowing reviews from several industry stalwarts for its focus on hugely important topics in practical machine learning. It does a great job of covering:
- The importance of monitoring
- How to approach model maintenance
- What to do when things go wrong
- How to think about fallback strategies for the kinds of mistakes you can’t anticipate
- How to deal with adversaries who try to exploit your system
- How to manage the expectations of your human users
It offers a comprehensive review of machine learning engineering best practices and design patterns. If you are looking to bring more structure to your work and find solutions for reproducibility, scalability, and model version control, this book is for you.
This course is curated by AI pioneer Andrew Ng, one of the most influential computer scientists in the world, who co-founded Google Brain and Coursera and led AI research at Baidu. His Machine Learning specialization course is still considered the best in the business for beginners.
The Machine Learning Engineering for Production (MLOps) Specialization covers how to conceptualize, build, and maintain integrated systems that continuously operate in production. In striking contrast with standard machine learning modeling, production systems need to handle relentlessly evolving data. Moreover, the production system must run non-stop at minimum cost while producing maximum performance.
In this specialization:
- You will learn how to use well-established tools and methodologies for doing all of this effectively and efficiently.
- You will become familiar with the capabilities, challenges, and consequences of machine learning engineering in production.
- By the end, you will be ready to employ your new production-ready skills to participate in the development of leading-edge AI technology to solve real-world problems.
If you are looking to build your MLOps expertise around a certain platform such as Google Cloud Platform (GCP), this might be the course for you. This course introduces participants to MLOps tools and best practices for deploying, evaluating, monitoring, and operating production ML systems on Google Cloud.
This course is primarily intended for:
- Data Scientists looking to quickly go from machine learning prototype to production to deliver business impact.
- Software Engineers looking to develop Machine Learning Engineering skills.
- ML Engineers who want to adopt Google Cloud for their ML production projects.
You will learn to:
- Identify and use core technologies required to support effective MLOps.
- Adopt the best CI/CD practices in the context of ML systems.
- Configure and provision Google Cloud architectures for reliable and effective MLOps environments.
- Implement reliable and repeatable training and inference workflows.
👉 Additionally, you can check the official website of MLOps for more interesting information and resources to expand your knowledge and learn about the best practices.
The Stanford MLSys Seminar Series is, as the name suggests, a series of seminars focused on machine learning and ML systems—tools and all the technology used for programming machine learning models.
The course started in fall 2020. Every talk in this seminar series is live-streamed on YouTube on Thursdays, 1-2 PT, and you can ask questions in the live chat. Videos of the talks are available on YouTube afterward as well. Give the channel a follow here, and tune in every week for an exciting discussion!
The goal of the course is to help curate a curriculum of awesome work in ML systems to help drive research focus to interesting questions.
MLOps Community is definitely the best MLOps community out there. Almost 10k practitioners in one place, asking questions, sharing knowledge, and just talking to each other about all things MLOps.
While MLOps shares a lot of ground with DevOps, the differences are as big as the similarities. We needed a community laser-focused on solving the unique challenges we deal with every day building production AI/ML pipelines. We’re in this together. Come learn with us in a community open to everyone. Share knowledge. Ask questions. Get answers.—https://mlops.community/
Apart from the very active Slack channel, MLOps Community also runs a podcast (more about it below), organizes meetups and reading groups, and sends newsletters. Make sure to check all these resources.
1. MLOps Live
This is a show/podcast run by Neptune.ai. It’s a biweekly Q&A where practitioners doing ML at a reasonable scale answer questions from other ML practitioners. We started it because after talking to many ML teams, we realized that they all want to learn more about how teams like them solve their problems. And it wasn’t easy to find this kind of resource. So we figured we could try and change that.
But we wanted to do it in a really practical way. Presentations are great, but sometimes you almost want to skip to the Q&A section right away. There’s never enough time for that, right? And it’s usually the most interesting part of the talk.
So we’ve decided to make the whole thing a live Q&A. Every episode is focused on a different subject related to production ML and is full of juicy bits, the things you won’t find in a company blog post. Things that you only get when practitioners are asking practitioners.
You can register for these events here and if you want to catch up with previous episodes, you can do it on:
I recommend starting with this episode:
2. MLOps.community Podcast
MLOps.community is a podcast hosted by Demetrios Brinkmann. It has “weekly talks and fireside chats about everything that has to do with the new space emerging around DevOps for Machine Learning aka MLOps aka Machine Learning Operations.”
There are interviews, conversations with interesting people, tips, talks about challenges, trends, and more. Tune in and listen!
There are tons of great episodes, so it’s difficult to pick the best ones, but the one episode you definitely should catch up with is this one:
This Week in Machine Learning & AI (TWIML) is a leading voice in the field, with over seven million downloads and a large and engaged community following. Through its podcast, it aims to bring the top minds and ideas from the world of ML and AI to a broad and influential community of ML/AI researchers, data scientists, engineers, and tech-savvy business and IT leaders.
The TWIML AI Podcast is hosted by Sam Charrington, a sought-after industry analyst, speaker, commentator, and thought leader. Sam’s research is focused on the business and consumer application of machine learning and AI, bringing AI-powered products to market, and AI-enabled and -enabling technology platforms.
You can check all the episodes here, and here is one that we recommend listening to first:
Other MLOps resources
This is an awesome list of references for MLOps (Machine Learning Operations) from ml-ops.org.
It’s a list of links to numerous resources, from books and articles to communities and much more. In a word, it has everything you could possibly read about MLOps. The table of contents includes, among others: MLOps Papers, Talks About MLOps, Existing ML Systems, Machine Learning, Software Engineering, Product Management for ML/AI, The Economics of ML/AI, and Model Governance, Ethics, Responsible AI.
Be careful! It might be a tough and long read if you want to go through all the links, but if you want to learn all about MLOps, it’s one of the best resources. 😉
2. Google Cloud
MLOps: Continuous delivery and automation pipelines in machine learning is a document from Google that “discusses techniques for implementing and automating continuous integration (CI), continuous delivery (CD), and continuous training (CT) for machine learning (ML) systems.”
If you’re new to MLOps, this document can be a great source of knowledge, as it touches on some basic concepts. But if you’re an MLOps veteran, you’ll also find it helpful for refreshing and solidifying your knowledge. It can also help you reliably build and operate ML systems at scale.
This is a great resource if you are looking to get started with MLOps on the Azure stack. It is somewhat similar to the previous resource from Google Cloud; however, rather than a course, it is more of a do-it-yourself project. This project aims to:
- Introduce a number of asset management and orchestration services offered by Azure to help you manage the lifecycle of your model training and deployment workflows.
- Share best practices for training and operationalizing your model with Microsoft Azure.
- Offer real-world examples to help you get started.
If you like to read about things directly from the source, here is a very useful link for you. It collects scientific and industrial research papers on Machine Learning Operations published since 2015, discussing everything from technical debt to engineering entire ML systems the right way.
To wrap it up
MLOps is important if you want to build high-quality models for your ML experiments. It can boost your production pipeline and improve the team’s performance. So keep learning to implement the best practices and always stay on top!
And don’t forget to follow our blog for the latest articles. We’re publishing regularly to bring you the best content. We also have a section dedicated to MLOps; you can find it here.
Are we missing something? Let us know if you’re not seeing your favorite resources on our list!
MLOps: What It Is, Why It Matters, and How To Implement It
13 mins read | Prince Canuma | Posted January 14, 2021
According to techjury, every person created at least 1.7 MB of data per second in 2020. For data scientists like you and me, that is like an early Christmas: there are so many theories and ideas to explore and experiment with, discoveries to make, and models to develop.
But if we want to be serious and actually have those models touch real-life business problems and real people, we have to deal with the essentials like:
- acquiring & cleaning large amounts of data;
- setting up tracking and versioning for experiments and model training runs;
- setting up the deployment and monitoring pipelines for the models that do get to production.
And we need to find a way to scale our ML operations to the needs of the business and/or users of our ML models.
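To make the second item in the list above a bit more concrete, here is a minimal sketch of what tracking experiment runs can look like. It is only an illustration under stated assumptions: `RunTracker`, `log_run`, and `best_run` are hypothetical names invented for this example, not the API of any real tracking tool, and runs are kept in memory rather than persisted.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class RunTracker:
    """Hypothetical in-memory tracker; real tools persist runs to disk or a server."""

    runs: dict = field(default_factory=dict)

    def log_run(self, params: dict, metrics: dict) -> str:
        # Derive a stable ID from the sorted parameters, so the same
        # configuration always maps to the same run ID.
        payload = json.dumps(params, sort_keys=True).encode("utf-8")
        run_id = hashlib.sha256(payload).hexdigest()[:12]
        self.runs[run_id] = {"params": params, "metrics": metrics}
        return run_id

    def best_run(self, metric: str) -> str:
        # Return the ID of the run with the highest value for the given metric.
        return max(self.runs, key=lambda rid: self.runs[rid]["metrics"][metric])


tracker = RunTracker()
first = tracker.log_run({"lr": 0.1, "depth": 3}, {"accuracy": 0.81})
second = tracker.log_run({"lr": 0.01, "depth": 5}, {"accuracy": 0.87})
print(tracker.best_run("accuracy") == second)
```

Hashing the sorted parameters is one possible design choice: it lets you recognize a repeated configuration without keeping a separate counter. In this sketch, logging the same parameters again simply overwrites the earlier entry.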
There were similar issues in the past when we needed to scale conventional software systems so that more people could use them. DevOps’ solution was a set of practices for developing, testing, deploying, and operating large-scale software systems. With DevOps, development cycles became shorter, deployment velocity increased, and system releases became auditable and dependable.
That brings us to MLOps. It was born at the intersection of DevOps, Data Engineering, and Machine Learning, and it’s a similar concept to DevOps, but the execution is different. ML systems are experimental in nature and have more components that are significantly more complex to build and operate.
Let’s dig in!