How to Make a Machine Learning Project More Likely to Succeed?

The speed of innovation is extreme. It seems like every day you hear about artificial intelligence disrupting another industry or profession. Let’s talk about what happens when project management meets artificial intelligence.

Unlike standard products and services, productized solutions (like AI software served as a product) are never completed or fully turned over to the customer. They’re ongoing projects that keep improving to meet the evolving needs of users.

Productized solutions rely less on traditional project-based delivery methods, and more on product development-based ones. Flexibility and adaptability are critical for meeting customer needs. AI can streamline the entire process and make the most current information readily accessible, helping with rapid decision-making and expedited delivery. 

AI enables project managers to transition to facilitative leaders by automating routine tasks like performance reporting. AI can detect and uncover patterns from data that you wouldn’t see, and then combine that information with your data to make better decisions about the new product you want to create.

Artificial intelligence can streamline many aspects of stakeholder engagement, like ongoing email communication and feedback. It can also monitor interactions to facilitate better communications and knowledge transfer. AI provides increased productivity and decentralizes operations. 

AI can monitor process flows to ensure protocols are followed, check resource availability when assigning tasks, provide real-time reporting, and curate various programs. For organizations to remain agile and efficient, they’ll need to offload routine tasks to AI, embrace new management practices, and give more creative freedom to employees.

AI can help you predict deadlines that might be missed, team member productivity, emergent risks, and quality levels. These predictions guide better decision making in:

  • schedule planning,
  • delivery-time probability estimates,
  • budget monitoring,
  • staff delegation and task tracking,
  • risk assessment.
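
To make the idea of a delivery-time probability concrete, here is a minimal sketch that resamples past sprint velocities to estimate the chance of finishing the remaining work on time. The function name, the sample velocities, and the backlog size are all hypothetical; a real forecasting tool would likely use a richer model.

```python
import random


def probability_of_on_time_delivery(past_velocities, remaining_points,
                                    sprints_left, trials=10_000, seed=0):
    """Estimate the chance of finishing the remaining work in the sprints left.

    Each trial draws one historical velocity per future sprint (a simple
    bootstrap) and checks whether the simulated total covers the backlog.
    """
    if sprints_left <= 0:
        return 1.0 if remaining_points <= 0 else 0.0
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    on_time = 0
    for _ in range(trials):
        delivered = sum(rng.choice(past_velocities) for _ in range(sprints_left))
        if delivered >= remaining_points:
            on_time += 1
    return on_time / trials


# Hypothetical data: story points completed in the last five sprints.
past_velocities = [21, 18, 25, 19, 23]
chance = probability_of_on_time_delivery(
    past_velocities, remaining_points=60, sprints_left=3
)
print(f"Estimated chance of delivering on time: {chance:.0%}")
```

An estimate like this is only as good as the history behind it, so it works best as a conversation starter with stakeholders rather than a commitment.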

Businesses face aggressive competition, so they strive for the highest possible operational effectiveness and efficiency. AI operations and projects have different deliverables, but a similar structure. 

Predictability creates scalability, and scalability leads to growth. If projects can be broken down into predictable processes, they can be automated and scaled into predictable streams of revenue. So, organizations can leverage AI to produce predictable results and grow bigger and faster while avoiding bottlenecks in their projects.

Operationalizing AI

A white paper published by the International Institute for Analytics in 2019 estimated that less than 10% of AI pilot projects had reached full-scale production at the time. The production phase can be the most complex part of an AI project, yet it’s usually underestimated. Operationalizing AI simply means managing the complete end-to-end lifecycle of an AI project.

In this approach, various experiments are applied to real business problems and replicated at scale and speed. The project development workflow for AI is different from the traditional application development workflow. 

In addition to the usual build-test-deploy-manage sequence, it has two distinct phases of operation:

  • Training,
  • Inference.

The training phase is where AI algorithms learn patterns and trends from the data and build a model. The inference phase uses the model constructed to make predictions/generalizations on new, unseen data (real-world data). In essence, the training phase happens in the organizational “experiments,” and the inference phase happens in the “real world.”
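
The split between the two phases can be sketched with a deliberately tiny model. The example below fits a straight line by least squares (training) and then applies it to a value it has never seen (inference). The data, relating story points to days of effort, is made up for illustration, and the function names are assumptions rather than any library's API.

```python
def train(xs, ys):
    """Training phase: learn a slope and intercept from historical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}


def infer(model, x):
    """Inference phase: apply the trained model to new, unseen input."""
    return model["slope"] * x + model["intercept"]


# Hypothetical "experiment" data: story points vs. days needed to finish.
history_points = [1, 2, 3, 5, 8]
history_days = [2, 4, 6, 10, 16]

model = train(history_points, history_days)   # happens in the lab
predicted_days = infer(model, 13)             # happens in the real world
print(f"Predicted effort for a 13-point story: {predicted_days:.1f} days")
```

Note that `train` runs once on collected data, while `infer` is what a production system would call over and over.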

Adopting AI is a priority for some industries, but it has numerous potential applications and might end up being used across the board. Organizations need a systematic approach to operationalizing AI so they can build AI products in a repeatable, timely manner that shortens the process, lowers overhead, and reduces risk.

To create successful AI products with real-scale implementation and real-world application, a proper understanding of the organization’s structure, resources, customers’ needs, budget, and internal workflows is of prime importance. A standardized approach to operationalizing AI, consisting of the stages Scope, Understand, Build, Deploy, and Manage and Trust, promotes a collaborative process.

Stages of the AI life cycle

Businesses need a systematic approach to manage the end-to-end lifecycle of AI products. As mentioned earlier, this life cycle consists of 5 stages which are generally defined as follows:

  • Scope: To prevent misunderstanding and build the right solution for the target problem, the stakeholders and the AI team need to reduce ambiguity. This is achieved by scoping use case requests and prioritizing the AI product’s functionality according to the pain points it addresses. It’s essential to have coherent communication and transparency between the business stakeholders and the technical team, along with detailed definitions of successful business and technological outcomes. Agreement on key performance indicators (KPIs) and on the quality criteria the solution is expected to meet is key to project success and performance review. Scoping requires the stakeholders and the technical team to understand the business problem and to grasp the business opportunity and its potential impact. This phase is a collaborative process to understand the pain points of the problem and build empathy for the likely user needs. With a shared language between stakeholders and the technical team, you can prioritize pain points and options, develop an action plan for the specific AI tasks to solve, and specify the environment in which the solution will be implemented. The scoping phase guides the definition of project goals and a vision aligned with the business objective. Implicit risks and assumptions need to be identified and documented in this phase, which also helps in developing a risk assessment strategy and defining the minimum viable product.
  • Understand: Artificial Intelligence is not a magical solution, so it won’t be helpful if built without intention. The development phase consists of ‘training’ and ‘inference’, and data is crucial to both. There’s no AI project without data, and the adage “Garbage In, Garbage Out” is at the center of building a successful AI project. This phase explores the data sources, metadata, lineage, relationships to other datasets, and data quality measures, and establishes appropriate rules and policies for governance within the scope of the business opportunity. This stage is simply about developing the data-acquisition and data-management strategy needed for the AI initiative.
  • Build: AI isn’t monolithic; the immense value it creates comes from various AI capabilities. This phase is a series of iterations, each focused on specific functionality essential for building the AI model: data exploration, data labeling, data preparation, feature engineering, model building, testing, and predicting behavior or discovering insights. This collaborative process relies on various tools, techniques, and frameworks that can be either open source or proprietary. The build phase is best undertaken as a set of Agile sprints. Each sprint has a defined ROI objective and outcomes. The number and duration of the sprints are determined by the project manager. Each sprint should produce a prototype of the final deliverable.
  • Deploy: The deployment phase represents full-scale AI capabilities, where real-life data is used to solve real-life problems. This step pushes the proof of concept (POC) into production as models that are used in real life. It has a higher level of complexity, which depends on company structure, intended customers, company size, internal workflows, ethics, infrastructure demands, data management, knowledge, budgets, and real-life conditions. This stage defines the deployment strategy for specific systems, since AI models can be deployed on numerous systems and interfaces. The platforms on which the models run depend on the use case. The structured approach to moving from “build” to “deploy” is referred to as MLOps (Machine Learning Operations), as it sets up a pipeline to carry assets from development to production environments. Transferring assets from development to production takes a sequence of steps that fits the continuous integration/continuous deployment (CI/CD) paradigm, supported by a CI/CD pipeline. A robust implementation of the AI pipeline using CI/CD combines the characteristics of an automated ML pipeline with automated CI/CD routines.

This CI/CD setup provides structure and an optimized workflow for the entire AI lifecycle. As a result, the end goal can easily be benchmarked for success, which enables iterative improvement.

  • Manage and Trust: AI projects need efficient management of resources and performance optimization. As Jason Tamara said, “throwing more resources at a problem does not get to the right answer; it rather gets you to the wrong answers more quickly.” Deployed model performance degrades over time as new patterns and new customer intents emerge in real-life environments. This degradation happens because the training phase did not cover these new real-life changes. This phase monitors the model performance metrics and evaluates them against defined thresholds. The evaluation process can trigger automated requests to either retrain the model or alert the development team. Aligning the performance metrics with business KPIs improves the shared language among stakeholders and increases trust in the initiative. Increasing trust and transparency is essential for full adoption of AI solutions, so it’s important to evaluate models on explainability, fairness, and bias. Responsible AI is a fast-growing field in the AI ecosystem that researches and proposes frameworks that help build trust in AI deployments. Fairness monitoring and bias mitigation are key requirements specified by various guiding policies. Explainability of model results fosters trust because it shows how decisions were made, in contrast to the black-box paradigm. Explainability improves the transparency of the AI process, and quantifying the contribution of model features enables root cause analysis.
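
The monitoring loop described in the Manage and Trust stage can be sketched as a small threshold check. Everything here is an assumption made for illustration: the metric names, the threshold values, and the rule that a small shortfall raises an alert while a larger one requests retraining.

```python
def check_model_health(metrics, thresholds, retrain_margin=0.05):
    """Compare live metrics with agreed minimums and pick an action.

    Returns "ok" when every metric meets its threshold, "alert" when a
    metric falls short by at most `retrain_margin`, and "retrain" when
    any metric falls short by more than that.
    """
    action = "ok"
    for name, minimum in thresholds.items():
        shortfall = minimum - metrics[name]
        if shortfall > retrain_margin:
            return "retrain"  # the worst case wins immediately
        if shortfall > 0:
            action = "alert"
    return action


# Hypothetical thresholds agreed with stakeholders during scoping.
thresholds = {"accuracy": 0.90, "recall": 0.75}

print(check_model_health({"accuracy": 0.91, "recall": 0.80}, thresholds))
print(check_model_health({"accuracy": 0.88, "recall": 0.80}, thresholds))
print(check_model_health({"accuracy": 0.80, "recall": 0.80}, thresholds))
```

In practice the returned action would feed whatever automation the team already has, such as a ticket for the development team or a scheduled retraining job.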

Specific Machine Learning challenges: project initiation

Understanding the Machine Learning team

The machine learning team might look different from a regular software development team. As organizations set themselves up for success in building artificial intelligence, they need to consider who to involve in the process. The groups that should be established are leadership, analysis and engineering, data and governance, and design and visualization.

Business acceptance and adoption

In most cases, business customers will come to your team asking for help meeting their business goals, excited about introducing machine learning in their field. But few understand the ramifications of automating current business processes and using complex algorithms to make decisions that humans are currently making. It’s as if the business customer has driven their own car to work every day, and now the machine learning team tells them to use a self-driving car and accept that it will work better. Deploying these predictive models to production inside your company requires frank discussions with the business customer upfront, making sure they understand how to roll out the model so that it is accepted, adopted, and put to good use by their teams.

Lack of project charter

It’s helpful to have a project charter for machine learning projects, even if the methodology is agile. It will help focus the business question and give a vision for the work. The project charter should include at least the project scope, data sources, business metrics, and a list of stakeholders. Many companies have put together data science or machine learning teams because they know it’s a competitive advantage. Still, they might have a hard time getting actual business value out of them. Creating the project charter will bring out the business metrics and how to calculate the model lift (the amount by which the model improves on the current situation). The charter defines the vision for the project and keeps the machine learning team focused on the business goal.
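
One way to pin down model lift in a charter is as a relative improvement over the current baseline. The sketch below assumes the business metric is a success rate (for example, conversions per contacted customer). The function name and the numbers are hypothetical, and your charter may define lift differently.

```python
def relative_lift(baseline_successes, baseline_total,
                  model_successes, model_total):
    """Relative improvement of the model's success rate over the baseline."""
    baseline_rate = baseline_successes / baseline_total
    model_rate = model_successes / model_total
    if baseline_rate == 0:
        raise ValueError("baseline rate is zero, so relative lift is undefined")
    return (model_rate - baseline_rate) / baseline_rate


# Hypothetical pilot: 40 of 1,000 customers converted with the current
# process, and 50 of 1,000 converted when the model picked the targets.
lift = relative_lift(40, 1000, 50, 1000)
print(f"Relative lift: {lift:.1%}")
```

Writing the formula into the charter up front avoids later arguments about whether the model "worked".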

Specific machine learning challenges: project planning

Lack of a comprehensive project plan

A recommended best practice for machine learning projects is to use some form of agile methodology. A complete project plan will help the team focus, and offer tangible deliverables and timelines to stakeholders. A recommended template for this comprehensive plan is the Cross-Industry Standard Process for Data Mining (CRISP-DM) process model. It’s a six-phase framework that guides the planning, organization, and implementation of AI projects. It was developed as a standard for data mining, and its phases translate closely to the steps needed to execute an AI/ML project.

CRISP-DM project planning

The components of the CRISP-DM methodology can serve as main anchors and functional features in your project. Let’s have a brief walkthrough:

  1. Business understanding

This step focuses on understanding the project’s high-level requirements and objectives, and on defining business metrics. It also assesses resource availability, risks, and contingencies, and enables a cost-benefit analysis. Selecting the technologies and tools to be used can be covered here too, and a signed-off project charter can be a deliverable of this phase. While many teams hurry through this phase, establishing a solid business understanding is essential to building the foundation of your project.

  2. Data understanding

Data is a principal part of any machine learning project. Without data to learn from, models can’t exist. Unfortunately, in many companies getting access to data and using it can be highly time-consuming due to regulations and procedures. This step focuses on identifying, exploring, collecting, and analyzing the data sets that help accomplish the project objective. It comprises defining data sources, getting access to data, creating data storage environments, and preliminary data analysis.

  3. Data preparation

Even after getting the data you need, chances are it requires some cleaning or transformation as it travels around the enterprise. This step prepares the data set for modeling; it includes the sub-tasks of selecting, cleaning, formatting, integrating, and constructing data. This phase consists of creating data pipelines for ETL (extract, transform, load). Data will be changed several times, so understanding the processes involved in these data preparation sub-tasks is essential for efficient model building.
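
As a rough illustration of what an ETL pipeline does, the sketch below extracts rows from a CSV export, cleans them, and loads them into a database table. The column names, the sample data, and the choice of an in-memory SQLite database are all assumptions made to keep the example self-contained.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with stray whitespace and a missing value.
RAW_CSV = """customer_id,signup_date,monthly_spend
1, 2021-01-05 ,120.50
2,2021-02-11,
3,2021-03-02,  80
"""


def extract(text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, drop incomplete rows, convert types."""
    cleaned = []
    for row in rows:
        spend = row["monthly_spend"].strip()
        if not spend:
            continue  # skip rows where the spend is missing
        cleaned.append(
            (int(row["customer_id"]), row["signup_date"].strip(), float(spend))
        )
    return cleaned


def load(rows, connection):
    """Load: write the cleaned rows into the modeling table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER, signup_date TEXT, monthly_spend REAL)"
    )
    connection.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
row_count = connection.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(f"Loaded {row_count} clean rows")
```

Keeping the three steps as separate functions makes each one easy to turn into its own story or sub-task on the agile board.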

  4. Modeling

After data preparation, it’s time to build and assess various models based on several different modeling techniques. This step is made up of selecting modeling techniques, feature engineering, generating test design, building and evaluating models. The CRISP-DM guide suggests to “iterate model building and assessment until you strongly believe that you have found the best model(s).” 

  5. Evaluation

The assessment and evaluation previously covered were focused on the technical model assessment. The evaluation phase is broader as it assesses which model best meets the business objectives and defined baseline. Sub-tasks in this phase are evaluating results (model lift, performance metrics), reviewing processes, and determining the next steps.

  6. Deployment

As the CRISP-DM guide puts it, “Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise.” The complexity of this step depends on the deployment strategy. This step comprises planning deployment, monitoring, and maintenance, producing final reports, and reviewing the project. Suggested sub-tasks include creating a calling application, deployment to quality assurance (QA), automating a data pipeline, integration testing, and deployment to production.
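
The order of the deployment sub-tasks (QA first, integration tests next, production last) can be sketched as a simple gate. This is a toy stand-in for a real release pipeline: the `release` function, the tolerance, and the test cases are hypothetical, and an actual setup would call your deployment tooling instead of appending to a log.

```python
def release(model, qa_test_cases, tolerance=0.5):
    """Deploy to QA, run integration tests there, then promote to production.

    Returns a (released, log) pair; production is only reached when every
    QA test case is within `tolerance` of its expected value.
    """
    log = ["deployed to qa"]
    failures = [
        (given, expected)
        for given, expected in qa_test_cases
        if abs(model(given) - expected) > tolerance
    ]
    if failures:
        log.append(f"integration tests failed for {len(failures)} case(s)")
        log.append("production rollout stopped")
        return False, log
    log.append("integration tests passed")
    log.append("deployed to production")
    return True, log


def candidate_model(story_points):
    """Hypothetical model under release: estimated days of effort."""
    return 2.0 * story_points


# Hypothetical QA cases: (input, expected output).
qa_cases = [(1, 2.0), (3, 6.2), (5, 9.8)]

released, log = release(candidate_model, qa_cases)
for entry in log:
    print(entry)
```

The useful property is that a failing test leaves production untouched, which is the behavior stakeholders usually care about most.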

Setting up the agile framework

Effective communication is a key component of successful project management and delivery. Project status reporting is a regular, formalized report on project progress against the project plan. As the project manager, you will have status reporting as part of your job. Adequate planning and structure are essential when setting up tools like Jira or TFX.

First, define the roles of each stakeholder in your project. Make it clear who will play the part of the product owner, the scrum master, the release train engineer, and the tech lead for your team.

Next, think of the big picture. Is this model project a stand-alone disconnected effort? Probably not in a well-organized team. Engineers and data scientists create several smaller models and then use them together to solve different business problems. Chances are, the current model is part of a larger vision. 

Talk with the tech lead and get an understanding of how the model will be developed and used. If the model is part of a current effort, you might want to add this project to the existing agile release train (ART), if it makes sense and the team agrees. Another thing to consider is setting up your agile board so that it easily rolls up into a program or portfolio view together with other machine learning projects. A portfolio-level dashboard is very useful for communicating with executive managers and gives you a quick view of the entire team’s health.

Define your initiatives or features, which are the major business milestones you want to achieve. You can start the feature list with the sub-tasks defined above in your project plan. Validate and prioritize the features with the product owner and the tech lead, and keep a consistent structure.

Setting up a consistent structure in Jira or other tools you might want to use makes it easier to create automated metrics and reports. Also, train the team on the methodology, processes, and how to use the necessary tools. 

Prepare for your first program increment (PI) or release planning. Once you have a plan and some features defined, set up a backlog grooming session with your team to start creating stories under each feature. When your backlog is of a good size, set up your first PI planning. My suggestion is to have a clear agenda, vet it with the product owner and the tech lead beforehand, and communicate it ahead of time. The planning aims to fill the next sprints with stories and to define enough of the features. It marks the beginning of your project’s implementation and the start of the actual development work.

Specific machine learning challenges: project execution

Team’s agile maturity

The development team might be inexperienced with agile methodology, so introducing agile concepts should be done gradually, and training should be provided as needed. In many cases, you can set up training sessions that are fun and relevant for the team. Once the work starts, the team will become familiar with the process, and it will take two to three sprints before you can start introducing tighter agile controls and generate metrics that make sense.

Data exploration

There’s no machine learning without lots of data, and in many companies getting the data in one place will require opening access tickets, following strict regulations or policies, even taking courses in data security, before the technical team is allowed to work with the data. That’s why the duration of the data exploration phase of the project is hard to predict. 

Allow lots of time and be as clear as possible on each data source needed from the beginning, so you can prepare the best you can for this phase. You might spend one to two sprints just on getting data access, and another two to three sprints on the data exploration itself. 

Create your stories and agile board to account for this type of work; otherwise, you’ll end up with a burndown chart that looks like a flat line. Consider having technical spikes for data access and data exploration, or even switching to Kanban for this phase of the project if your team can handle it. As you create stories for this project phase, try to split the user stories into vertical slices by data source, by environment, or even by team member; small, concrete stories are a smart choice in agile and will better show the progress of the work.

Modeling iterations

This is another challenge for the project manager because there will be unknowns and uncertainties. The technical team can’t estimate how long the modeling effort will take until they try out different algorithms and adjust the features several times. Allow for iterations in your project plan. As you set up your sprints and do the planning, create user stories that are small enough to be completed in one sprint but build up eventually to the bigger goal.

Feature validation

As the engineers define the features in the data that they will most likely use, make sure you validate those features with the business customers. The development team only has a technical view of things, so you want to make sure your business customers have a say in which features they think would be valuable.

Model interpretability

Algorithms in machine learning can be a black box even for developers. The model can’t be tested in the usual way because there are no explicit rules to check against. Instead, the machine determines the result based on complex calculations that are hard to trace by hand. This will make many business owners unhappy, and your legal and regulatory departments will want to ensure that the model isn’t biased in any way. There are technical approaches, such as using dedicated code libraries, that can provide enough explainability for the model.
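
One widely used explainability technique is permutation importance: shuffle one input feature and see how much the model’s error grows. The sketch below implements the idea from scratch so it stays self-contained. The stand-in model and the data are hypothetical, and in a real project you would more likely rely on an established library than on this hand-rolled version.

```python
import random


def black_box_model(row):
    """Stand-in for a trained model; it secretly ignores the second feature."""
    return 3.0 * row[0] + 0.0 * row[1]


def mean_abs_error(model, rows, targets):
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature_index, seed=0):
    """Increase in error after shuffling one feature column.

    A value near zero suggests the model does not depend on that feature.
    """
    rng = random.Random(seed)
    baseline = mean_abs_error(model, rows, targets)
    shuffled = [r[feature_index] for r in rows]
    rng.shuffle(shuffled)
    permuted_rows = [
        r[:feature_index] + [value] + r[feature_index + 1:]
        for r, value in zip(rows, shuffled)
    ]
    return mean_abs_error(model, permuted_rows, targets) - baseline


# Hypothetical data set with two features per row.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 9.0]]
targets = [3.0 * r[0] for r in rows]

importance_feature_0 = permutation_importance(black_box_model, rows, targets, 0)
importance_feature_1 = permutation_importance(black_box_model, rows, targets, 1)
print("feature 0:", importance_feature_0)
print("feature 1:", importance_feature_1)
```

A ranking like this gives business owners and compliance teams something concrete to review, even when the model itself stays opaque.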

Visibility into work status

One of the significant complaints of business owners on a machine learning project is the lack of visibility into the work. This is due to the very technical nature of this work. One way to provide visibility is to use agile metrics to show how many features are in progress, how many are delayed, and if we achieved what we committed to every iteration or PI period. 

Business deliverables

Make sure you have clear checkpoints with your stakeholders throughout the agile process. Include them in all critical meetings and, more importantly, make sure the development team has a business deliverable at the end of each major phase of the project, where they do a sprint demo and present their findings to the business customer. It’s a great way for you as the project manager to keep a close handle on how the project evolves, especially if findings during the modeling phase will prompt a change in the project scope.

Specific machine learning challenges: project monitoring

Setting up agile metrics

A best practice is to always record agile metrics. You can start with basic velocity calculations, the burndown chart, and the cumulative flow diagram. These basic metrics will be enough at first, when the teams are just forming and learning how to plan and estimate their work. As the project progresses and the teams get accustomed to the agile process, you can create more complex metrics to help you improve team performance and output. Jira and other tools already include several reports; if none is available to you, export the raw data (epics, stories, and tasks) into Excel and create your own reports and graphs. I suggest creating a dashboard that lets executive management see a high-level picture of the entire portfolio’s health and then dive into each ART or project for details if needed.
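
If you do end up working from exported raw data, the basic metrics are simple to compute yourself. Here is a minimal sketch of velocity, a burndown series, and a rough forecast of the sprints still needed. The backlog size and the sprint figures are hypothetical.

```python
import math


def velocity(completed_per_sprint):
    """Average story points completed per sprint."""
    return sum(completed_per_sprint) / len(completed_per_sprint)


def burndown(total_points, completed_per_sprint):
    """Remaining points at the start and after each sprint."""
    remaining = total_points
    series = [remaining]
    for done in completed_per_sprint:
        remaining = max(remaining - done, 0)
        series.append(remaining)
    return series


def sprints_to_finish(remaining_points, average_velocity):
    """Rough forecast: sprints still needed at the current average pace."""
    return math.ceil(remaining_points / average_velocity)


# Hypothetical export: a 100-point backlog and three finished sprints.
completed = [18, 22, 20]
team_velocity = velocity(completed)
remaining_series = burndown(100, completed)
forecast = sprints_to_finish(remaining_series[-1], team_velocity)

print("velocity:", team_velocity)
print("burndown:", remaining_series)
print("sprints still needed:", forecast)
```

The same three numbers are usually enough for a first executive dashboard.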

Setting up business and analytic metrics

As mentioned for project initiation and planning, ensure your business customer can define the business metrics and success criteria for your project. What will make them say that this project brought a lot of value? Is it an increase in sales, an increase in profits, or a decrease in customer defections? Be prepared to measure it as the project develops and even after the model goes into production. There are also more technical metrics that come from the models themselves, called analytical metrics.

Specific machine learning challenges: project closing

Post-deployment challenges

This requires a clear definition of ownership as the project is closing and the model goes live. Who owns the model now? Is it still your team, or is it the business customer? It’s often unreasonable to expect the business customer to know how to own and maintain the model. However, they should have the expertise to own the calling application or wrapper. 

So, when the project closes, a maintenance phase starts. During this phase, you as the project manager should provide business and analytical metrics to your business customer to ensure the model performs as expected. It’s a good idea to create a change control process, because the model might show signs of deterioration over time.

Maybe things change on the website, and now the model features don’t work as well. Perhaps something changed in the data recently. Whatever the reason, someone (like a change control board) should review the model metrics and performance regularly and decide if the model should be retrained or redeployed.

Conclusion

We’ve explored certain aspects of AI in project management, and best practices for managing specific ML project challenges. I hope you learned something new from this article. Thanks for reading!

