Only 10% of AI/ML projects have created positive financial impact according to a recent survey of 3,000 executives.
Given these odds, building a profit-generating ML project clearly requires a lot of work across the entire organization, from planning to production.
In this article, I’ll share best practices for businesses to ensure that their investments in Machine Learning and Artificial Intelligence are actually profitable, and create significant value for the entire organization.
Best practices for identifying AI use cases
Most AI projects fail at the very first hurdle – poor understanding of the business problems that can be solved with AI. This is the main bottleneck in successful deployment of AI.
This problem is compounded by the fact that organizational intuition for AI, and for how it can be leveraged to solve critical business problems, is still at an early stage.
What does this mean? Well, not every problem can be feasibly solved with AI. To understand if your particular problem can, you need tried and tested practices and approaches.
If you’re looking for best practices on building AI teams that deliver successful outcomes, see my previous article, How to Build Machine Learning Teams That Deliver.
AI use cases
AI has transformed industries. It automates routine and manual processes, and provides crucial predictive insights to almost all business functions. Table 1 shows a list of some of the business use cases that have been successfully addressed using AI.
|INPUT||OUTPUT||AI TASK||INDUSTRY|
|Voice recording||Text transcript||Speech recognition||Customer service|
|Transactions||Fraud transactions||Fraud detection||Banking|
|Clinical symptoms||Diagnosis||Diagnosis prediction||Healthcare|
|Customer reviews||Customer sentiment||Sentiment analysis||Service industry|
|News articles||Summary of news||Text summarization||Press & Media|
Table 1. Business use cases solved by AI
Brainstorming appropriate business problems should ideally be done together with business leaders, product managers, and any available subject matter experts. The list of business problems sourced across the organization should then be vetted, and analyzed for potential solutions using AI.
Not every business problem should be solved with AI. Oftentimes, a rule-based or engineered solution is good enough. Additionally, a lot of business problems can be mined from customer reviews or feedback, which typically point to broken business processes that need to be fixed.
Table 2 provides a checklist of questions, both technical and commercial, to determine whether a business problem is relevant for AI.
|Kind of data||Structured (tabular) or unstructured (images, video, text, audio, multimodal)?|
|Size of data||Size of each data sample and the entire corpus available for training|
|Domain||Finance, Ecommerce, Customer service, Logistics etc.|
|Frequency of data||How often and how much data is generated for the use case?|
|Annotations||Availability of labels for the data? Cost of labeling? Availability of internal or external domain experts to label the dataset?|
|Data flow||What is the flow of data from the end user to the internal ecosystem till the model prediction is served to the users?|
|Privacy||Are there any data privacy, regulatory or compliance constraints for using the data to build and deploy AI models?|
|Market validation||Is this an established use case in similar companies or in the wider related market?|
|Research validation||Is there significant academic research on this topic to understand what kind of AI models and approaches might be relevant?|
|Choice of models||What kind of models are relevant for the use case? Classical ML, ensemble or deep learning models?|
|Model deployment||Are the models to be deployed in the cloud or on mobile/edge devices?|
|Inference type||Does the use case require real-time or batch/offline predictions?|
|Model performance||What are the metrics to measure the performance of the model?|
|Production metrics||What are the constraints on accuracy, latency and throughput for the model in production?|
|User research||What is the user persona? How can the AI model predictions best be surfaced? Are there UX designs and mock-ups to ensure ease of use?|
|Product / feature||Is this an existing product with data available, or a novel product / feature with limited data to train AI models?|
|Business metrics||Are there well-defined business metrics, or do they need to be brainstormed afresh?|
|Timeline||Is this use case a short-term or a long-term investment?|
|Roadmap||Is there a clear roadmap for end-to-end delivery of the use case?|
|Budget||How much budget can be allocated to build the use case end-to-end?|
|Team||Which functional teams are required to collaborate for the use case?|
|Bandwidth||How much bandwidth do stakeholder teams have?|
Table 2. A checklist of data, model and business questions to validate a business problem with a potential AI solution.
KPIs and Metrics
As part of the planning process, the appropriate model and business metric for each potential use case should be discussed. Work backwards from the expected outcome, and it’ll be easier to crystallize which particular metric to optimize.
To illustrate this, Table 3 lists example AI use cases with their corresponding model and business metrics. Ultimately, an AI project succeeds only if the business metric and goals are achieved.
|INPUT||OUTPUT||TECHNICAL METRIC||BUSINESS METRIC|
|Voice recording||Text transcript||Word Error Rate||CSAT, AHT, NPS|
|Faces||Identity||Recognition Rate||Domain-dependent e.g. ID of criminals or stalkers etc.|
|Transactions||Fraud transactions||F1, Precision, Recall||Revenue loss, Fraud to Sales Ratio etc.|
|Purchase history||Recommendations||Mean Average Precision at K||Uplift in Average Revenue Per User, or Number of items added to cart etc.|
|Clinical symptoms||Diagnosis||F1, Precision, Recall||Savings in doctors’ time and number of appointments etc.|
|Customer reviews||Customer sentiment||F1, Precision, Recall||NPS|
|News articles||Summary of news||Rouge Score||CTR, Views|
Table 3. Example AI use cases and their technical and business metrics.
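Several rows in Table 3 use precision, recall and F1 as the technical metric. As a minimal sketch of how these are computed for a binary task such as fraud detection, the snippet below uses only the Python standard library. The function name and the sample labels are hypothetical and for illustration only.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = fraudulent transaction, 0 = legitimate transaction.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Which of the three to optimize depends on the business metric: in a fraud setting, missed fraud (low recall) and blocked legitimate customers (low precision) carry different costs.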
At this point, we have a set of business problems that have been reviewed and documented after careful consideration of the criteria listed in Table 2, and analysis of appropriate business metrics as in Table 3. The candidate list of use cases now needs to be prioritized, or ranked, in terms of impact and relevance to the overarching business strategy and goals.
Each use case is described in a detailed written document covering the business problem and potential AI-based solutions, so it’s useful to have objective criteria to quantify all the proposed use cases on the same scale. Here, it’s crucial for product managers and business leaders to have their own intuition about how AI works in practice, or rely on the judgment of a product-focused technical or domain expert. Whilst it’s easy to rank projects on certain success criteria, it’s not so straightforward to rate the risk associated with AI projects.
A balanced metric ought to consider and weigh the likelihood and impact of a successful outcome of the AI projects versus the risk of it failing or not generating enough impact. Risks to the project might be related to organizational aspects, domain-specific aspects of the AI problem, or related to external factors beyond the remit of the business. Once a suitable balanced metric is defined, it aligns all stakeholders and leadership, who are then able to form their own subjective views based on the objective scores.
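One possible way to make such a balanced metric concrete is sketched below. The formula (expected impact minus a weighted expected cost of failure), the 1 to 5 scales, the `risk_weight` parameter and the example scores are all assumptions for illustration, not a standard or the method of any particular company.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact: float     # expected business impact if it succeeds, 1 (low) to 5 (high)
    p_success: float  # estimated probability of a successful outcome, 0 to 1
    risk: float       # cost or severity of failure, 1 (low) to 5 (high)


def balanced_score(use_case, risk_weight=0.5):
    """Expected impact minus a weighted expected cost of failure (an assumed formula)."""
    expected_gain = use_case.impact * use_case.p_success
    expected_loss = risk_weight * use_case.risk * (1 - use_case.p_success)
    return expected_gain - expected_loss


# Hypothetical scores that stakeholders would agree on during planning.
candidates = [
    UseCase("Fraud detection", impact=5, p_success=0.6, risk=2),
    UseCase("Review sentiment analysis", impact=3, p_success=0.8, risk=1),
    UseCase("Diagnosis prediction", impact=5, p_success=0.3, risk=5),
]

ranked = sorted(candidates, key=balanced_score, reverse=True)
for uc in ranked:
    print(f"{uc.name}: {balanced_score(uc):.2f}")
```

The value of such a score is less in the exact numbers than in forcing every use case through the same explicit assumptions, which stakeholders can then challenge.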
A lot of factors need to be considered before a ‘yes’ or ‘no’ decision is made for a particular AI project, as well as the number of AI-relevant projects selected for a defined period. Securing buy-in from the leadership is difficult. Certain final executive decisions might appear subjective or not data-driven, but it’s still absolutely critical to go through the aforementioned planning process to present each AI project in the best light possible, and maximize the likelihood of the AI project being selected for execution.
Best practices for planning AI use cases
As part of the planning process with cross-functional teams, it’s important for organizations to have a streamlined mechanism for defining the AI product vision or roadmap, the bandwidth, specific roles and responsibilities of individual contributors and managers in each team, as well as the technical aspects (data pipelines, modeling stack, infrastructure for production and maintenance).
In this section, I’ll describe the details of specific planning steps essential to build a successful AI product.
AI product requirements
For each identified use case, it’s necessary to draw the roadmap for how the product will evolve from its baseline version to a more mature product over time. In Table 4, I outline a set of essential questions and criteria to fulfil for creating a comprehensive AI roadmap for each use case.
|PR-FAQ / User stories||Is there a FAQ document, written from the customer’s perspective, that explains how the AI product solves specific customer problems?|
|PRD||Is there a comprehensive Product Requirements Document that describes in detail all the necessary organizational ingredients to build and ship the AI product?|
|Customer surveys||Was the AI use case inspired by customer problems? If proposed by Product teams, was it validated by customer surveys?|
|Milestones||Does the PRD document achievable milestones during the product development at specific timelines to assess and evaluate progress?|
|Risk factors||Are all potential risk factors that can derail the product or delay the launch or development of the AI product addressed?|
|Business metrics||As per Table 3, are the business metrics and KPIs well defined?|
|Technical metrics||As per Table 3, are the AI model metrics well defined?|
|Release criteria||For each phased launch of the AI product, are the acceptance criteria clearly established?|
Table 4. A list of factors to address for the AI product roadmap.
PR-FAQ (Press Release – Frequently Asked Questions) and PRD (Product Requirements Document) are two critical documents that are generally prepared during the initial stages of product ideation and conception. Pioneered by Amazon, these two documents serve as the north star for all concerned teams to align themselves with and build and scale the product accordingly. It’s absolutely essential that all stakeholder teams contribute meaningfully to these documents and share their specific domain expertise to craft a meticulous document for executive review.
It’s necessary for all stakeholder team managers to review and contribute to the document, so that any team- or domain-specific intrinsic biases of product development are laid bare and addressed accordingly. Typically, teams should rely on data-driven intuition for product development. In the absence of in-house data, intuition for the AI product can be borrowed from work done by other companies or research in the same field [2, 4].
As the roadmap is defined and finalized after stakeholder meetings, it’s always beneficial to have an MVP or a basic prototype of the AI product ready to validate initial assumptions and present to the leadership. This exercise also helps to streamline the data and engineering pipelines necessary to acquire, clean and process the data and train the model to obtain the MVP.
The MVP should not be a highly sophisticated model. It should be basic enough to successfully transform the input data to a model prediction, and trained on a minimal set of training data. If the MVP is hosted as an API, each of the cross-functional stakeholder teams can explore the product and build intuition for how the AI product might be better developed for the end customer.
From a data perspective, the machine learning team can dive deeper into the minimal training data, and do a careful analysis of the data as listed in Table 5.
|Features||Are the features categorical, numerical, text?|
|Distribution of data||What is the distribution of each feature? How many target classes for classification problems? Is the data for each class balanced or not?|
|Outliers, missing and null values||How robust is the data? How many missing and null values? How will you account for these and remove any outliers?|
|Feature selection / engineering||Are all features intuitively important for the task? Can you do feature selection to reduce redundant or highly correlated features or transform features to a lower-dimensional space or engineer new or compound features?|
|Data labels||Do labels exist for each training sample? Manually review the quality of a random sample of labels across the various classes.|
|Data augmentation||Is the available data sufficient to train an MVP? Is there a need to augment new data samples or create synthetic data?|
|Data splits||Are the datasets for training, validation, and test sets well curated and balanced to assess generalizability?|
|Data versioning||Is there a mechanism to version every raw, cleaned, transformed, training, validation and test dataset?|
|Data format||What is the optimal data format for training the models and serving them in production? Are there any better alternatives?|
|Data storage and access||How will data across the entire data processing and modeling lifecycle be stored and accessed for current and future experiments?|
|Data pipeline||Is a structured and well-documented data transformation pipeline established keeping in mind all the above data quality checks?|
Table 5. A list of data quality and feasibility checks for the AI MVP.
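A few of the checks in Table 5 (missing values and class balance) can be automated early on. The following is a minimal standard-library sketch; the function name, the record layout (a list of dictionaries) and the sample rows are hypothetical.

```python
from collections import Counter


def data_quality_report(rows, label_key):
    """Count missing values per feature and summarize the label distribution."""
    missing = Counter()
    labels = Counter()
    for row in rows:
        for key, value in row.items():
            # Treat None and empty strings as missing feature values.
            if key != label_key and (value is None or value == ""):
                missing[key] += 1
        labels[row.get(label_key)] += 1

    total = len(rows)
    majority_share = max(labels.values()) / total if total else 0.0
    return {
        "n_rows": total,
        "missing": dict(missing),
        "label_counts": dict(labels),
        "majority_share": majority_share,  # close to 1.0 signals heavy class imbalance
    }


# Hypothetical transaction records with a binary fraud label.
rows = [
    {"amount": 12.5, "country": "NO", "is_fraud": 0},
    {"amount": None, "country": "DE", "is_fraud": 0},
    {"amount": 99.0, "country": "", "is_fraud": 1},
    {"amount": 5.0, "country": "NO", "is_fraud": 0},
]

report = data_quality_report(rows, label_key="is_fraud")
print(report)
```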
After systematic assessment of the data quality, features, statistics, labels and other checks as listed in Table 5, the Machine Learning team can start building the prototype / MVP model. The best approach at the early stages of product development is to act with speed rather than accuracy. The initial (baseline) model should be simple enough to demonstrate that the model works, the data and modeling pipelines are bug-free, and the model metrics indicate that the model performs significantly better than chance.
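A simple way to check that a baseline is doing more than chance is to compare it against a trivial predictor that always outputs the most frequent training class. The sketch below assumes a classification task; the labels and predictions are hypothetical, and a real comparison would also need enough validation data to judge whether the gap is statistically meaningful.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_class(train_labels):
    """Most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


# Hypothetical training labels, validation labels and MVP model predictions.
y_train = [0, 0, 0, 1, 0, 1, 0, 0]
y_val = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]
model_pred = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1]

# The trivial predictor ignores the input and always returns the majority class.
trivial_pred = [majority_class(y_train)] * len(y_val)

trivial_acc = accuracy(y_val, trivial_pred)
model_acc = accuracy(y_val, model_pred)
print(f"trivial baseline: {trivial_acc:.2f}, MVP model: {model_acc:.2f}")
```

On imbalanced data the trivial predictor can score deceptively well on accuracy, which is one reason to track the metrics from Table 3 alongside it.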
Machine learning use cases and products have become increasingly complex over the years. Whilst linear regression and binary or multi-class classification models were once the default choice, there are newer classes of models that are faster to train, and generalize better on real-world test data. For the ML scientist or engineer, no two use cases are likely to be built with an identical stack of tools and libraries. Depending on the characteristics of the data relevant for the AI use case (see Table 2), the data science team must define the modeling stack specific to each use case (see Table 6 below).
|AI MODEL CHECKS|
|ML problem statement||Does the use case require an ML solution based on regression or classification or optimization or recommendation etc.?|
|Classical ML models||Can the use case be solved effectively by traditional ML models? If yes, what’s a suitable ML baseline model?|
|Deep learning models||Is there enough (unstructured) data to leverage more sophisticated neural networks and deep learning models?|
|Ensemble models||Does the use case call for a combination of ML/DL models, with each one addressing specific aspects of the use case or specific segments of the data?|
|Hyperparameter optimization||For each class of ML/DL models, what are the key hyperparameters to optimize? What is the best search approach – grid search, random search or Bayesian optimization?|
|Model versioning and formats||How will each different model be versioned and stored? What is the best model format to deploy the model?|
|Experimental metadata||For each model training and hyperparameter optimization experiment, how will the experimental metadata and results be stored, accessed and shared across the team?|
|Error analysis||Is there any framework in place to analyze and categorize model errors, types of errors (false positives or false negatives) and how to account for these with newer or better models?|
|Modeling pipeline||Is there an end-to-end ML/DL modeling pipeline from the training data to model prediction, results and visualizations?|
|Acceptance criteria||What are the model acceptance criteria in terms of accuracy, latency and throughput for each phase of the launch?|
|Deployment pipeline||Is there a clear pipeline to take the model to production? Will the model be deployed via Docker or cloud or notebook servers? What kind of instances are best suited for deployment?|
|Model monitoring||Are model monitoring tools, including dashboards, logs and metrics, set up?|
Table 6. A list of AI model and feasibility checks for the AI MVP.
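Of the search approaches named in Table 6, random search is the easiest to sketch without any ML library. In the example below, `toy_objective` is a hypothetical stand-in for "train a model with these hyperparameters and return its validation score", and the search space values are made up.

```python
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Sample hyperparameter combinations at random and keep the best-scoring one."""
    rng = random.Random(seed)  # fixed seed so the search is reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical validation score that peaks at learning_rate=0.1, max_depth=4."""
    return -abs(params["learning_rate"] - 0.1) - 0.01 * abs(params["max_depth"] - 4)


space = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "max_depth": [2, 4, 8],
}

best_params, best_score = random_search(toy_objective, space)
print(best_params, best_score)
```

Grid search would instead enumerate every combination in `space`, which is exhaustive but grows quickly as more hyperparameters are added.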
Best practices for executing AI use cases
After identifying and planning for promising AI use cases, the next step is to actually execute the projects. It might seem that execution is a straightforward process, where the machine learning team gets to weave their magic. But, simply ‘building models’ is not enough for successful deployment. Model building has to be done in a collaborative and iterative fashion:
- involving feedback from users of the product as well as cross-functional teams,
- incorporating any new or revised feature requests from product teams,
- updating initial hypotheses for the use case based on any changes in the business or operating environment,
- only then launching the product to users.
Shipping the model to production is a major milestone to celebrate, document and share within the organization – but the work doesn’t stop there. It’s crucial to monitor how the model performs on real world data from customers, and periodically apply fixes or update the model so that it doesn’t get stale along with changes in:
- distribution of data,
- nature of use cases,
- customer behaviour.
In the next section, I will discuss the best practices for the operational aspects of executing and deploying AI models successfully and realizing the proposed commercial value.
Reviews and feedback
Once the AI project has kickstarted, it’s essential for the machine learning team to have both periodic as well as ad-hoc review meetings with stakeholders, including product teams and business leadership. The documents prepared during the planning phase (PR-FAQ and PRD) serve as the context in which any updates or changes should be addressed.
The goal of regular meetings is to assess the state of progress vis-a-vis the product roadmap, and address any changes in:
- product or business strategy,
- organizational structure,
- resources allocated to the project.
While planning is important, most corporate projects don’t go as initially planned. It’s important to be nimble and agile, respond to any new information (regarding technical, product or business aspects), and re-align towards a common path forward. For example, the 2020 lockdowns severely impacted the economy. In light of such high-impact unexpected events, it’s critical to adapt and change strategy for AI use cases as well.
In addition to regular internal feedback, it’s good to keep in touch with the end users of the product throughout the AI lifecycle: in the initial stages (user research, definition of target user personas and their demographics), and especially during product design and when users start interacting with the model predictions. A core group of users from the target segment should be maintained to obtain regular feedback across all stages of product development.
Once an MVP is ready, users can be very helpful in providing early feedback that can often bring to light several insights and uncover any biases or shortcomings. When the AI model is ready to be shipped and different model versions are to be evaluated, user feedback can again be very insightful. User insights about the design, ease of use, perceived speed and overall user flow can help the product team to refine the product strategy as needed.
From the technical perspective, the model building process is usually an iterative one. After establishing a robust baseline, the team gets insight into how far the model performance is from the established acceptance criteria. In the early stages of model building, the focus should primarily be on accuracy rather than latency.
At each stage of model development, a comprehensive analysis of model errors on the validation set can reveal important insights into the model shortcomings, and how to address them. The errors should also be reviewed in conjunction with subject matter experts, to evaluate any errors in data annotation as well as any specific patterns in the errors.
If the model is prone to a particular kind of error, it might need additional features. Or it might need to be changed to a model based on a different objective function, or underlying principle, to overcome these errors. This repetitive process helps the machine learning team to consolidate their intuition about the use case, think outside the box, and propose new creative ideas or algorithms to achieve the desired metrics.
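The error review described above can start with a simple tally of error types per data segment. This is a minimal sketch for a binary classifier; the `segment` tag and the example records are hypothetical, and in practice the segments would come from subject matter experts.

```python
from collections import Counter


def categorize_errors(examples):
    """Count false positives and false negatives per segment (binary labels 0/1)."""
    errors = Counter()
    for ex in examples:
        if ex["y_true"] == ex["y_pred"]:
            continue  # correct predictions are not errors
        kind = "false_positive" if ex["y_pred"] == 1 else "false_negative"
        errors[(ex["segment"], kind)] += 1
    return errors


# Hypothetical validation examples tagged with a customer segment.
examples = [
    {"y_true": 1, "y_pred": 0, "segment": "new_customers"},
    {"y_true": 1, "y_pred": 0, "segment": "new_customers"},
    {"y_true": 0, "y_pred": 1, "segment": "returning_customers"},
    {"y_true": 1, "y_pred": 1, "segment": "new_customers"},
    {"y_true": 0, "y_pred": 0, "segment": "returning_customers"},
]

errors = categorize_errors(examples)
for (segment, kind), count in errors.most_common():
    print(f"{segment}: {kind} x{count}")
```

A cluster of one error type in one segment, as with the false negatives for new customers here, is a hint that the segment may need additional features or more training data.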
During the course of model building, machine learning practitioners should systematically document every experiment and the corresponding results. A structured approach is helpful not only for the particular use case, but also helps build organizational knowledge that can be helpful to onboard new hires, or serve as shining examples of successful AI deployment.
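Dedicated experiment tracking tools exist for this, but the core idea can be sketched with an append-only log file. The example below writes one JSON record per experiment; the file name, the field names and the metric values are hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path


def log_experiment(log_path, name, params, metrics, notes=""):
    """Append one experiment record as a JSON line and return it."""
    record = {
        "name": name,
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
        "notes": notes,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_experiments(log_path):
    """Read all experiment records back from the log file."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_experiment(experiments, metric):
    """Return the record with the highest value for the given metric."""
    return max(experiments, key=lambda r: r["metrics"][metric])


# A temporary directory keeps the example self-contained; a real log would live in shared storage.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "experiments.jsonl"
    # Hypothetical experiments and scores.
    log_experiment(log_path, "baseline-logreg", {"C": 1.0}, {"f1": 0.71})
    log_experiment(log_path, "gbm-depth4", {"max_depth": 4}, {"f1": 0.78},
                   notes="added 3 engineered features")
    experiments = load_experiments(log_path)
    best = best_experiment(experiments, "f1")

print(best["name"], best["metrics"])
```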
Deployment and maintenance
Once the candidate machine learning model is ready, has been benchmarked thoroughly on the validation and test sets, its errors analyzed, and the acceptance criteria met, the model may be taken to production. There’s a huge difference between the model training and deployment environments. The format in which the model was trained may not be compatible with the production environment, so the model may need to be serialized and converted to the right format.
In an environment that simulates the production settings, model accuracy and latency should be validated again on the hold-out dataset. Deployment should be done incrementally by surfacing the model to a small portion of real-world traffic or input to the model, ideally to be tested first by internal or core user groups.
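One common way to expose a model to only a small portion of traffic is to assign each user to a stable bucket and route the lowest buckets to the new model. The sketch below assumes requests carry a string user ID; the function name and the 100-bucket scheme are illustrative choices, not a prescribed design.

```python
import hashlib


def routes_to_new_model(user_id, rollout_percent):
    """Deterministically decide whether this user is served by the new model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < rollout_percent


# Hypothetical user IDs; each user always lands in the same bucket.
users = [f"user-{i}" for i in range(10_000)]
share = sum(routes_to_new_model(u, 10) for u in users) / len(users)
print(f"share of users on the new model at a 10% rollout: {share:.3f}")
```

Because the bucket depends only on the user ID, raising `rollout_percent` from 10 to 50 keeps the original 10% of users on the new model and simply adds more, which avoids users flipping between model versions.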
Once the deployment pipeline has been rigorously tested and vetted by the MLOps team, more traffic can be directed to the model. In scenarios where one or more candidate models are available, A/B testing of these models should be done systematically, and evaluated for statistically significant differences to determine the winning model.
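For a conversion-style business metric, one standard way to test for a statistically significant difference between two model variants is a two-proportion z-test. The sketch below uses only the standard library; the traffic and conversion counts are hypothetical, and the test assumes independent users and reasonably large samples.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical A/B test: model A converts 200 of 2,000 users, model B 260 of 2,000.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z={z:.2f}, p={p_value:.4f}")
```

The significance threshold (0.05 is a common convention) and the required sample size should be fixed before the test starts, not after looking at the results.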
Post-deployment, it’s important to ensure that all the input-output pairs are collected and archived appropriately within the data ecosystem. The launched model should be periodically assessed and the distribution of the real-world data compared with the distribution of the training data to assess for data and model drifts. In such cases, an active learning pipeline that feeds some of the real-world test samples back into the original training dataset helps to alleviate the shortcomings of the deployed model.
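One widely used summary of how far a production feature distribution has moved from the training distribution is the Population Stability Index (PSI). The following is a minimal sketch for a single numeric feature, with bins derived from the training data; the bin count, the `eps` floor for empty bins and the sample values are assumptions, and any alerting threshold should be calibrated per use case.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample (training) and a new sample (production)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all values are equal

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp values outside the training range
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below is defined.
        return [max(c / len(values), eps) for c in counts]

    e = bin_shares(expected)
    a = bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values: production data is shifted upwards by 0.3.
train_values = [i / 100 for i in range(100)]
prod_values = [0.3 + i / 100 for i in range(100)]

psi_same = population_stability_index(train_values, train_values)
psi_shifted = population_stability_index(train_values, prod_values)
print(f"PSI (no drift): {psi_same:.3f}, PSI (shifted): {psi_shifted:.3f}")
```

Running such a check on a schedule over the archived input-output pairs gives an early signal of data drift before the business metrics start to degrade.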
Finally, once the model production environment and all pipelines are stable, the machine learning and product teams should evaluate the business metrics and KPIs against the predefined success criteria. Only if these criteria are met can the use case be deemed a success. A summary of the overall use case and results should then be documented and shared internally with every stakeholder and the business leadership.
If machine learning, product and business teams in startups and enterprises adopt a systematic approach and follow the best practices as laid out in this article, then the likelihood of successful AI outcomes can only increase.
Adequate upfront preparation is crucial. Without it, teams won’t be able to rectify any errors or respond to changes, nor realize the massive commercial potential that AI can deliver.
References
[1] https://www.bcg.com/en-in/publications/2020/is-your-company-embracing-full-potential-of-artificial-intelligence
[2] https://www.sundeepteki.org/blog/why-corporate-ai-projects-fail-part-14
[3] https://neptune.ai/blog/how-to-build-machine-learning-teams-that-deliver
[4] https://www.sundeepteki.org/blog/why-corporate-ai-projects-fail-part-24