Getting machine learning to solve some of the hardest problems in an organization is great. And eCommerce companies have a ton of use cases where ML can help. The problem is, with more ML models and systems in production, you need to set up more infrastructure to reliably manage everything. And because of that, many companies decide to centralize this effort in an internal ML platform.
But how to build it?
In this article, I will share what I have learned about how successful ML platforms work in eCommerce and the best practices a team needs to follow while building one.
But first, let’s discuss core retail/eCommerce Machine Learning use cases that your ML platform can and should support.
What are the model types that an eCommerce ML platform can support?
While there are things that all internal ML platforms have in common, some model types are particularly relevant for eCommerce:
- 1 Product search
- 2 Personalization and recommendation
- 3 Price optimization
- 4 Demand forecasting
Product search is the foundation for any eCommerce business. Customers share their intent through the search platform. If the Product Search platform is not optimal, a lot of customer demand may remain unfulfilled.
The ML platform can utilize historic customer engagement data, also called “clickstream data”, and transform it into features essential for the success of the search platform. From an implementation perspective, Learning To Rank (LeToR) algorithms and search engines such as Elasticsearch are among the most popular building blocks of a search system.
Personalization and recommendation
Product Recommendation in eCommerce is the gateway to providing relevant and valuable suggestions to fulfill customers’ needs. An eCommerce Product Recommendation system, if implemented right, offers a better customer experience, drives more customer engagement, and results in better revenue.
We can collect and use historical user-product interaction data to train recommendation system algorithms. Traditional Collaborative Filtering or Neural Collaborative Filtering algorithms that rely on users’ past engagement with products are widely used to solve such Personalisation and Recommendation problems.
Price Optimisation is a core business problem in retail. eCommerce companies have to find a trade-off between “maintaining an unsold item in the warehouse” and “promoting the sale of the item by offering an attractive discount”.
Due to this, developers might have to optimize the pricing strategy very often. To support such incremental development of the model, there is a need to build an ML platform with CI/CD/CT support to move the needle faster.
Estimation of future demand helps an eCommerce company to better manage procurement and replenishment decisions. There are a few products that are seasonal, and their demand fluctuates around the year. Summer clothes, winter clothes, holiday decorations, Halloween Costumes, moisturizers, etc., are some examples.
An ML model employing popular forecasting algorithms like ARIMA, SARIMAX, etc. can take such seasonal factors into account to produce a better estimate of demand and help the company make better decisions about its catalog and inventory.
How to set up an ML Platform in eCommerce?
The objective of an ML Platform is to automate repetitive tasks and streamline the processes starting from data preparation to model deployment and monitoring. An ML Platform helps in the faster iteration of an ML project lifecycle.
The following schematic diagram depicts the major components of an ML platform.
One might give a different name to a component, but the major components in an ML Platform are as follows:
- 1 Data platform
- 2 Data processing
- 3 Continuous integration / continuous deployment / continuous training
- 4 Model serving
- 5 Performance monitoring
These are the components we will find in any ML Platform, but what’s special about an ML Platform in Retail? It’s about how we design each of these components. In the following sections, we will discuss how each of these components is designed to support Retail use cases.
Consideration for data platform
Setting up the Data Platform in the right way is key to the success of an ML Platform. When you look at the end-to-end journey of an eCommerce platform, you will find there are plenty of components where data is generated. As you find in the following diagram, to deliver an item from a Supplier to a consumer, an item travels through several layers in the supply chain network.
Each of these layers generates a high volume of data, and it’s essential to capture this data as it plays a crucial role in optimization. Managing such a volume of data coming from multiple sources can be challenging.
Sources of data
- Clickstream Data: Customers’ journey begins with searching for an item by writing a query. As Customers continue to interact with the eCommerce portal, a stream of click data is generated. Customers’ interaction is captured so that the search and recommendation system is improved by analyzing customers’ past behavior.
- Product Catalogue: Product Catalogue data is the single source of truth for any algorithm to know about a product. An eCommerce company procures products from multiple vendors, manufacturers, and suppliers. Consolidating the data coming from multiple channels and persisting those to maintain an enriched product catalog is challenging.
- Supply Chain Management Data: Another source of data is the Supply Chain Management System. As an item travels through the supply chain network, it generates data at every layer, and persisting this data is important for optimizing the supply chain network.
The objective of the data platform is to persist the data in a way that it’s easy to process the data for ML model development. In the following sections, we will discuss best practices while setting up a Data Platform for Retail.
Maintaining the history of data
While building a Data Platform for eCommerce, preserving customers’ past engagement data is crucial as recommendation systems utilize historic customer engagement data to build better algorithms. Maintaining a long history of session-level data could be cumbersome. Let’s understand this with an example.
The Clickstream Data usually contains <SessionId, User, Query, Item, Click, ATC, Order>, where ATC stands for add-to-cart. Maintaining session-level data for each user over a long history could be overkill, and ML model development might not always require that level of granularity.
So, a better database architecture would be to maintain multiple tables where one of the tables maintains the past 3 months history with session-level details, whereas other tables may contain weekly aggregated click, ATC, and order data.
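The weekly aggregate table described above can be derived from the session-level table with a simple roll-up. The following is a minimal sketch using only the Python standard library; the row layout, the sample rows, and the names `SESSION_ROWS`, `week_start`, and `aggregate_weekly` are assumptions made for illustration, not part of any specific platform.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical session-level rows:
# (session_id, user, query, item, click, atc, order, event_date)
SESSION_ROWS = [
    ("s1", "u1", "milk", "item_a", 1, 1, 0, date(2024, 1, 1)),
    ("s2", "u1", "milk", "item_a", 1, 1, 1, date(2024, 1, 3)),
    ("s3", "u2", "bread", "item_b", 1, 0, 0, date(2024, 1, 9)),
]


def week_start(d):
    """Return the Monday of the week containing d."""
    return d - timedelta(days=d.weekday())


def aggregate_weekly(rows):
    """Roll session-level rows up to (week, item) click/ATC/order counts."""
    totals = defaultdict(lambda: {"clicks": 0, "atc": 0, "orders": 0})
    for _sid, _user, _query, item, click, atc, order, event_date in rows:
        key = (week_start(event_date), item)
        totals[key]["clicks"] += click
        totals[key]["atc"] += atc
        totals[key]["orders"] += order
    return dict(totals)


weekly = aggregate_weekly(SESSION_ROWS)
```

In a real data platform the same roll-up would more likely be a scheduled SQL job over the session table, with the session-level table truncated to the most recent 3 months.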
Versioning of dataset
During the development of an algorithm, a Data Scientist might have to run multiple experiments. Keeping track of which data was used to run an experiment sometimes becomes painful for a Data Scientist. So, versioning of data helps to better track changes to the data over time.
As an example, in eCommerce, the Data for Product Catalogues changes over time. Sometimes new products are added to the catalogue whereas inactive products are also removed. So, while building a model, it’s important to keep track of which version of catalogue data is used to build the model because the inclusion or deletion of products might lead to inconsistent predictions.
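One lightweight way to track which catalogue version an experiment used is to fingerprint the dataset and log the fingerprint with the experiment. The sketch below assumes the catalogue fits in memory as a list of dictionaries; `dataset_version`, the sample records, and the experiment name are hypothetical. Dedicated data versioning tools offer much more than this, so treat it only as an illustration of the idea.

```python
import hashlib
import json


def dataset_version(records):
    """Return a short, deterministic fingerprint of a list of dict records."""
    # Serialize each record with sorted keys, then sort the rows, so the
    # fingerprint does not depend on key order or row order.
    canonical = sorted(json.dumps(r, sort_keys=True) for r in records)
    digest = hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()
    return digest[:12]


catalogue_v1 = [
    {"product_id": "p1", "title": "Oat milk 1L"},
    {"product_id": "p2", "title": "Rye bread"},
]
# One product is removed and another is added, so the version changes.
catalogue_v2 = [
    {"product_id": "p1", "title": "Oat milk 1L"},
    {"product_id": "p3", "title": "Almond butter"},
]

experiment_log = {
    "experiment": "ranker-baseline",  # hypothetical experiment name
    "catalogue_version": dataset_version(catalogue_v1),
}
```

Storing the fingerprint next to the experiment results makes it possible to tell later whether two runs were trained on the same catalogue snapshot.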
Selection of the right data storage platform
In eCommerce, a Data Scientist deals with all kinds of data. Selection of a storage platform based on the type of data and the type of application is essential.
- The Data Platform needs to have integration with BigQuery, Cloud file Storage platforms (like Amazon S3, GCP bucket etc.) via Data Connectors.
- There can be multiple sources of data at the same time, which can be available in different forms like image, text, and tabular form. One might want to utilize an off-the-shelf ML Ops Platform to maintain different versions of data.
- To store image data, cloud storage services like Amazon S3, GCP buckets, and Azure Blob Storage are some of the best options, whereas one might want to utilize Hadoop + Hive or BigQuery to store clickstream and other forms of text and tabular data.
How to set up a data processing platform?
We all know that data preprocessing plays a crucial role in an ML project life cycle: developers can spend more than 70% of their time preparing the data in the right format. In this section, I will talk about best practices around building the Data Processing platform.
The objective of this platform is to preprocess, prepare and transform the data so that it’s ready for model training. This is the ETL (Extract, Transform, and Load) layer that combines data from multiple sources, cleans noise from the data, organizes raw data, and prepares for model training.
As discussed earlier, eCommerce deals with data of different natures, and data could be flowing from multiple data sources. So, before combining data flowing from multiple sources, we need to verify the quality of the data.
As an example for catalogue data, it’s important to check if the set of mandatory fields like product title, primary image, nutritional values, etc. are present in the data. So, we need to build a verification layer that runs based on a set of rules to verify and validate data before preparing it for model training.
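A rule-based verification layer of this kind can start very small. The sketch below checks only for mandatory fields; the field names in `MANDATORY_FIELDS`, the sample catalogue, and the function names are assumptions made for illustration, and a production layer would carry many more rules.

```python
MANDATORY_FIELDS = ("product_title", "primary_image")  # hypothetical field names


def validate_product(record):
    """Return a list of rule violations for one catalogue record."""
    errors = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        # Treat missing keys, None, and blank strings as missing.
        if value is None or not str(value).strip():
            errors.append(f"missing mandatory field: {field}")
    return errors


def split_valid_invalid(records):
    """Separate records that pass every rule from those that need fixing."""
    valid, invalid = [], []
    for record in records:
        errors = validate_product(record)
        if errors:
            invalid.append((record, errors))
        else:
            valid.append(record)
    return valid, invalid


catalogue = [
    {"product_id": "p1", "product_title": "Oat milk 1L", "primary_image": "p1.jpg"},
    {"product_id": "p2", "product_title": "  ", "primary_image": None},
]
valid, invalid = split_valid_invalid(catalogue)
```

Only the `valid` records would flow on to feature preparation, while the `invalid` ones, together with their error messages, can be routed back to the team that owns the catalogue.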
Exploratory data analysis
The purpose of having an EDA layer is to find out any obvious error or outlier in the data. In this layer, we need to set up a set of visualisations to monitor statistical parameters from the data.
The final layer in the Data Processing unit transforms the data into features and stores them in a feature store. A feature store is a repository that stores features that can be directly used for model training.
Say, a model uses the number of times a user has ordered an item as one of the features. The clickstream data that we get in its raw format has session-level data of users’ interaction with products. We need to aggregate this click stream data at the user and item level to create the feature and store that feature in the centralized feature store.
Building this kind of Feature Store has a number of advantages:
- 1 It enables easy reuse of features across multiple projects.
- 2 It also helps to standardize feature definitions across teams.
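The user-item order count feature described above can be sketched as follows. This is a toy illustration: the `clickstream` rows are made up, and the plain dictionary `feature_store` is only an in-memory stand-in for a real feature store.

```python
from collections import Counter

# Hypothetical session-level clickstream rows.
clickstream = [
    {"session_id": "s1", "user": "u1", "item": "i1", "order": 1},
    {"session_id": "s2", "user": "u1", "item": "i1", "order": 1},
    {"session_id": "s3", "user": "u1", "item": "i2", "order": 0},
    {"session_id": "s4", "user": "u2", "item": "i1", "order": 1},
]


def user_item_order_counts(rows):
    """Aggregate session-level rows into order counts per (user, item) pair."""
    counts = Counter()
    for row in rows:
        counts[(row["user"], row["item"])] += row["order"]
    return counts


# In-memory stand-in for a centralized feature store, keyed by feature name.
feature_store = {"user_item_order_count": user_item_order_counts(clickstream)}
```

Any project that needs this feature can then read `feature_store["user_item_order_count"][("u1", "i1")]` instead of re-aggregating the raw clickstream.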
Consideration for CI/CD/CT platform
Setting up a platform for continuous development
It’s a platform where developers run experiments to find the most suitable model architecture. It’s the test bed where a developer tries different model architectures, looks for appropriate loss functions, and experiments with model hyperparameters.
JupyterLab has been one of the most popular interactive tools for ML development with Python. So, this platform can leverage the JupyterLab environment to write and execute code. This platform needs access to the Data Platform and needs to have support for all types of Data Connectors to fetch data from data sources.
Setting up a platform for continuous training
An eCommerce ML Platform needs a variety of models: Forecasting, Recommendation Systems, Learning To Rank, Classification, Regression, Operations Research, etc. To support the development of such a diverse set of models, we need to run several training experiments to figure out the best model and keep retraining the obtained model every time we get new data. Thus the ML Platform should have support for CT (Continuous Training) along with CI/CD.
Continuous Training is achieved by setting up a pipeline that pulls data from the feature store, trains the model using the model architecture pre-estimated by the continuous development platform, calculates evaluation metrics, and registers the model to the model registry if the evaluation metrics progress in the right direction. Once the new model is registered in the model registry, a new version is created, and the same version is used to pull the model during deployment.
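The gate at the end of this pipeline, "register only if the metric improves", can be sketched as below. The "model" here is deliberately trivial (it predicts the mean of the training targets) and the registry is a plain list; both are stand-ins chosen so the example stays self-contained, and all function names are hypothetical.

```python
def train_mean_model(train_targets):
    """Toy 'model': predicts the mean of the training targets."""
    return sum(train_targets) / len(train_targets)


def mean_absolute_error(prediction, targets):
    """MAE of a constant prediction against the holdout targets."""
    return sum(abs(t - prediction) for t in targets) / len(targets)


def run_training_cycle(registry, train_targets, holdout_targets):
    """Train, evaluate, and register the model only if the metric improves."""
    model = train_mean_model(train_targets)
    mae = mean_absolute_error(model, holdout_targets)
    best = min((entry["mae"] for entry in registry), default=float("inf"))
    if mae < best:  # lower MAE is better
        registry.append({"version": len(registry) + 1, "model": model, "mae": mae})
        return True
    return False


registry = []
holdout = [10.0, 12.0]
first = run_training_cycle(registry, [2.0, 4.0], holdout)     # registered: nothing to beat yet
second = run_training_cycle(registry, [10.0, 12.0], holdout)  # registered: lower MAE
third = run_training_cycle(registry, [0.0, 2.0], holdout)     # rejected: higher MAE
```

In a real pipeline the training step would pull features from the feature store and the comparison would use the evaluation metrics agreed for that model type.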
But what is Model Registry, and what are these evaluation metrics?
- A model registry is a centralized platform that stores and manages trained ML models. It stores the model weights and maintains a history of model versions. A model registry is a very useful tool for organizing different model versions.
- In addition to the model weights, a model registry also stores metadata about the data and models.
- A model registry should have support for a wide variety of model types like TensorFlow-based models, sklearn-based models, transformer-based models, etc.
- Tools like neptune.ai have fantastic support for a model registry to streamline this process.
- Every time a model is registered, a unique Id is generated for that model, and the same is used to track that model for deployment.
With neptune.ai you can save your production-ready models to a centralized registry. This will enable you to version, review, and access your models and associated metadata in a single place.
Selecting the best evaluation metrics
Evaluation metrics help us assess the performance of a version of the algorithm. In eCommerce, for Recommendation Systems or any other algorithm that directly affects customer experience, there are two ways to evaluate those models: “Offline evaluation” and “Online evaluation”.
In the case of “Offline evaluation”, the model’s performance is evaluated based on a set of pre-defined metrics that are computed on a pre-defined dataset. This method is fast and easy to use, but its results do not always correlate with actual user behaviour, as it fails to capture user bias.
Users living in different geolocations bring their own selection bias and cultural bias to the eCommerce platform. Unless we capture such bias through direct interaction of users with the platform, it’s difficult to evaluate a new version of the model.
So, we use methods like A/B testing and/or interleaving to evaluate an algorithm by deploying it to the platform and then capturing how users interact with the old and the new system.
In eCommerce, A/B Testing is performed to compare two versions of recommendation systems or algorithms by considering the earlier algorithm as a control and the new version of the algorithm as an experiment.
Users with similar demographics, interests, dietary needs, and choices are split into two groups to reduce selection bias. One group of users interacts with the old system, whereas the other group interacts with the new system.
A set of conversion metrics, like the number of orders, Gross Merchandise Value (GMV), ATC/order, etc., is captured and compared using a hypothesis test to draw a statistically significant conclusion.
One might have to run an A/B test for 3-4 weeks to obtain conclusive evidence with statistical significance. The required duration depends on the number of users participating in the experiment.
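As an illustration of the hypothesis test step, the sketch below runs a two-sided two-proportion z-test on conversion counts from the control and experiment groups. The choice of this particular test, the sample counts, and the 0.05 significance level are assumptions made for the example; the right test depends on the metric being compared.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between two groups."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts: orders out of users exposed to each version.
z, p_value = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
significant = p_value < 0.05  # assumed significance level
```

With fewer users per group the standard error grows, which is one way to see why the experiment duration depends on how many users participate.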
Interleaving is an alternative to A/B Testing where a similar objective is achieved in less time. In Interleaving, instead of dividing users into 2 groups, a combined ranked list is created by alternately mixing results from 2 versions of the recommendation algorithm.
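The alternating merge described above can be sketched as follows. This is a simplified, deterministic version written for illustration: ranker A always picks first, whereas variants used in practice, such as team-draft interleaving, randomize the pick order to reduce position bias. The function names and sample rankings are hypothetical.

```python
def interleave(ranking_a, ranking_b):
    """Alternate items from two rankings, skipping duplicates.

    Returns the merged list and a map from item to the ranker that placed it.
    """
    merged, source = [], {}
    queues = {"A": list(ranking_a), "B": list(ranking_b)}
    turn = "A"
    while queues["A"] or queues["B"]:
        queue = queues[turn]
        # Drop items that have already been placed in the merged list.
        while queue and queue[0] in source:
            queue.pop(0)
        if queue:
            item = queue.pop(0)
            merged.append(item)
            source[item] = turn
        turn = "B" if turn == "A" else "A"
    return merged, source


def credit_clicks(clicked_items, source):
    """Count how many clicked items were contributed by each ranker."""
    credit = {"A": 0, "B": 0}
    for item in clicked_items:
        if item in source:
            credit[source[item]] += 1
    return credit


merged, source = interleave(["i1", "i2", "i3"], ["i2", "i4", "i1"])
credit = credit_clicks(["i2", "i4"], source)
```

Because every user sees results from both versions in the same list, each session contributes evidence about both rankers, which is why interleaving tends to need less traffic than an A/B test.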
To evaluate a recommendation system algorithm, we need both online and offline evaluation methods. While offline evaluation using metrics like NDCG (Normalised Discounted Cumulative Gain), Kendall’s Tau, Precision, and Recall helps a developer fine-tune and test an algorithm quickly, online evaluation provides a more realistic assessment but takes longer.
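As an example of an offline metric, NDCG at a cutoff k can be computed in a few lines. The sketch below uses the linear-gain form of DCG (relevance divided by the log of the position); some implementations use an exponential gain instead, so results may differ between libraries. The relevance grades in the usage line are made up.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance grades of the items in the order the model ranked them.
score = ndcg_at_k([1, 3, 2], k=3)
```

A score of 1.0 means the model ranked the items in the ideal order; lower values mean relevant items were pushed down the list.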
Once Offline and/or Online evaluations are done, the evaluation metrics are stored in a table, and the performance of the model is compared to decide if the new model is outperforming other models. If so, the model is registered to a model registry.
Model serving framework
Once an ML model is developed, the next challenge is to serve the model in the production system. Serving a Machine Learning model is sometimes challenging due to operational constraints.
Primarily, there are two types of model serving:
- Realtime deployment: In these kinds of systems, the model is deployed in an online system where model output is obtained within a fraction of a second. This set of models is very sensitive to latency and requires optimisation to meet latency requirements. Most real-world business-critical systems require real-time processing.
- Batch deployment: In these kinds of systems, the model output is inferred on a batch of samples. Typically a job is scheduled to run model inference. There is relatively less focus on latency issues in this kind of deployment.
We need to achieve low latency for real-time or mini-batch mode. The process of serving and optimisation is subject to the choice of framework and the type of model. In the following sections, we will discuss some of the popular tools that help to achieve low latency to serve ML models in the production system.
Open neural network exchange (ONNX)
Optimisation of the inference time of a Machine Learning model is difficult because one needs to optimise the model parameters and architecture and also tune them for the hardware configuration. Depending on whether the model runs on GPU or CPU, in the cloud or at the edge, this problem becomes even more challenging. It’s impractical to optimise and tune the model separately for every kind of hardware platform and software environment. This is where ONNX comes to the rescue.
ONNX is an open standard for representing Machine Learning models. A Model built in TensorFlow, Keras, PyTorch, scikit-learn, etc., can be converted to a standard ONNX format so that the ONNX model runs on a variety of platforms and devices. ONNX has support for both Deep Neural Networks and Classical Machine Learning models. So, having ONNX as part of the ML platform saves a lot of time to quickly iterate.
Triton inference server
Computer Vision models and Language Models can have a lot of parameters and thus require a lot of time during inference. Sometimes, a set of optimisations is needed to improve the inference time of the model. Triton Inference Server, developed by NVIDIA, lets you deploy, run, and scale trained ML models on many types of infrastructure.
It has support for TensorFlow, NVIDIA® TensorRT™, PyTorch, MXNet, Python, ONNX, XGBoost, scikit-learn, RandomForest, OpenVINO, etc. Triton Inference Server also has support for large language models, where it partitions a large model into multiple files and executes them on multiple GPUs instead of a single one.
The performance of an ML model can deteriorate over time due to factors like concept drift, data drift, and covariate shift. Consider the example of a Product Recommendation system in eCommerce.
Do you think a model that was trained using data from the pre-pandemic period would work equally well post-pandemic? Due to these kinds of unforeseen circumstances, user behavior has changed a lot.
- 1 Many users are now focusing on purchasing daily essentials rather than expensive gadgets.
- 2 Along with that, a lot of products could be out of stock due to supply-chain issues.
- 3 Not to mention that in eCommerce, the shopping pattern of a user changes with the user’s age.
So, recommendation systems for your eCommerce might become irrelevant after a while due to such changes.
Some people believe that model monitoring is not strictly necessary, as periodic re-training of the model takes care of any form of drift anyway. This is true, but only as long as the model is not too large. Gradually we are moving towards larger models, and re-training such models is expensive. So, establishing a model monitoring system helps you navigate through such difficulties.
Best practices for building an MLOps platform for retail
An ML Team in Retail solves a variety of problems, from Forecasting to Recommendation Systems. Setting up the MLOps platform the right way is essential for the success of the Team. Following is a non-exhaustive list of practices to follow in order to build an efficient MLOps system for eCommerce.
Versioning of models
While developing an ML model in eCommerce, a Team has to run many experiments. In the process, the team creates multiple models. It gets difficult to manage so many versions of models.
The best practice is to maintain a model registry where a model is registered along with its performance metrics and model-specific metadata. So, each time a new model is created, a version id is attached to the model and stored in the model registry.
During deployment, a model is pulled from the model registry and deployed to the target device. By maintaining a Model registry, one may have the choice to fall back on earlier models based on a need.
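The register, pull, and fall-back flow can be illustrated with a tiny in-memory registry. The `ModelRegistry` class below is a hypothetical stand-in written for this example and does not mirror the API of any particular tool; the model objects are placeholder strings and the metric values are made up.

```python
class ModelRegistry:
    """In-memory stand-in for a model registry (hypothetical, for illustration)."""

    def __init__(self):
        self._versions = []

    def register(self, model, metrics, metadata=None):
        """Store a model with its metrics and return the new version id."""
        version_id = len(self._versions) + 1
        self._versions.append(
            {"version": version_id, "model": model,
             "metrics": metrics, "metadata": metadata or {}}
        )
        return version_id

    def get(self, version_id):
        """Fetch a specific version, e.g. for deployment."""
        return self._versions[version_id - 1]

    def latest(self):
        return self._versions[-1]

    def previous(self, version_id):
        """Return the version registered just before version_id, for fallback."""
        if version_id <= 1:
            raise LookupError("no earlier version to fall back to")
        return self.get(version_id - 1)


registry = ModelRegistry()
v1 = registry.register("ranker-weights-a", {"ndcg": 0.61}, {"catalogue_version": "abc123"})
v2 = registry.register("ranker-weights-b", {"ndcg": 0.64})
deployed = registry.latest()
fallback = registry.previous(deployed["version"])
```

Keeping metrics and metadata next to each version is what makes the fallback decision straightforward: the earlier model and the evidence for it are both one lookup away.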
Maintaining a feature store
Data Scientists spend a lot of time converting raw data into features. I would say approximately 70% of a Data Scientist’s effort goes into preparing the dataset. So, automating the pipeline of pre-processing and post-processing the data to create features reduces redundant efforts.
A feature store is a centralized platform to store, manage and distribute features. This centralized repository makes features accessible across multiple teams, enables cross-team collaboration, and speeds up model development.
Tracking performance metrics
Many ML models in eCommerce mature over time. Through an iterative process, gradually the performance of a model improves as we get better data and find better architecture. One of the best practices is to keep an eye on the progress of evaluation metrics. So, it’s a good practice to build dashboards with evaluation metrics of algorithms and monitor if the team is making progress in the right direction.
Building a CI/CD pipeline
CI/CD is an absolute must for any MLOps system. It enables faster and more efficient delivery of code changes to production. The CI/CD pipeline streamlines the process from code commit to build generation. It runs a set of automated tests each time code is committed and provides feedback to the developer about the changes. It gives developers the confidence to write quality code.
Monitoring data drift and concept drift
Setting up an alert to identify significant changes in the data distribution (to capture Data Drift) or significant changes in the model’s performance (to capture Concept Drift) is often not taken care of but is essential.
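A data drift alert can be as simple as comparing the distribution of a feature in recent traffic against a reference sample. The sketch below uses the Population Stability Index (PSI) as the drift score; the choice of PSI, the equal-width binning, and the 0.25 alert threshold (a commonly quoted rule of thumb) are assumptions for illustration, and the samples are synthetic.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a current sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge buckets.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bucket_shares(expected), bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb threshold

reference = [i / 100 for i in range(100)]  # synthetic training-time sample
current = [x + 0.5 for x in reference]     # synthetic shifted production sample
drift_score = psi(reference, current)
should_alert = drift_score > ALERT_THRESHOLD
```

The same pattern applies to concept drift, except that the monitored quantity is a model performance metric over time rather than a feature distribution.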
Robust A/B test platform
A/B testing is the method to evaluate algorithms based on customer engagement, but it often takes a long time to converge. So, a team should spend time figuring out faster evaluation methods like interleaving to build robust methods for testing algorithms.
This article covered the major components of an ML platform and how to build them for an eCommerce business. We also discussed the need for such an ML platform, and summarized best practices to follow while building it.
Due to frequent breakthroughs in the ML space, some of these components and practices might need to change in the future. It is important to stay abreast of the latest developments to make sure you get it right. This article was an attempt in that direction, and I hope that after reading it you will find getting an ML platform ready for your retail business a bit easier.