Neptune Blog

Real-World MLOps Examples: End-To-End MLOps Pipeline for Visual Search at Brainly

Stephen Oladele

5th August, 2024

MLOps

In this second installment of the series “Real-world MLOps Examples,” Paweł Pęczek, Machine Learning Engineer at Brainly, will walk you through the end-to-end Machine Learning Operations (MLOps) process in the Visual Search team at Brainly. And because it takes more than technologies and processes to succeed with MLOps, he will also share details on:

1 Brainly’s ML use cases,
2 MLOps culture,
3 Team structure,
4 And technologies Brainly uses to deliver AI services to its clients.

Enjoy the article!

Disclaimer: This article focuses on the setup of mostly production ML teams at Brainly.

Company profile

Brainly is the leading learning platform worldwide, with the most extensive Knowledge Base for all school subjects and grades. Hundreds of millions of students, parents, and teachers use Brainly every month because it is a proven way to help them understand and learn faster. Their Learners come from more than 35 countries.

The motivation behind MLOps at Brainly

To understand Brainly’s journey toward MLOps, you need to know what motivated Brainly to adopt AI and machine learning technologies. At the time of this writing, Brainly has hundreds of millions of monthly users across the globe. With that many active users and the range of use cases they represent, ML applications can help users get far more out of Brainly’s educational resources and improve their learning skills and paths.

Brainly’s core product is a Community Q&A Platform where users can ask any question from any school subject by:

Typing it out
Taking a photo of the question
Saying it out loud

Once a user enters their input, the product provides the answer with step-by-step explanations. If the answer is not already in the Knowledge Base, Brainly sends the question to one of the Community Members to respond.

“We build AI-based services at Brainly to boost the educational features and take them to the next level—this is our main reasoning behind taking advantage of the tremendous growth of AI-related research.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

The AI and technology teams at Brainly use machine learning to provide Learners with personalized, real-time learning help and access to the world’s best educational products. The objectives of the AI/ML teams at Brainly are to:

Move from a reactive to a predictive intervention system that personalizes their users’ experience
Solve future educational struggles for users ahead of time
Make students more successful in their educational paths

You can find more on Brainly’s ML story in this article.

Machine learning use cases at Brainly

The AI department at Brainly aims to build a predictive intervention system for its users. Such a system leads them to work on several use cases around the domains of:

Content: Extracting content attributes (e.g., quality attributes) and metadata enrichment (e.g., curriculum resource matching)
Users: Enhancing the learning profile of the users
Visual Search: Parsing images and converting camera photos into answerable queries
Curriculum: Analyzing user sessions and learning patterns to build recommender systems

It would be challenging to elaborate on the MLOps practices for each team working on these domains, so in this article, you will learn how the Visual Search AI team does real-world MLOps.

Watch this video to learn how the Content AI team does MLOps.

“If you think about how users of Brainly’s services formulate their search queries, you may find that they tend to lean towards methods of input that are easy to use. This includes not only visual search but also voice and text search with special kinds of signals that can be explored with AI.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

MLOps team structure

The technology teams at Brainly are divided into product and infrastructure teams. The infrastructure team focuses on technology and delivers tools that other teams will adapt and use to work on their main deliverables.

On top of the teams, they also have departments. The DevOps and Automation Ops departments are under the infrastructure team. The AI/ML teams are in the services department, which also sits under the infrastructure teams but focuses on AI, and a few AI teams work on ML-based solutions that clients can consume.

At the foundation of the AI department is the ML infrastructure team, which standardizes and provides solutions that the AI teams can adapt. The ML infrastructure team makes it easy for the AI teams to create training pipelines: its internal tools provide templated solutions, written as infrastructure as code, that each team can deploy autonomously in its own environment.

Multiple AI teams also contribute to ML infrastructure initiatives. This is similar to an internal open-source system where everyone works on the tools they maintain.

“This setup of teams, where we have a product team, an infrastructure team that divides into various departments, and internal teams working on specific pieces of technology to be exposed to the product, is pretty standard for big tech companies.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

The MLOps culture at Brainly

Two main philosophies behind the MLOps culture at Brainly are:

1 Prioritizing velocity
2 Cultivating collaboration, communication, and trust

Figure: MLOps culture at Brainly

Prioritizing velocity

“The ultimate goal for us is to enable all of the essential infrastructure-related components for the teams, which should be reusable. Our ultimate goal is to provide a way for teams to explore and experiment, and as soon as they find something exciting, push that into clients’ use cases as soon as possible.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

The goal for the MLOps ecosystem is to move as quickly as possible and, over time, learn to build automated components faster. Brainly has common initiatives under the umbrella of its infrastructure team in AI departments. Those initiatives enable teams to grow faster by focusing on their main deliverables.

“Generally, we try to be as fast as possible, exposing the model to real-world traffic. Without that, the feedback loop would be too long and bad for our workflow. Even from the team’s perspective, we usually want this feedback instantly—the sooner, the better. Otherwise, this iterative process of improving models takes too much time.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

Effects of prioritizing velocity: How long does it take the team to deploy one model to production?

During the early days, when they had just started the standardization initiative, each team had its own internal standards and workflows, so it took months to deploy one model to production. With workflows standardized across teams and data in the right shape, most teams are usually ready to deploy their model and embed it as a service in a few weeks, provided the research goes well.

“The two phases that take the most time at the very beginning are collecting meaningful data and labeling the data. If the research is entirely new and you have no other projects to draw conclusions from or base your understanding on, the feasibility study and research may take a bit longer.

Say the teams have the data and can immediately start the labeling. In that case, everything goes smoothly and efficiently in setting up the experimentation process and building ML pipelines—this happens almost instantly. They can produce a similar-looking code structure for that project. Maintenance is also pretty easy.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

Another pain point that teams faced was structuring the endpoint interface so clients could adopt the solution quickly. It takes time to talk about and agree on the best interface, and this is a common pain point in all fields, not just machine learning. They had to cultivate a culture of effective collaboration and communication.

Cultivating collaboration, communication, and trust

After exposing AI-related services, the clients must understand how to use and integrate them properly. This brings interpersonal challenges, so the AI/ML teams are encouraged to build good relationships with clients and to support the models by explaining how to use the solution, instead of just exposing an endpoint without documentation or guidance.

Brainly’s journey toward MLOps

Since the early days of ML at Brainly, infrastructure and engineering teams have encouraged data scientists and machine learning engineers working on projects to use best practices for structuring their projects and code bases.

With that, they can get started quickly and will not need to pay a large amount of technical debt in the future. These practices have evolved as they have built a more mature MLOps workflow following the “maturity levels” blueprint.

“We have quite an organized transition between various stages of our project development, and we call these stages ‘maturity levels.’”
— Paweł Pęczek, Machine Learning Engineer at Brainly

The other practice they imposed from the onset was to make it easy for AI/ML teams to begin with pure experimentation. At this level, the infrastructure teams tried not to impose too much on the researchers so they could focus on conducting research, developing models, and delivering them.

Setting up experiment tracking early on is a best practice

“We enabled experiment tracking from the beginning of the experimentation process because we believed it was the key factor significantly helping the future reproducibility of research.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

The team would set up research templates for data scientists to bootstrap their code bases for special use cases. Most of the time, these templates have all the modules that integrate with their experiment tracking tool, neptune.ai.

The templates integrate with neptune.ai directly in code, so the reports they send to neptune.ai are consistently structured, and teams can review and compare experiments before and after training.

MLOps maturity levels at Brainly

MLOps level 0: Demo app

When the experiments yielded promising results, they would immediately deploy the models to internal clients. This is the phase where they would expose the MVP with automation and structured engineering code put on top of the experiments they run.

“We are using the internal automation tools we already have to make it easy to show our model endpoints. We are doing this so clients can play with the service, exposing the model so they can decide whether it works for them. Internally, we called this service a ‘demo app’.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

During the first iterations of their workflow, the team made an internal demo application that clients could connect to through code or a web UI (user interface) to see what kind of results they could expect from using the model. It was not a full-blown deployment in a production environment.

“Based on the demo app results, our clients and stakeholders decide whether or not to push a specific use case into advanced maturity levels. When the decision comes, the team is supposed to deploy the first mature or broad version of the solution, called ‘release one.’

On top of what we already have, we assembled automated training pipelines to train our model repetitively and execute the tasks seamlessly.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

MLOps level 1: Production deployment with training pipelines

As the workflows for experimentation and deployment got better and became standard for each team, they shifted their focus to ensuring they had a good approach to re-training their model when new data arrived.

The use cases evolved eventually, and as the amount of new data exploded, the team switched to a data-centric AI approach, focusing on collecting datasets and constantly pushing them into pipelines instead of trying to make the models perfect or doing too much research.

Because speed was important in their culture, they were expected to use automated tools to send full deployments to the production environment. With these tools, they could do things like:

Trigger pipelines that embedded models as a service
Verify that the model’s quality did not degrade compared to what they saw during training

“We expose our services to the production environment and enable monitoring to make sure that, over time, we can observe what happens. This is something we call MLOps maturity level one (1).”
— Paweł Pęczek, Machine Learning Engineer at Brainly

The goal of working at this level is to ensure that the model is of the highest quality and to eliminate any problems that could arise early during development. They also need to monitor and see changes in the data distribution (data drift, concept drift, etc.) while the services run.
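One way to watch for a shift in the data distribution is to compare a feature's histogram in production against the same feature in the training data. The sketch below uses the Population Stability Index (PSI) for that comparison. The article does not say which drift metric Brainly uses, so the choice of PSI, the function name, and the example feature (a per-image brightness score) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    production: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample (e.g. training data) and production data.

    Bins are equal-width and derived from the reference sample only, so the
    same bin edges are applied to both distributions.
    """
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 when the reference sample is constant.
    width = (hi - lo) / n_bins or 1.0

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clipped to the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    prod = bin_fractions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref, prod))


# Hypothetical brightness scores: training data vs. two production windows.
training_brightness = [i / 100 for i in range(100)]
same_distribution = list(training_brightness)
shifted_distribution = [0.5 + i / 200 for i in range(100)]

print(population_stability_index(training_brightness, same_distribution))
print(population_stability_index(training_brightness, shifted_distribution))
```

A commonly cited rule of thumb treats a PSI above roughly 0.2 as a sign of a meaningful shift, but the right alerting threshold depends on the feature and the use case.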

MLOps level 2: Closing the active learning loop

MLOps level two (2) was the next maturity level they needed to reach. At this level, they would close the active learning loop for a model if doing so proved to have a good return on investment (ROI) or was needed for other reasons related to their KPIs and the vision of the stakeholders.

They would continually create larger and better data sets by automatically extracting data from the production environment, cleaning it up, and, if necessary, sending it to a labeling service. These datasets would go into the training pipelines they have already set up. They would also implement more extensive monitoring with better reports sent out daily to ensure that everything is in order.
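The step of deciding which production samples are worth sending to a labeling service can be sketched as follows. This minimal example uses least-confidence sampling, one common active learning strategy; the article does not describe Brainly's actual selection criteria, and the `Prediction` type, the confidence cutoff, and the labeling budget are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Prediction:
    sample_id: str
    confidence: float  # the model's top-class probability, in [0, 1]


def select_for_labeling(
    predictions: Iterable[Prediction],
    budget: int,
    max_confidence: float = 0.6,
) -> List[str]:
    """Pick the least-confident production samples, up to the labeling budget."""
    uncertain = [p for p in predictions if p.confidence < max_confidence]
    # Lowest confidence first: these are the samples the model is least sure about.
    uncertain.sort(key=lambda p: p.confidence)
    return [p.sample_id for p in uncertain[:budget]]


logged_predictions = [
    Prediction("img-a", 0.95),
    Prediction("img-b", 0.40),
    Prediction("img-c", 0.55),
    Prediction("img-d", 0.10),
    Prediction("img-e", 0.70),
]
print(select_for_labeling(logged_predictions, budget=2))  # ['img-d', 'img-b']
```

The selected IDs would then be cleaned, labeled, and appended to the dataset that feeds the existing training pipeline.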

Machine learning workflow of the Visual Search team

Here’s a high-level overview of the typical ML workflow on the team:

First, they would pull raw data from the producers (events, user actions in the app, etc.) into their development environment
Next, they would manipulate the data, for instance, by filtering it and preprocessing it into the required formats
Depending on how developed the solution was, they would label the datasets, train the models using the training pipeline, or leave them as research models

Figure: Brainly’s Machine Learning Workflow

“When our model is ready, we usually evaluate it. Once approved, we start an automated deployment pipeline and check again to ensure the model quality is good and to see if the service guarantees the same model quality measured during training. If that’s the case, we simply deploy the service and monitor to see if something is not working as expected. We validate the problem and act upon it to make it better.

We hope to push as many use cases as possible into this final maturity level, where we have closed the active learning cycle and are observing whether or not everything is fine.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

Of course, closing the loop for their workflow requires effort and time. Also, some use cases will never reach that maturity level because it is natural that not every idea will be valid and worth pursuing to that level.

MLOps infrastructure and tool stack for Brainly’s Visual Search team

The team’s MLOps infrastructure and tool stack is divided into different components that all contribute to helping them ship new services fast:

1 Data
2 Experimentation and model development
3 Model testing and validation
4 Model deployment
5 Continuous integration and delivery
6 Monitoring

The image below shows an overview of the different components and the tools the team uses:

Figure: Brainly’s Visual Search team MLOps stack

Let’s take a deeper look at each component.

Data infrastructure and tool stack for Brainly’s visual search team

“Our data stack varies from one project to another. On the computer vision team, we try to use the most straightforward solutions possible. We simply store the data in S3, and that’s just fine for us, plus permissions prohibiting unauthorized users from mutating data sets as they are created.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

The team has automated pipelines to extract raw data and process it into the format they want to train on. They try to be as generic as possible with data processing without sophisticated tools. They built on what the Automation Ops team had already developed to integrate with the AWS tech stack.

The team uses AWS Batch and Step Functions to run batch processing and orchestration. These simple solutions let the team focus on the functionality they know best at Brainly rather than on how the underlying service works.
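To make the Batch and Step Functions pairing concrete, here is a rough sketch of a two-step state machine definition built as a Python dictionary in Amazon States Language. The `arn:aws:states:::batch:submitJob.sync` resource is the Step Functions integration that submits an AWS Batch job and waits for it to finish. The state names, job names, job definitions, and queue name are hypothetical placeholders, not Brainly's actual configuration.

```python
import json
from typing import Optional


def batch_task(
    job_name: str,
    job_definition: str,
    job_queue: str,
    next_state: Optional[str] = None,
) -> dict:
    """Build one Task state that runs an AWS Batch job and waits for completion."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Parameters": {
            "JobName": job_name,
            "JobDefinition": job_definition,
            "JobQueue": job_queue,
        },
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def build_state_machine(job_queue: str) -> dict:
    """Chain raw-data extraction and preprocessing into one definition."""
    return {
        "Comment": "Hypothetical data preparation pipeline",
        "StartAt": "ExtractRawData",
        "States": {
            "ExtractRawData": batch_task(
                "extract-raw-data", "extract-job-def", job_queue, "PreprocessData"
            ),
            "PreprocessData": batch_task(
                "preprocess-data", "preprocess-job-def", job_queue
            ),
        },
    }


definition = build_state_machine("visual-search-queue")
print(json.dumps(definition, indent=2))
```

The resulting JSON is what would be supplied as the state machine definition, for example through infrastructure-as-code tooling.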

“Our current approach gets the job done, but I wouldn’t say it’s extremely extensive or sophisticated. I know that other teams use data engineering and ETL processing tools more than we do, and compared to them, we use more straightforward solutions to curate and process our data sets.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

Experimentation infrastructure and tool stack for Brainly’s visual search team

“We try to keep things as simple as possible for experimentation. We run training on EC2 instances and AWS SageMaker in their most basic configuration. For the production pipelines, we add more steps, but not too many, so that SageMaker doesn’t get overused.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

The goal is to reduce complexity as much as possible so that data scientists can run experiments on EC2 machines or SageMaker with extensions, making the workflow efficient. On top of the infrastructure, there aren’t many tools except for neptune.ai, which tracks their experiments.

The team uses a standard technology stack, like libraries for training models, and simple, well-known ways to process datasets quickly and effectively. They combine the libraries, run them on an EC2 machine or SageMaker, and report the experiment metrics on neptune.ai.

“We focus more on how the scientific process looks than on the extensive tooling. In the future, we may consider improvements to our experimentation process, making it smoother, less bulky, etc. Currently, we’re fine and have built a few solutions to run training jobs on SageMaker or easily run the same code on EC2 machines.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

They keep their experimentation workflow simple so that their data scientists and researchers don’t have to deal with much engineering work. For them, it works surprisingly well, considering how low the complexity is.

“We also do not want to research our internal model architectures. If there’s a special case, there’s no strict requirement for not doing so. Generally, we use standard architectures from the different areas we work in (speech, text, and vision)—ConvNets and transformer-based architectures.

We are not obsessed with any one type of architecture. We try to experiment and use what works best in specific contexts.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

Model development frameworks and libraries

The computer vision team mostly uses PyTorch for model development, but it’s not always set in stone. If the model development library is good and their team can train and deploy models with it, they could use it.

“We don’t enforce experimentation frameworks for teams. If someone wants to use TensorFlow, they can, and if someone wants to leverage PyTorch, it is also possible. Obviously, within a specific team, there are internal agreements; otherwise, it would be a mess to collaborate daily.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

Deployment infrastructure and tool stack for the visual search team

The team uses standard deployment tools: simple solutions like Flask, and inference servers like TorchServe.

“We use what the Automation Ops provide for us. We take the model and implement a standard solution for serving on EKS. From our perspective, it was just easier, given our existing automation tools.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

On Amazon EKS, they deploy the services using different strategies. In particular, if tests and readiness and liveness probes are set up correctly, a deployment can be stopped when problems come up. They use simple deployment strategies for now but are considering more complex ones as the need arises.
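A liveness probe tells Kubernetes whether the process is still running, while a readiness probe tells it whether the service can accept traffic yet, for example whether the model has finished loading. The sketch below shows the two endpoints using only the Python standard library. The `/healthz` and `/readyz` paths, the port, and the `MODEL_READY` flag are illustrative assumptions; a Flask or TorchServe deployment would expose equivalents in its own way.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Set once the model has been loaded into memory (hypothetical startup hook).
MODEL_READY = threading.Event()


def probe_response(path: str) -> Tuple[int, dict]:
    """Map a probe path to an HTTP status code and JSON payload."""
    if path == "/healthz":  # liveness: the process is up and responding
        return 200, {"status": "alive"}
    if path == "/readyz":  # readiness: safe to route traffic to this pod
        if MODEL_READY.is_set():
            return 200, {"status": "ready"}
        return 503, {"status": "loading model"}
    return 404, {"error": "not found"}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, payload = probe_response(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the probe server; call MODEL_READY.set() after loading the model."""
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

With probes like these configured, a rollout that ships a pod whose readiness endpoint keeps returning 503 never receives traffic, which is how a faulty deployment gets stopped before it reaches users.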

Continuous integration and delivery tool stack for the visual search team

“We leverage CI/CD extensively in our workflows for automation and building pipelines. We have a few areas where we extensively leverage the AWS CI/CD Pipeline toolstack.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

The team uses solutions the Automation Ops team has already provided for CI/CD. They can add CI and CD to the experiment code with a few lines of Terraform code. When it comes to pipelines for training, they use the Terraform module to create CI/CD that will initialize the pipelines, test them, and deploy them to SageMaker (Pipelines) if the tests pass.

They have production and training code bases in GitHub repositories. Each time they modify the code, the definition of the pipeline changes: the CI/CD process rebuilds the underlying Docker image and runs the steps in the pipeline in the defined order. Everything is refreshed, and anyone can run training against a new dataset.

Once the model is approved, the signals from the model registry get intercepted by the CI/CD pipeline, and the model deployment process starts. An integration test runs the holdout data set through the prediction service to see if the metrics match the ones measured during the evaluation stage.

If the test passes, they’ll know nothing is broken by incorrect input standardization or similar bugs. If everything is fine, they’ll push the service into production.
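The comparison at the heart of that integration test can be sketched in a few lines. This is a minimal illustration, assuming the metrics from the evaluation stage and from the deployed prediction service are both available as name-to-value dictionaries; the function name, metric names, and tolerance are hypothetical.

```python
from typing import Dict, List


def find_metric_mismatches(
    evaluation_metrics: Dict[str, float],
    service_metrics: Dict[str, float],
    tolerance: float = 0.01,
) -> List[str]:
    """List metrics where the deployed service disagrees with evaluation.

    A mismatch usually points to an engineering bug (for example, different
    input standardization in the service) rather than to a worse model.
    """
    problems = []
    for name, expected in evaluation_metrics.items():
        observed = service_metrics.get(name)
        if observed is None:
            problems.append(f"{name}: missing from service results")
        elif abs(observed - expected) > tolerance:
            problems.append(f"{name}: expected {expected:.4f}, got {observed:.4f}")
    return problems


# Metrics recorded at evaluation time vs. metrics recomputed by sending the
# same holdout set through the prediction service (values are made up).
evaluation = {"accuracy": 0.912, "f1": 0.887}
from_service = {"accuracy": 0.910, "f1": 0.790}

mismatches = find_metric_mismatches(evaluation, from_service)
print(mismatches)  # ['f1: expected 0.8870, got 0.7900']
```

In a CI/CD job, a non-empty result would fail the pipeline and block the push to production.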

“We don’t usually try to use extensive third-party solutions if AWS provides something reasonable, especially with the presence of our Automation Ops team that provides the modules we can use.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

Model testing and approval of the CI/CD pipeline

“We test our models after training and verify the metrics, and when it comes to pure engineering, we make sure that everything works end-to-end. We take the test sets or hold-out datasets, push them to the service, and check if the results are the same as previously.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

The AI/ML team is responsible for maintaining a healthy set of tests, ensuring that the solution will work as it should. Other teams may approach testing ML models differently, especially in tabular ML use cases, for example by testing on sub-populations of the data.

“It’s a healthy situation when data scientists and ML engineers, in particular, are responsible for delivering tests for the functionalities of their projects. They would not need to rely on anything or anyone else, and there would be no finger-pointing or disagreements. They just need to do the job properly and show others that it works as it should.

For us, it would be difficult to achieve complete test standardization across all of the pipelines, but similar pipelines have similar test cases.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

The tooling for testing their code is also simple: they use pytest for unit tests, integration tests, and more sophisticated tests.

“The model approval method depends on the use case. I believe some use cases are so mature that teams can just agree to get automatic approval, which would be after reaching a certain performance threshold.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

Most of the time, the user (the machine learning engineer or data scientist) has to keep an eye on the model verification process. To make the process more consistent, they made a maintenance cookbook with clear instructions on what needed to be checked and done to make sure the model met specific quality standards.

It wouldn’t be enough just to verify the metrics; other qualitative features of the model would also have to be checked. If that is completed and the model is relatively okay, they will push the approval button, and from that moment on, the automated CI/CD pipeline will be triggered.
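The two approval modes described here, automatic approval past a performance threshold for mature use cases and manual sign-off otherwise, could be expressed as a small gate like the one below. It is a sketch under assumptions: the `ApprovalPolicy` type, the status strings, and the threshold values are hypothetical and not taken from Brainly's model registry setup.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ApprovalPolicy:
    thresholds: Dict[str, float]  # minimum acceptable value per metric
    auto_approve: bool = False    # only enabled for mature use cases


def approval_decision(metrics: Dict[str, float], policy: ApprovalPolicy) -> str:
    """Return 'rejected', 'approved', or 'needs_manual_review'."""
    failed = [
        name
        for name, minimum in policy.thresholds.items()
        # A metric that was never reported counts as failing its threshold.
        if metrics.get(name, float("-inf")) < minimum
    ]
    if failed:
        return "rejected"
    # Passing the thresholds is necessary but, unless the team has agreed to
    # automatic approval, a person still checks the qualitative aspects.
    return "approved" if policy.auto_approve else "needs_manual_review"


mature_use_case = ApprovalPolicy({"accuracy": 0.90}, auto_approve=True)
newer_use_case = ApprovalPolicy({"accuracy": 0.90})

print(approval_decision({"accuracy": 0.93}, mature_use_case))  # approved
print(approval_decision({"accuracy": 0.93}, newer_use_case))   # needs_manual_review
print(approval_decision({"accuracy": 0.85}, mature_use_case))  # rejected
```

An "approved" result corresponds to pushing the approval button, the event that triggers the automated CI/CD pipeline.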

Managing models and pipelines in production

Model management is quite context-dependent for different AI/ML teams. For example, when the computer vision team works with image data that requires labeling, managing the model in production will be different from working with tabular data that is processed in another way.

“We try to keep an eye out for any changes in how our services work, how well our models predict, or how the statistics of the data logged in production change. If we detect degradation, we’ll look into the data a little more, and if we find something wrong, we’ll collect and label new datasets.

In the future, we would like to push more of our use cases to MLOps maturity level two (2), where more things related to data and monitoring will be done automatically.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

Clients also measure their KPIs, and the team can be notified if something goes wrong.

Model monitoring and governance tools

To get the service performance metrics, the team uses Grafana to observe the model’s statistics, along with standard logging and monitoring solutions on Amazon Elastic Kubernetes Service (Amazon EKS). They use Prometheus to collect statistics about how the services work and make them available as time series. This makes it easy to add new dashboards, monitor them, and get alerts.
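Prometheus collects these statistics by scraping a plain-text endpoint on each service. The sketch below renders one metric in that text exposition format, to show what a service ends up publishing. In practice a team would normally use an official Prometheus client library rather than formatting text by hand, and the metric name and labels here are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Number = Union[int, float]


def _escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes, and newlines inside label values."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(
    name: str,
    metric_type: str,
    help_text: str,
    samples: List[Tuple[Dict[str, str], Number]],
) -> str:
    """Render one metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{_escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(
    render_metric(
        "visual_search_requests_total",
        "counter",
        "Total prediction requests.",
        [({"status": "ok"}, 42), ({"status": "error"}, 3)],
    )
)
```

Each scrape turns these lines into time series points, which Grafana dashboards and alert rules can then query.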

The Automation Ops team provides bundles for monitoring services, which justifies the team’s decision to make their stack as simple as possible to fit into their existing engineering ecosystem.

“It’s reasonable not to overinvest in different tools if you already have good ones.”
— Paweł Pęczek, Machine Learning Engineer at Brainly

In the case of model governance, the team is mainly concerned with GDPR and making sure their data is censored to some degree. For example, they wouldn’t want personal information to get out to labelers or bad content to get out to users. They’d filter and moderate the content as part of their use case.

That’s it! If you want to learn more about Brainly’s technology ecosystem, check out their technology blog.

Thanks to Paweł Pęczek and the team at Brainly for working with us to create this article!
