Machine learning chatbots, summarizing apps, Siri, Alexa: these are just a few Natural Language Processing (NLP) projects that have already been adopted at mass scale. Have you ever wondered how they’re managed, continuously improved, and maintained? That’s exactly the question we’re going to answer in this article.
For example, Google’s autocorrect keeps getting better, but not because someone came up with a model so good that it needs no maintenance. It improves because a pipeline, put in place early on, automates the ML tasks and reruns them whenever new data arrives. That’s MLOps at its finest.
In this article, I’ll tell you about various MLOps tools you can use for NLP projects, including open-source MLOps platforms, along with some code to help you get started. I’ll also compare the tools to help you choose the best one for the framework you want to use.
Here are the assumptions I made when writing the article, just so we’re on the same page:
- You understand what NLP is. You don’t need to know much; a grasp of the basics and the usual processing steps is good enough.
- You’re familiar with the process of building machine learning projects. Again, you don’t need to know too much, but you should have built at least one machine learning project before, so you know the terms I’ll be using.
- You’re open-minded and ready to learn!
If you’re an MLOps expert, you can skip the introduction and go straight to the tools.
What is MLOps?
Data changes over time, which makes machine learning models stale. ML models learn patterns in data, but these patterns change as the trends and behaviors change.
We can’t stop data from changing, but we can keep our model up to date with new trends and changes. To do this, we need an automated pipeline. This practice of automating the ML lifecycle is known as MLOps.
MLOps is a set of practices for collaboration and communication between data scientists and operations professionals.
Please note that MLOps is not fully automated, at least not yet. You still have to do some things manually, but it’s far easier than having no workflow at all.
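To make "data changes over time" concrete, here is a minimal sketch of one signal an automated pipeline might watch for an NLP model: the share of incoming tokens that the training vocabulary has never seen. The function names, the whitespace tokenization, and the 0.2 threshold are illustrative assumptions, not part of any particular MLOps tool.

```python
from collections import Counter


def build_vocabulary(texts, min_count=1):
    """Collect the tokens seen in the training data (whitespace tokenization)."""
    counts = Counter(token for text in texts for token in text.lower().split())
    return {token for token, count in counts.items() if count >= min_count}


def oov_rate(texts, vocabulary):
    """Fraction of tokens in `texts` that the training vocabulary has never seen."""
    tokens = [token for text in texts for token in text.lower().split()]
    if not tokens:
        return 0.0
    unseen = sum(1 for token in tokens if token not in vocabulary)
    return unseen / len(tokens)


def needs_retraining(new_texts, vocabulary, threshold=0.2):
    """Flag the model as stale when too much of the incoming text is unfamiliar."""
    return oov_rate(new_texts, vocabulary) > threshold
```

A rising out-of-vocabulary rate is only one proxy for drift; in practice you would combine it with performance metrics before kicking off retraining.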
How does MLOps work?
MLOps, or Machine Learning Operations, is different from DevOps.
DevOps is a popular practice in developing and operating large-scale software systems. It rests on two concepts in software system development: continuous integration (CI) and continuous delivery (CD).
A typical DevOps cycle is:
- Code,
- Test,
- Deploy,
- Monitor.
In ML projects, there are many additional processes, such as data collection and processing, feature engineering, and training and evaluating ML models, and DevOps alone can’t handle all of this.

In MLOps, you have:
- data coming into the system, which is usually the entry point,
- code to preprocess the data and select useful features,
- code to train the model and evaluate it,
- code to test and validate it,
- code to deploy it,
- and so on.
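The stages listed above can be sketched as plain Python functions chained into one pipeline, with an evaluation gate deciding whether the new model may move on to deployment. This is a toy illustration under stated assumptions: the bag-of-words "model", the function names, and the 0.8 accuracy gate are hypothetical stand-ins for your real preprocessing, training, and validation code.

```python
from collections import Counter


def preprocess(texts):
    """Lowercase and whitespace-tokenize raw text."""
    return [text.lower().split() for text in texts]


def train(docs, labels):
    """Toy bag-of-words model: token counts per label."""
    model = {label: Counter() for label in sorted(set(labels))}
    for tokens, label in zip(docs, labels):
        model[label].update(tokens)
    return model


def predict(model, tokens):
    # Pick the label whose training texts share the most tokens with the input;
    # ties go to the label that sorts first.
    return max(model, key=lambda label: sum(model[label][t] for t in tokens))


def evaluate(model, docs, labels):
    correct = sum(predict(model, doc) == label for doc, label in zip(docs, labels))
    return correct / len(labels)


def run_pipeline(train_texts, train_labels, val_texts, val_labels, min_accuracy=0.8):
    """Preprocess -> train -> evaluate -> decide whether the model may be deployed."""
    model = train(preprocess(train_texts), train_labels)
    accuracy = evaluate(model, preprocess(val_texts), val_labels)
    return model, accuracy, accuracy >= min_accuracy
```

Because every stage is a function, a CI/CD system can rerun the whole chain automatically whenever new data lands, and only promote the model when the gate passes.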
To deploy your model to production, you need to push it through a CI/CD pipeline.
Once it’s in production:
- You need to always check performance and make sure it’s reliable,
- You need an automated alert or triggering system to inform you of issues and to make sure the changes fix the issues raised.
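A monitoring check of the kind described above can be as simple as tracking rolling accuracy on labeled production traffic and raising an alert when it sinks below a floor. This is a minimal sketch: the function name, the 0.85 floor, the 100-prediction window, and the assumption that you eventually learn whether each prediction was correct are all illustrative choices.

```python
from statistics import mean


def check_model_health(outcomes, min_accuracy=0.85, window=100):
    """Return an alert message if rolling accuracy drops below `min_accuracy`.

    `outcomes` holds 1 for a correct prediction and 0 for a wrong one,
    oldest first (e.g. collected from user feedback or delayed labels).
    """
    recent = outcomes[-window:]
    if not recent:
        return None  # nothing to judge yet
    accuracy = mean(recent)
    if accuracy < min_accuracy:
        return (f"ALERT: rolling accuracy {accuracy:.2f} is below "
                f"{min_accuracy:.2f}; trigger the retraining pipeline")
    return None
```

In a real deployment, the returned message would be routed to an alerting channel or used to start a retraining job rather than simply returned.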

Why do we need MLOps?
Whatever kind of solution you’re trying to deploy, MLOps is fundamental to the success of your project.
MLOps doesn’t just help teams collaborate and integrate ML into their technology; it also lets data scientists do what they do best: develop models. MLOps automates retraining, testing, and deployment, which data scientists used to do manually.
Machine learning helps deploy solutions that unlock previously untapped sources of revenue, save time, and reduce cost by creating more efficient workflows, leveraging data analytics for decision-making, and improving customer experience. These goals are hard to accomplish without a solid framework like MLOps to follow.
How to choose a good MLOps tool
Choosing a suitable MLOps tool for your NLP project depends on the architecture of your solution, as well as on your project’s needs, maturity, and scale of deployment. Your project must also be properly structured (Cookiecutter is a good project-structuring tool that will help you do that).
Manasi Vartak, founder and CEO of Verta, pointed out some criteria to check before selecting any MLOps tool:
- It should be data scientist-friendly, not restricting your data science teams to work on specific tools and frameworks.
- It should be easy to install, easy to set up, and easy to customize.
- It should integrate freely with your existing platform.
- It should be able to reproduce results; reproducibility is critical whether you are collaborating with team members, debugging a production failure, or iterating on an existing model.
- It should scale well: choose a platform that meets your current needs and can scale in the future for both real-time and batch workloads. It should handle high-throughput scenarios, scale automatically with increasing traffic, and offer easy cost management along with safe deployment and release practices.
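Reproducibility mostly comes down to recording exactly what went into a run and removing hidden randomness. Here is a minimal, tool-agnostic sketch of both ideas, assuming your run settings fit in a JSON-serializable dict: hash the config into a short fingerprint you can log next to the model, and derive data splits from an explicit seed. The helper names are hypothetical; many MLOps tools can record this kind of metadata for you.

```python
import hashlib
import json
import random


def config_fingerprint(config):
    """Stable short ID for a run config; key order does not matter."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def seeded_split(items, seed, val_fraction=0.2):
    """Shuffle with a private RNG so the same seed always gives the same split."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)
```

Using a dedicated `random.Random(seed)` instance, rather than the global generator, keeps other code from silently changing the split.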
Best open-source MLOps tools for your NLP projects
Every MLOps tool has its own architecture. The open-source platforms listed below are specific to NLP projects, and each is listed with its number of GitHub stars. Among the commercial platforms covered later, some are built specifically for NLP projects, while others can be used for any ML project.
AdaptNLP (329 GitHub stars)
It’s a high-level framework and library for running, training, and deploying state-of-the-art Natural Language Processing (NLP) models for end-to-end tasks. It was built on top of Zalando Research’s Flair and Hugging Face’s Transformers library.
AdaptNLP provides machine learning researchers and scientists with a modular and adaptive approach to a variety of NLP tasks, with an easy API for training, inference, and deploying NLP-based microservices. You can deploy your AdaptNLP models with FastAPI, either locally or using Docker.
AdaptNLP features:
- The API is unified for NLP tasks with SOTA pretrained models. You can use it with Flair and Transformer models.
- Provides an interface for training and fine-tuning your models.
- Easily and instantly deploy your NLP model with the FastAPI framework.
- You can easily build and run AdaptNLP containers on GPUs using Docker.
Installation requirements for Linux/Mac:
I’d advise installing it in a new virtual environment to prevent dependency conflicts. If you have Python 3.7, you’ll need the latest stable version of PyTorch (v1.7); if you have Python 3.6, you’ll have to downgrade PyTorch to version 1.6 or earlier.
Installation requirements for Windows:
If you don’t already have PyTorch installed, you’ll have to install it manually from the PyTorch website.
Using pip:
pip install adaptnlp
Or, if you want to contribute to development:
pip install adaptnlp[dev]
Tutorials:
- Embeddings
- Question Answering
- Sequence Classification
- Summarization
- Text Generation
- Token Tagging
- Translation
AutoGluon (3.5k GitHub stars)
AutoGluon is simply AutoML for text, image, and tabular data. It enables you to easily extend AutoML to areas like deep learning, stack ensembling, and other real-world applications. It automates machine learning tasks and gives your model strong predictive performance in your applications.
In just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models on text, image, and tabular data.
AutoGluon features:
- Create a quick prototype of your deep learning and ML solutions with just a few lines of code.
- Use state-of-the-art techniques automatically, without expert knowledge.
- Perform data preprocessing, neural architecture search, model selection/ensembling, and hyperparameter tuning automatically.
- AutoGluon is fully customizable for your use case.
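To give a feel for what "hyperparameter tuning automatically" means, here is the simplest possible version: an exhaustive grid search that scores every candidate configuration and keeps the best. AutoGluon’s own search strategies are more sophisticated, and it can ensemble models rather than pick just one, so treat this as a conceptual sketch rather than its implementation; `grid_search` and the toy scoring function are hypothetical.

```python
import itertools


def grid_search(search_space, score_fn):
    """Try every combination in `search_space` and keep the best-scoring one.

    `search_space` maps a hyperparameter name to its candidate values;
    `score_fn` trains/evaluates one config and returns a validation score
    (higher is better).
    """
    names = sorted(search_space)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

In a real project, `score_fn` would train a model with the given settings and return its validation metric; the toy objective used for testing simply peaks at `lr=0.01`, `layers=2`.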
Installation:
It requires Python 3.6, 3.7, or 3.8. Currently, it supports only Linux and macOS. Depending on your system, you can install either the CPU version or the GPU version.
Using pip:
For macOS:
- Pip install for CPU
python3 -m pip install -U pip
python3 -m pip install -U setuptools wheel
python3 -m pip install -U "mxnet<2.0.0"
python3 -m pip install autogluon
- Pip install for GPU
Currently unavailable
For Linux:
- Pip install for CPU
python3 -m pip install -U pip
python3 -m pip install -U setuptools wheel
python3 -m pip install -U "mxnet<2.0.0"
python3 -m pip install autogluon
- Pip install for GPU
python3 -m pip install -U pip
python3 -m pip install -U setuptools wheel
# Here we assume CUDA 10.1 is installed. You should change the number
# according to your own CUDA version (e.g. mxnet_cu100 for CUDA 10.0).
python3 -m pip install -U "mxnet_cu101<2.0.0"
python3 -m pip install autogluon
GluonNLP (2.3k GitHub stars)
It’s a framework that supports NLP processes such as loading text data, preprocessing text data, and training NLP models. It’s available on Linux and macOS. You can also convert NLP models from other frameworks into GluonNLP; examples of models you can convert include BERT, ALBERT, ELECTRA, MobileBERT, RoBERTa, XLM-R, BART, GPT-2, and T5.
GluonNLP features:
- Easy-to-use Text Processing Tools and Modular APIs
- Pretrained Model Zoo
- Write Models with NumPy-like APIs
- Fast Inference via Apache TVM (incubating) (Experimental)
- AWS Integration with SageMaker
Before you start the installation, make sure you have the MXNet 2 release on your system. If you don’t, you can install it from your terminal. Choose one of the following options:
# Install the version with CUDA 10.2
python3 -m pip install -U --pre "mxnet-cu102>=2.0.0a"
# Install the version with CUDA 11
python3 -m pip install -U --pre "mxnet-cu110>=2.0.0a"
# Install the cpu-only version
python3 -m pip install -U --pre "mxnet>=2.0.0a"
Now you can install GluonNLP. Open your terminal, clone the repository, and install it in editable mode:
git clone https://github.com/dmlc/gluon-nlp.git
cd gluon-nlp
python3 -m pip install -U -e .
You can also install all the extra requirements by typing:
python3 -m pip install -U -e ."[extras]"
If you run into user-permission issues during installation, please refer to this guide.
Tutorials:
- Data Loading and Vocabularies
- Representation Learning
- Language Modelling
- Machine Translation
- Sentiment Analysis
- Text Generation
Kashgari (2.1k GitHub stars)
Kashgari is a powerful NLP transfer learning framework that you can use to build state-of-the-art models in 5 minutes for named entity recognition (NER), part-of-speech (POS) tagging, and text classification. It’s suitable for beginners, academics, and researchers.
Kashgari features:
- Easy to customize, well documented, and straightforward.
- Kashgari allows you to use state-of-the-art models for your Natural Language Processing projects.
- It allows you to build multi-label classification models, create custom models, and much more.
- Allows you to adjust your model’s hyperparameters, use custom optimizers and callbacks, create custom models, and more.
- Kashgari has built-in pretrained models, which makes transfer learning very easy.
- Kashgari is simple, fast, and scalable.
- You can export your models and deploy them directly to the cloud using TensorFlow Serving.
Installation
Kashgari requires you to have Python 3.6+ installed on your system.
Using pip
- For TensorFlow 2.x:
pip install 'kashgari>=2.0.0'
- For TensorFlow 1.14+:
pip install 'kashgari>=1.0.0,<2.0.0'
- For Keras:
pip install 'kashgari<1.0.0'
LexNLP (460 GitHub stars)
LexNLP, developed by LexPredict, is a Python library for working with real, unstructured legal text, including contracts, policies, procedures, and other types of materials. It includes classifiers for document and clause type, tools for building new clustering and classification methods, and hundreds of unit tests drawn from real legal documents.
Features:
- It provides pre-trained models for segmentation, word embedding and topic models, classifiers for document and clause type.
- Fact extraction.
- Tools for building new clustering and classification methods.
Installation:
It requires Python 3.6.
pip install lexnlp
Tutorials:
- Lex-NLP Library for Automated Text Extraction and NER(Named Entity Recognition)
- Hooking up an AI Pipeline to a Word Document in Python
TensorFlow Text (770 GitHub stars)
TensorFlow Text provides a collection of text-related classes and ops ready to use with TensorFlow 2.0. The library can perform the preprocessing regularly required by text-based models and includes other features useful for sequence modeling not provided by core TensorFlow.
The benefit of using these ops in your text preprocessing is that they run in the TensorFlow graph. You don’t need to worry about tokenization in training being different from tokenization at inference, or about managing preprocessing scripts.
TensorFlow Text features:
- Provides a large toolkit for working with text
- Allows integration with a large suite of TensorFlow tools to support projects from problem definition through training, evaluation, and launch
- Reduces complexity at serving time and prevents training-serving skew
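TensorFlow Text avoids training-serving skew by putting the preprocessing ops inside the TensorFlow graph. The underlying principle can be shown without TensorFlow: keep exactly one tokenizer and ship it together with the artifacts it produced, so inference can’t tokenize differently from training. This plain-Python sketch only illustrates the idea; `TextModelBundle` and the regex tokenizer are hypothetical and are not part of the TF Text API.

```python
import re

TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """The single tokenizer used by both training and serving."""
    return TOKEN_PATTERN.findall(text.lower())


class TextModelBundle:
    """Ships the vocabulary together with the preprocessing that produced it."""

    def __init__(self, training_texts):
        self.vocab = {}
        for text in training_texts:
            for token in tokenize(text):
                # IDs start at 1; 0 is reserved for unknown tokens.
                self.vocab.setdefault(token, len(self.vocab) + 1)

    def featurize(self, text):
        """Used at training time and at serving time alike."""
        return [self.vocab.get(token, 0) for token in tokenize(text)]
```

For example, after `TextModelBundle(["Hello world", "hello NLP"])`, calling `featurize("Hello, NLP world!")` yields `[1, 3, 2]` whether it runs in the training job or in the serving endpoint.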
Installation:
Using Pip
Please note: when installing TF Text with pip, check which version of TensorFlow you’re running, because you should install the corresponding minor version of TF Text (e.g., for tensorflow==2.3.x, use tensorflow_text==2.3.x).
pip install -U tensorflow-text==<version>
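If you script your environment setup, you can derive the matching pin from the installed TensorFlow version instead of typing it by hand. This is a small sketch assuming the major.minor matching rule stated above and Python 3.8+ (for `importlib.metadata`); the helper names are hypothetical, while `==X.Y.*` is a standard pip version wildcard.

```python
from importlib.metadata import PackageNotFoundError, version


def tf_text_requirement(tf_version):
    """Pin tensorflow-text to the same major.minor release as TensorFlow."""
    major, minor = tf_version.split(".")[:2]
    return f"tensorflow-text=={major}.{minor}.*"


def requirement_for_installed_tensorflow():
    """Look up the installed TensorFlow; return None if it isn't installed."""
    try:
        return tf_text_requirement(version("tensorflow"))
    except PackageNotFoundError:
        return None
```

For example, `tf_text_requirement("2.3.1")` returns `"tensorflow-text==2.3.*"`, which you can pass straight to `pip install -U`.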
Installing from source
Please note that TF Text needs to be built in the same environment as TensorFlow. Thus, if you manually build TF Text, it is highly recommended that you also build TensorFlow.
If building on macOS, you must have coreutils installed. It’s probably easiest to install it with Homebrew.
- Build and install TensorFlow.
- Clone the TF Text repo: git clone https://github.com/tensorflow/text.git
- Run the build script to create a pip package: ./oss_scripts/run_build.sh. After this step, there should be a *.whl file in the current directory, with a file name similar to tensorflow_text-2.5.0rc0-cp38-cp38-linux_x86_64.whl.
- Install the package into your environment: pip install ./tensorflow_text-*-*-*-os_platform.whl
Tutorials:
- Text preprocessing
- Text Classification
- Text Generation
Snorkel (4.7k GitHub stars)
Snorkel is a data labeling tool that lets you label, build, and manage training data programmatically. The first component of a Snorkel pipeline is a set of labeling functions, which are designed to be weak heuristic functions that predict a label given unlabeled data.
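Labeling functions are ordinary functions that either vote for a label or abstain. The sketch below mimics that pattern in plain Python, using the same convention as Snorkel’s spam tutorial (-1 means "abstain"), and combines the votes by simple majority. The specific heuristics and helper names are hypothetical; real Snorkel code wraps such functions with its `@labeling_function` decorator and can replace the majority vote with a `LabelModel` that learns how much to trust each function.

```python
import re
from collections import Counter

# Label convention borrowed from Snorkel's tutorials: -1 means "abstain".
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Comments with URLs are often spam."""
    return SPAM if re.search(r"https?://", text) else ABSTAIN


def lf_check_out(text):
    """'Check out my channel' is a classic spam phrase."""
    return SPAM if "check out" in text.lower() else ABSTAIN


def lf_short_comment(text):
    """Very short comments tend to be genuine reactions."""
    return HAM if len(text.split()) < 5 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_check_out, lf_short_comment]


def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Majority vote over non-abstaining functions; ties and no votes abstain."""
    votes = [label for label in (lf(text) for lf in lfs) if label != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]
```

Each heuristic is individually weak, but applying all of them to thousands of unlabeled comments quickly yields a noisy training set you can improve by adding or refining functions.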
Features:
- It supports TensorFlow/Keras, PyTorch, Spark, Dask, and scikit-learn.
- It provides APIs for labeling, analysis, preprocessing, slicing, mapping, utils, and classification.
Installation:
Snorkel requires Python 3.6 or later.
Using pip (Recommended)
pip install snorkel
Using conda
conda install snorkel -c conda-forge
Please note: if you’re using Windows, it’s highly recommended to use Docker or the Windows Subsystem for Linux.
Tutorials:
- Spam detection: Is this YouTube comment spam?
- Spouse (Relation Extraction): Does this sentence imply that the two marked people are spouses?
- Visual_relation (Visual Relationship Detection): Is object A riding object B in the image, carrying it, or neither?
- Crowdsourcing: Is this tweet about the weather expressing a positive, negative or neutral sentiment?
- Multitask (Multi-Task Learning): A synthetic task demonstrating the native Snorkel multi-task classifier API
- Recsys (Recommender Systems): Will this user read and like this book?
- Drybell: Is a celebrity mentioned in this news article?
TensorFlow Lingvo (2.3k GitHub stars)
Lingvo is a framework for building neural networks in TensorFlow, particularly sequence models.
TensorFlow Lingvo features:
- Lingvo supports natural language processing (NLP) tasks, but it also applies to models for tasks such as image segmentation and point cloud classification.
- Lingvo can be used to train models on production-scale datasets.
- Lingvo provides additional support for synchronous and asynchronous distributed training.
- Quantization support has been built directly into the Lingvo framework.
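As a rough illustration of what quantization does, here is symmetric 8-bit linear quantization of a weight vector: floats are mapped to integers in [-127, 127] with one shared scale, cutting storage to a byte per weight at the cost of a small rounding error. This is a generic sketch, not Lingvo’s quantization API, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # round() returns an int; clamp guards against floating-point overshoot.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]
```

Each restored value lies within half a quantization step (`scale / 2`) of the original, which is why well-quantized models usually lose little accuracy.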
Installation:
Using pip:
pip3 install lingvo
Installing from sources:
Make sure you meet the following prerequisites:
- TensorFlow 2.5 installed on your system
- C++ compiler (only g++ 7.3 is officially supported)
- The bazel build system.
Refer to docker/dev.dockerfile for a set of working requirements.
Now, git clone the repository, then use bazel to build and run targets directly. The python -m module commands in the codelab need to be mapped onto bazel run commands.
Using docker:
Docker configurations are available for both situations. Instructions can be found in the comments on the top of each file.
lib.dockerfile has the Lingvo pip package preinstalled.
dev.dockerfile can be used to build Lingvo from sources.
spaCy (21k GitHub stars)
spaCy is a library for advanced Natural Language Processing in Python and Cython. It’s built on the very latest research and was designed from day one to be used in real products.
spaCy comes with pretrained pipelines and currently supports tokenization and training for 60+ languages. It features state-of-the-art speed and neural network models for tagging, parsing, named entity recognition, text classification, and more, multi-task learning with pre-trained transformers like BERT, as well as a production-ready training system and easy model packaging, deployment, and workflow management.
Features:
- Support for custom models in PyTorch, TensorFlow, and other frameworks.
- Support for 60+ languages.
- Support for pre-trained word vectors and embeddings.
- Easy model packaging, deployment, and workflow management.
- Linguistically-motivated tokenization.
Installation:
It supports macOS / OS X, Linux, and Windows (Cygwin, MinGW, Visual Studio). You also need Python 3.6+ (64-bit only) installed on your system.
Using pip
Before you continue with the installation, make sure that your pip, setuptools, and wheel are up to date.
pip install -U pip setuptools wheel
pip install spacy
Using conda
conda install -c conda-forge spacy
Tutorials:
- Categorization of emotions in Reddit posts (Text Classification)
- Predicting whether a GitHub issue is about documentation (Text Classification)
- Detecting entities in Medical Records with PyTorch
- Detecting fashion brands in online comments (Named Entity Recognition)
Flair (11k GitHub stars)
Flair is a simple framework for state-of-the-art NLP. It allows you to use state-of-the-art models for your NLP tasks, such as Named Entity Recognition (NER), part-of-speech tagging (POS), sense disambiguation, and classification. It provides special support for biomedical data and also supports a rapidly growing number of languages.
Flair features:
- It’s built entirely on PyTorch, so you can easily build and train your Flair models.
- State-of-the-art NLP models that you can use for your text.
- Allows you to combine different word and document embeddings with simple interfaces.
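Conceptually, combining embeddings means concatenating each word’s vectors from several sources and then pooling the word vectors into a single document vector. Flair exposes this through classes such as `StackedEmbeddings` and `DocumentPoolEmbeddings`; the plain-Python sketch below only mirrors the idea with made-up lookup tables and mean pooling, and is not Flair’s implementation.

```python
def stack_embeddings(tokens, *embedding_tables):
    """Concatenate each token's vectors from several embedding tables."""
    return [
        [x for table in embedding_tables for x in table[token]]
        for token in tokens
    ]


def mean_pool(token_vectors):
    """Average token vectors into one fixed-size document vector."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]
```

With a two-dimensional "word" table and a one-dimensional "character" table, each stacked token vector has three dimensions, and mean pooling turns a sentence of any length into one three-dimensional document vector.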
Installation:
It requires PyTorch 1.5+ and currently supports Python 3.6.
pip install flair
Tutorials:
- Introduction to FlairNLP for Python
- Text Tagging
- Training your own Flair Embeddings
- Training a Zero Shot Text Classifier (TARS)
- How to build a text classifier with Flair
- Practical approach of state-of-the-art Flair in named entity recognition
Open-source MLOps tools for your NLP projects – comparison
| Tool | GitHub stars |
|---|---|
| AdaptNLP | 329 |
| Flair | 11k |
| spaCy | 21k |
| TensorFlow Lingvo | 2.3k |
| Snorkel | 4.7k |
| TensorFlow Text | 770 |
| LexNLP | 460 |
| Kashgari | 2.1k |
| GluonNLP | 2.3k |
| AutoGluon | 3.5k |
Best MLOps as a service tools for NLP projects
Neu.ro
The Neu.ro MLOps platform provides complete management of the infrastructure and processes you need for successful ML development at scale. It covers the full MLOps lifecycle: data collection, model development, model training, experiment tracking, deployment, and monitoring.
Setup
Installation
It’s advisable to create a new virtual environment first. Neu.ro requires Python 3.7.
pip install -U neuromation
How to:
- Sign up at neu.ro
- Upload data either with webUI or CLI
- Set up a development environment (allows you to use GPUs)
- Train a model or download a pretrained model
- Run a Jupyter notebook
Check out this ML Cookbook to help you get started with an NLP project.
AutoNLP
AutoNLP provides an automatic way to train state-of-the-art NLP models, seamlessly integrated with the Hugging Face ecosystem, and to deploy them automatically in a scalable environment. It trains and evaluates models for different tasks and automatically fine-tunes a working model for deployment based on the dataset that you provide.
Setup
Installation:
To use pip:
pip install -U autonlp
Please note: you need to install Git LFS to use the CLI.
How to:
- Sign in to your account
- Create a new model
- Upload your dataset
- Train your AutoNLP model
- Track model progress
- Make predictions
- Deploy your model
Check out the AutoNLP documentation for your specific use case.
neptune.ai
Neptune tracks machine learning experiments and stores your model’s metadata (metrics, performance charts, video, audio, text, and data exploration records). It provides a model registry where you can version, store, and query your models anytime, and it gives your team an effective way to collaborate. Neptune lets you customize the UI and manage users in an on-prem environment or in the cloud.
Setup
Installation
pip install neptune
How to log your project metadata:
- Create a Neptune account
- Create a new project in Neptune
In your code editor:
- Initialize a run with your API token and log whichever model metadata you want.
- Run your code, and your project in Neptune will be updated automatically!
Check out the Neptune docs to explore more and start running your experiments!

DataRobot
DataRobot, which has acquired Algorithmia, is a platform that automates the end-to-end process of building, deploying, and maintaining machine learning (ML) and artificial intelligence (AI) at scale. It offers a no-code app builder and a platform where you can deploy, monitor, manage, and govern all your models in production, regardless of how they were created or when and where they were deployed.
Setup
- It currently supports Python 2.7 and >=3.4:
pip3 install datarobot
- With Python 3.6+:
pip3 install requests requests-toolbelt
How to create a new project:
- Sign in to your account
- Install dependencies
- Load and profile your data
- Start modelling
- Review and interpret the model
- Deploy model
- Choose an application
Check this doc for a proper walkthrough on how to use these steps.
AWS MLOps Frameworks
The AWS MLOps Framework helps you streamline and enforce architecture best practices for productionizing your machine learning models. It’s an extendable framework that provides a standard interface for managing ML pipelines for AWS ML services and third-party services. The solution template lets you upload your trained models, configure the orchestration of the pipeline, and monitor pipeline operations. It allows you to leverage a preconfigured ML pipeline and automatically deploy a trained model with an inference endpoint.
How to setup a new project:
- Sign in to your AWS account
- Create a new SageMaker Studio
- Create a new project
- Select the MLOps architecture you want (development, evaluation, or deployment)
- Add data to an Amazon S3 bucket
- Create pipeline and training files.
Check out these docs on how to set up a new project. You can also check out this tutorial on how to create a simple project.
Azure Machine Learning MLOps
Azure MLOps allows you to experiment, develop, and deploy models into production with end-to-end lineage tracking. It allows you to create reproducible ML pipelines, reusable software environments, deploy models from anywhere, govern the ML lifecycle, and closely monitor models in production for any issues. It allows you to automate the end-to-end ML lifecycle with pipelines which lets you update models, test new models, and continuously deploy new ML Models.
Setup
Installation
You need to install the Azure CLI.
How to:
- Sign in to Azure DevOps
- Create a new project
- Import the project repository
- Set up the project environment
- Create a pipeline
- Train and deploy the model
- Set up a continuous integration pipeline
Check out this doc on how to go about these processes.
Vertex AI | Google Cloud AI Platform
Vertex AI is a machine learning platform where you can access all Google Cloud services in one place to deploy and maintain AI models. It brings together the Google Cloud services for building ML under one unified UI and API. You can use Vertex AI to easily train and compare models using AutoML or your custom code, with all your models stored in one central model repository.
Setup
Installation
You can either use the Google Cloud console and Cloud Shell, or install the Cloud SDK on your system.
How to create a new project (using Cloud Shell):
- Sign in to your account
- Create a new project (ensure billing is enabled for your account)
- Activate Cloud Shell
- Create a storage bucket
- Train your model
- Deploy to Google Cloud
Check out this doc for a walkthrough on how to follow these steps.
MLOps as a service tools for NLP projects – comparison
| Tool | Data collection and management | Data preparation and feature engineering | Model training and deployment | Model monitoring and experiment tracking | ML metadata store | Model registry & management |
|---|---|---|---|---|---|---|
| AutoNLP | No | No | Yes | No | No | No |
| Azure MLOps | No | No | Yes | Yes | Yes | Yes |
| AWS MLOps | No | No | Yes | Yes | No | No |
| DataRobot | No | No | Yes | Yes | No | Yes |
| Neptune | No | No | No | Yes | Yes | Yes |
| Neu.ro | Yes | No | Yes | Yes | No | No |
| Vertex AI | No | Yes | Yes | Yes | Yes | Yes |
Conclusion
I’ve talked about why you need MLOps and how to choose a good tool for your project. I also listed some NLP MLOps tools and highlighted some of their most useful features. Not sure which tool to try out? Check the comparison tables to see which best fits your project. I hope you try out some of the listed tools, and do let me know what you think. Thanks for reading!