Over the past decade, machine learning has become a game-changer for many businesses and organizations. It is no surprise, then, that numerous tailored, cloud-based solutions have emerged to support data scientists’ work in many ways.
According to Forbes, the global machine learning market is projected to grow from $7.3B in 2020 to $30.6B in 2024, a compound annual growth rate of 43%. To fuel this growth, data scientists and ML engineers are tasked with building more models to keep up with the ever-changing needs of customers and shareholders.
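As a quick sanity check on these figures, the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1, can be applied directly. The short Python sketch below assumes a four-year span (2020 to 2024) and uses only the numbers quoted above.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: $7.3B in 2020 growing to $30.6B in 2024.
rate = cagr(7.3, 30.6, 2024 - 2020)
print(f"Implied CAGR: {rate:.1%}")  # Implied CAGR: 43.1%
```

The result, roughly 43% per year, is consistent with the growth rate stated in the projection.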
But building models is often not enough. You also have to:
- maintain these models,
- evaluate and monitor their performance,
- deploy them at scale,
- experiment with new ideas from time to time,
- keep production running.
Most data scientists are not software engineers, so meeting business demands at scale can sometimes seem impossible. With Machine Learning as a Service (MLaaS), data scientists can manage these complexities (more) comfortably.
What is Machine Learning as a Service?
You have probably seen various ‘as a service’ offerings such as Platform as a Service (PaaS), Software as a Service (SaaS), Backend as a Service (BaaS), etc.
Machine Learning as a Service (MLaaS) is not the new kid on the aaS block (no pun intended), but lately it has been getting a lot of attention because of how useful and powerful it has proven to be for data scientists, machine learning engineers, data engineers, and other machine learning professionals.
Machine learning as a service is an umbrella term for a collection of cloud-based platforms that use machine learning tools to provide solutions that help ML teams with:
- out-of-the-box predictive analysis for various use cases,
- data pre-processing,
- model training and tuning,
- run orchestration,
- model deployment.
It leverages the power of cloud computing to offer machine learning services on demand.
What to expect from an MLaaS platform
Ok, but what do MLaaS platforms actually help you do?
Let me give you some examples.
Data Management: As more companies move their data from on-premise storage to cloud storage systems, the need to organize that data properly arises. Since MLaaS platforms are essentially cloud providers (that is, they offer cloud storage), they provide ways to manage data for machine learning experiments and data pipelines, making it easier for data engineers to access and process the data.
Access to ML Tools: MLaaS providers offer tools such as predictive analytics and data visualization for businesses. They also make available APIs for sentiment analysis, face recognition, creditworthiness assessments, business intelligence, healthcare, etc.
Data scientists don’t need to worry about the actual computations behind these operations because MLaaS providers abstract them away. Some MLaaS providers even give you a drag-and-drop interface for machine learning experimentation and model building (with its limitations, of course).
Ease of use: MLaaS offers data scientists a way to get started with machine learning quickly, without tedious software installation or having to provision their own servers as they would with most other cloud computing services. With MLaaS, the provider’s data centers handle the actual computation, which makes it convenient for businesses.
Cost efficiency: Building an ML workstation is expensive. At the time of writing, a single Nvidia GPU costs $699, while a Google Cloud TPU v2 costs $4.50 per hour.
In other words, by the time a data scientist using the cloud TPU has spent the GPU’s purchase price, they will already have run over 155 hours of experiments. On top of that, a local GPU draws a significant amount of power, so the electricity bill goes up too.
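The break-even arithmetic behind the 155-hour figure is the purchase price divided by the hourly rental price. The Python sketch below reproduces it and, as an assumption not taken from the article, adds an optional local running cost per hour (for example electricity) to show how such a cost pushes the break-even point further out.

```python
import math


def break_even_hours(purchase_price: float, cloud_rate_per_hour: float,
                     local_cost_per_hour: float = 0.0) -> float:
    """Hours after which buying the hardware becomes cheaper than renting.

    Solves purchase_price + local_cost_per_hour * h == cloud_rate_per_hour * h.
    local_cost_per_hour is a hypothetical extra (e.g. electricity), not given in the text.
    """
    margin = cloud_rate_per_hour - local_cost_per_hour
    if margin <= 0:
        return math.inf  # the purchase never pays for itself
    return purchase_price / margin


print(f"{break_even_hours(699.0, 4.50):.1f} hours")  # 155.3 hours
```

With a nonzero local cost per hour, the denominator shrinks, so the cloud option stays cheaper for longer.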
MLaaS can also be beneficial in the development phase because you only pay for hardware when it is actually used.
MLaaS platforms offer these solutions and many more. Let’s take a brief look at some platforms offering MLaaS solutions and how they can be accessed.
Best tools out there
The MLaaS market is quite big: it was valued at $1.0 billion in 2019 and is expected to reach $8.48 billion by 2025. Below, I give an overview of the machine learning as a service platforms from Amazon, Google, Microsoft, and IBM (yep, all the big players have one).
Let’s dive in.
Amazon Machine Learning services
Amazon’s predictive analytics is arguably one of the best automated solutions on the market. It can load data from multiple sources, including Amazon RDS, Amazon Redshift, Microsoft SQL Server, MySQL, PostgreSQL, GitHub, Jira, Teradata, etc.
Most data preprocessing operations are performed automatically: the service can, for example, identify which fields are categorical and which are numerical.
This level of automation is both an advantage and a disadvantage. Automatic preprocessing saves time, but sometimes the processed data won’t match what the data scientist intended, and extra customization is needed (or you could end up auto-building a model that doesn’t really make sense).
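The article does not say how Amazon decides which fields are categorical. As a rough illustration only, here is a hypothetical heuristic in Python: a column counts as numerical if every non-empty value parses as a number and it has more than a threshold of distinct values. The zip-code case shows the kind of mismatch described above, where automatic inference disagrees with what a human intends.

```python
def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_column_type(values: list[str], max_categories: int = 10) -> str:
    """Hypothetical heuristic, not Amazon's actual logic.

    'numerical' if every non-empty value parses as a float and the column has
    more than max_categories distinct values; otherwise 'categorical'.
    """
    non_empty = [v.strip() for v in values if v.strip()]
    if non_empty and all(_is_number(v) for v in non_empty):
        if len(set(non_empty)) > max_categories:
            return "numerical"
    return "categorical"


ages = [str(a) for a in range(20, 60)]
colors = ["red", "green", "red", "blue"]
zip_codes = [str(94100 + i) for i in range(40)]  # identifiers, not quantities

print(infer_column_type(ages))       # numerical
print(infer_column_type(colors))     # categorical
print(infer_column_type(zip_codes))  # numerical, although a human would treat zip codes as categorical
```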
Amazon offers a robust collection of machine learning tools through its Amazon Machine Learning services and its Amazon SageMaker IDE. This platform offers pre-trained AI services that require no programming experience or machine learning expertise, making it easy for less advanced teams to use. It also gives a good baseline solution for the more advanced teams.
These services include:
- Amazon Comprehend for natural language processing
- Amazon Lex for building conversational chatbots
- Amazon Forecast for building time series forecasting models
- Amazon Rekognition for image and video analysis applications
- Amazon Polly for superb text-to-speech conversion
- AWS DeepRacer, AWS DeepLens, AWS DeepComposer for deep learning services
These services come with comprehensive implementation documentation that is easy to understand and use.
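To give a feel for what calling one of these pre-trained services looks like, here is a minimal Python sketch for Amazon Comprehend's sentiment analysis via boto3's `detect_sentiment` call. The function takes the client as a parameter so it can be exercised offline; the stub class and its scores are made up and only mimic the shape of a real response. A real call assumes the `boto3` package, AWS credentials, and a configured region.

```python
from typing import Any


def detect_sentiment(client: Any, text: str, language_code: str = "en") -> tuple[str, float]:
    """Return (sentiment label, confidence) from Amazon Comprehend's DetectSentiment."""
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    label = response["Sentiment"]  # POSITIVE, NEGATIVE, NEUTRAL or MIXED
    # SentimentScore uses title-case keys: Positive, Negative, Neutral, Mixed.
    confidence = response["SentimentScore"][label.capitalize()]
    return label, confidence


class StubComprehendClient:
    """Offline stand-in returning a response shaped like DetectSentiment's (scores are made up)."""

    def detect_sentiment(self, Text: str, LanguageCode: str) -> dict:
        return {
            "Sentiment": "POSITIVE",
            "SentimentScore": {"Positive": 0.97, "Negative": 0.01, "Neutral": 0.01, "Mixed": 0.01},
        }


print(detect_sentiment(StubComprehendClient(), "The new release is fantastic!"))

# With real AWS credentials you would pass a boto3 client instead:
#   import boto3
#   client = boto3.client("comprehend", region_name="us-east-1")
#   print(detect_sentiment(client, "The new release is fantastic!"))
```

Injecting the client keeps the parsing logic testable without network access or credentials.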
Make the sage in the SageMaker
For data scientists, Amazon SageMaker Studio is a machine learning environment that simplifies the workflow by providing tools for quick model building and deployment. It stitches together most machine learning tools in one place, making it easy to go from building models to scalable deployment from a single interface.
The platform includes Jupyter notebooks to simplify data exploration and analysis without the hassle of server management.
SageMaker’s built-in methods largely overlap with the ML APIs that Amazon offers, but SageMaker lets data scientists play with them on their own datasets. It also lets you add your own methods and run models via SageMaker, leveraging its deployment features.
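Running your own code on SageMaker typically means writing a training script that SageMaker's script mode launches inside a container: estimator hyperparameters arrive as command-line flags, and paths are exposed through environment variables such as `SM_MODEL_DIR` and, for a data channel named `train`, `SM_CHANNEL_TRAIN`. The argument parsing for such a script might look like the Python sketch below; the hyperparameter names, defaults, and local fallback paths are hypothetical, and the training code itself is left out.

```python
import argparse
import os


def parse_args(argv=None) -> argparse.Namespace:
    """Parse hyperparameters and I/O paths for a SageMaker script-mode entry point."""
    parser = argparse.ArgumentParser()
    # Hyperparameters set on the estimator arrive as command-line flags
    # (these particular names and defaults are hypothetical).
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning-rate", type=float, default=0.01)
    # Inside the training container SageMaker sets these environment variables;
    # the fallbacks (hypothetical) let the same script run on a laptop.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "./data/train"))
    return parser.parse_args(argv)


# Simulate the flags SageMaker would pass for hyperparameters={"epochs": 5}.
args = parse_args(["--epochs", "5"])
print(args.epochs, args.learning_rate, args.model_dir, args.train)
```

Reading paths from the environment with local defaults keeps one script usable both on a workstation and inside SageMaker.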
Generally, Amazon machine learning services provide enough freedom for both beginners and experienced data scientists. Companies that already use Amazon cloud services can use these services and may not need to transition to another cloud provider.
Microsoft Azure Machine Learning Studio
Azure Machine Learning Studio is a development environment that provides a resourceful playground for both entry-level and experienced data scientists. Its tools range from data analysis, data visualization, and data labeling to deep learning. As with Microsoft Windows, most operations in Azure ML Studio can be completed using a graphical drag-and-drop interface, which makes it easy to use. This includes:
- Data exploration,
- Choosing modeling methods,
- Validating modeling results.
The Azure ML graphical interface visualizes each step within the workflow. Less advanced ML teams can play around with the GUI to get a deeper understanding of the major methods and models, and later move on to more sophisticated data science concepts.
Azure ML studio provides an environment where data scientists can:
- Build models,
- Host them,
- Version them,
- Monitor models.
It also allows these models to run on Azure, on-premise, or even on edge devices. It integrates well with Visual Studio and GitHub, making it easy for software engineers to access and track development. ML Studio also supports a handful of data transformation tools that are helpful during data analysis.
The Azure AI platform hosts a number of machine learning services, such as:
- Azure Anomaly Detector: adds anomaly detection capabilities to your apps
- Azure Bot Service: an intelligent, serverless bot service that scales on demand
- Azure Cognitive Search: an AI-powered cloud search service for mobile and web app development
- Azure Databricks: a fast, easy, and collaborative Apache Spark-based analytics platform
- Azure Machine Learning: an end-to-end, scalable, trusted platform for experimentation and model management
- Azure Open Datasets: a cloud platform that hosts and shares curated open datasets to accelerate the development of machine learning models
- Speech to Text
- Speech Translation
- Text Analytics
- Text to Speech
- Video Indexer for video insights
- and many others
The Azure AI Gallery is another big part of Azure ML. It’s a hub of machine learning solutions and data science model templates provided by the Azure community, which is made up of developers, researchers, data scientists, machine learning practitioners, and startups. It is available for use and exploration by community members.
The AI Gallery serves as an open-source hub for building models and algorithms. It requires some data science competence to use, and it also offers custom model engineering for ML templates. Azure’s data science services offer a powerful toolset for managing data science and machine learning experiments with popular frameworks like TensorFlow and scikit-learn (support that isn’t available in ML Studio).
Note: One of the main benefits of using Azure is the variety of algorithms available to play with. The Studio supports more than 100 methods that address classification (binary+multiclass), anomaly detection, regression, recommendation, and text analysis.
Google Cloud Platform
Cloud AutoML is a cloud-based ML platform that offers a variety of machine learning products for beginner data scientists. Users can:
- Upload their datasets,
- Train custom models,
- Deploy them on the website.
Cloud AutoML is fully integrated with Google’s other services and stores data in the cloud. Trained models can be deployed via a REST API. It relies on Google’s state-of-the-art transfer learning and neural architecture search technology.
Several Cloud AutoML products are available, and you can access them via a graphical interface.
Cloud AutoML is custom-built for deep learning models. It implements automatic deep transfer learning (meaning that it starts from an existing deep neural network trained on other data) and neural architecture search (meaning that it finds the right combination of extra network layers) for language pair translation and natural language classification.
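The transfer learning idea (start from a network trained on other data, then train only what sits on top of it) can be sketched in plain Python. This is not Google's implementation: the "pretrained" backbone below is a hypothetical layer with hand-picked frozen weights, only a new logistic-regression head is trained, and neural architecture search is not shown.

```python
import math
import random

# Hypothetical "pretrained" layer: 3 ReLU features from 2 inputs. In transfer
# learning these weights come from training on other data and stay frozen.
FROZEN_W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]


def backbone(x: list[float]) -> list[float]:
    """Frozen feature extractor, ReLU(W @ x); never updated below."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def train_head(data, epochs: int = 200, lr: float = 0.5):
    """Fit only a new logistic-regression head on top of the frozen features (SGD)."""
    w, b = [0.0] * len(FROZEN_W), 0.0
    for _ in range(epochs):
        for x, y in data:
            f = backbone(x)
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of binary cross-entropy w.r.t. the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(w, b, x) -> int:
    f = backbone(x)
    return int(sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5)


# Toy task for the new head: label is 1 when x[0] > x[1].
random.seed(0)
data = []
for _ in range(200):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    if abs(x[0] - x[1]) >= 0.1:  # keep a small margin around the boundary
        data.append((x, int(x[0] > x[1])))

w, b = train_head(data)
accuracy = sum(predict(w, b, x) == y for x, y in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

Only `w` and `b` change during training; the backbone's weights are reused as-is, which is what makes this transfer rather than training from scratch.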
Although GCP supports other languages and frameworks, its focus is clearly on one framework: TensorFlow, Google’s baby.
Google Machine Learning Engine caters to experienced data scientists. It’s very flexible and is geared toward running TensorFlow on Google’s cloud infrastructure, but it also supports other popular algorithms such as Linear Learner, TabNet, and XGBoost, as well as libraries like scikit-learn.
Google also caters to data scientists and marketers through its Data Studio: one of the most popular tools for visualizing data.
It can be used to pull data directly out of Google’s suite of marketing tools, including:
- Google Analytics,
- Google AdWords,
- and Google Search Console.
It also supports connectors for database tools such as PostgreSQL and BigQuery.
IBM Watson Machine Learning
IBM Watson Machine Learning is an MLaaS platform that helps data scientists and developers accelerate their AI and machine learning deployments. It currently offers three options:
- IBM Cloud Pak for Data: a suite of integrated solutions that automate the deployment of AI use cases in production.
- Watson Machine Learning Cloud: a set of REST APIs that you can call from any programming language to develop applications that make smarter decisions, solve tough problems, and improve user outcomes.
- Watson Machine Learning Server: a new, single-node server that is part of IBM Watson Studio. Analytical assets such as SPSS Modeler flows and machine learning model notebooks from Watson Studio Desktop can be deployed to it to take advantage of scalable computational resources and deployment management services.
All three offer AutoAI lifecycle management for models. With its array of open source tools and techniques, IBM Watson Machine Learning gives data scientists flexibility over model deployment and model retraining at scale.
It also offers a set of AI-powered applications that are useful to businesses, such as chatbots, sentiment analysis tools, and prediction tools. Tools for dynamically retraining pre-trained models are managed via the free IBM Watson OpenScale platform.
Watson Machine Learning also facilitates the collaboration of teams within a single modeling space through its built-in configurable dashboard. It also integrates easily with existing systems.
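Because Watson Machine Learning exposes deployed models through REST APIs, a client in any language mainly has to build a JSON request body. The Python sketch below builds a body in the `input_data` / `fields` / `values` layout used by Watson Machine Learning's v4 online scoring; the field names and values are hypothetical, and sending the request (an authenticated POST to your deployment's predictions endpoint) is left out, so check IBM's current API reference for the exact endpoint and version parameter.

```python
import json


def build_scoring_payload(fields: list[str], rows: list[list]) -> str:
    """Build a JSON scoring request body in the input_data/fields/values layout."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row must have exactly one value per field")
    return json.dumps({"input_data": [{"fields": fields, "values": rows}]})


# Hypothetical model inputs: two records with an age and an income column.
payload = build_scoring_payload(["age", "income"], [[35, 52000], [51, 87000]])
print(payload)
```

Validating row lengths locally catches malformed requests before they reach the service.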
When not to use MLaaS
- If your data needs to be secure and on-premise, then you probably shouldn’t use MLaaS.
- If you need a lot of customization and state-of-the-art algorithm implementations, you probably don’t need MLaaS (though it could still be useful).
- If you need to optimize the training or serving costs of complex algorithms, you may want to take your infrastructure on-prem.
When to use MLaaS
- If your company already uses one of the MLaaS providers mentioned above, integrating their MLaaS services into your system would be a good addition.
- If some or many of your use cases can be outsourced to a predictive API, MLaaS is a good way to go.
- If your application generates lots of data and you need to carry out tests frequently on the data, you definitely should try out MLaaS.
- If you run a microservice-based architecture in your company, MLaaS would help in proper management of some of those services.
With the complexity and dynamism of the modern world, building a data science powerhouse on-prem can be too risky and inflexible. MLaaS is a good answer to this problem: it can be scaled up almost without limit and then scaled back down to the size of a modern PC with just a few clicks.
MLaaS offers a great number of tools and services that help you work more efficiently and tackle the many problems a busy data scientist or data engineer faces every day. The biggest advantage is that there is no need to build infrastructure from scratch, pay for machines, or handle setup and maintenance.
ML Experiment Tracking: What It Is, Why It Matters, and How to Implement It
Jakub Czakon | Posted November 26, 2020
Let me share a story that I’ve heard too many times.
”… We were developing an ML model with my team, we ran a lot of experiments and got promising results…
…unfortunately, we couldn’t tell exactly what performed best because we forgot to save some model parameters and dataset versions…
…after a few weeks, we weren’t even sure what we have actually tried and we needed to re-run pretty much everything”
– unfortunate ML researcher.
And the truth is, when you develop ML models you will run a lot of experiments.
Those experiments may:
- use different models and model hyperparameters
- use different training or evaluation data,
- run different code (including this small change that you wanted to test quickly)
- run the same code in a different environment (not knowing which PyTorch or Tensorflow version was installed)
And as a result, they can produce completely different evaluation metrics.
Keeping track of all that information can very quickly become really hard. Especially if you want to organize and compare those experiments and feel confident that you know which setup produced the best result.
This is where ML experiment tracking comes in.
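A minimal way to capture the things listed above (parameters, data version, code version, environment, and resulting metrics) is to append one structured record per run to a file. The Python sketch below does that with the standard library only; the file name, the `code_version` argument, and the example values are hypothetical placeholders, and dedicated experiment trackers do far more than this.

```python
import hashlib
import json
import os
import platform
import tempfile
import time
from importlib import metadata


def data_fingerprint(data: bytes) -> str:
    """Short content hash identifying which dataset version a run used."""
    return hashlib.sha256(data).hexdigest()[:12]


def package_versions(names: list[str]) -> dict:
    """Installed versions of the given packages (None if a package is missing)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def log_run(path: str, params: dict, data: bytes, code_version: str, metrics: dict) -> dict:
    """Append one experiment record as a JSON line and return it."""
    record = {
        "timestamp": time.time(),
        "params": params,
        "data_sha256": data_fingerprint(data),
        "code_version": code_version,
        "python": platform.python_version(),
        "packages": package_versions(["torch", "tensorflow"]),
        "metrics": metrics,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


# Placeholder values, purely for illustration.
run_log = os.path.join(tempfile.gettempdir(), "experiments_demo.jsonl")
rec = log_run(run_log, {"lr": 0.01, "epochs": 5}, b"x,y\n1,2\n", "abc1234", {"accuracy": 0.9})
print(rec["data_sha256"], rec["packages"])
```

One JSON line per run keeps the log append-only and easy to load later for comparing runs side by side.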