I vividly remember a machine learning hackathon that I participated in two years ago, when I was at the beginning of my data science career. It was a pre-qualification hackathon for a bootcamp organised by Data Science Nigeria.
The dataset had information about certain employees. I had to predict if an employee should get a promotion or not. After days of trying to improve and engineer features, the model’s accuracy seemed to oscillate around 80%.
I needed to do something to improve my score on the leaderboard. I started tuning the model manually and got slightly better results. The accuracy grew to 82% after changing a single parameter (this move is really important, as anyone who’s done a hackathon will attest!). Excited, I started tuning other hyperparameters, but not all turned out so well. I was already exhausted. Imagine working 7 hours straight to improve a model. Pretty tiring.
I knew about GridSearchCV and RandomizedSearchCV. I tried out GridSearchCV, and it took more than 3 hours to give me results from the range of values I provided. Even worse, the results from GridSearchCV weren’t better. Frustrated, I decided to try RandomizedSearchCV. This brought a little joy: my accuracy moved from 82% to 86%.
After a lot of trials and no improvement, I went back to manual tuning to see what I could gain. I ended up with about 90% accuracy by the end of the hackathon. I wish I had known about tools for optimizing hyperparameters faster! Luckily, even though I wasn’t part of the top 50, I still qualified for the bootcamp.
That was in the past. Now, I know that there are good hyperparameter tuning tools I could’ve used, and I’m excited to share them with you.
Before you start hypertuning, make sure these things are done:
- Get a baseline. You can get this with smaller models, fewer iterations, default parameters, or a manually tuned model.
- Separate your data into training, validation and test sets.
- Use early stopping together with a large number of epochs (or boosting rounds) to prevent overfitting.
- Set up your full model pipeline before training.
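To make the second point concrete, here is a minimal sketch of a three-way split using only the standard library. The function name `split_dataset` and the 70/15/15 proportions are my own assumptions for illustration; in a real project you would more likely use your framework's splitting utility.

```python
import random


def split_dataset(rows, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle rows and split them into training, validation, and test lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The validation set is what you tune hyperparameters against; the test set stays untouched until the very end.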
Now, I’d like to discuss some terms that I’ll be using in the article:
- Model parameter – A value the model learns from the data during training, such as the weights or coefficients that capture relationships between the features and the target. You can’t tune these manually (and this isn’t feature engineering).
- Model hyperparameter – A value you set before training to control how the model learns, like the learning rate, number of estimators, type of regularization, etc.
- Optimization – A process of adjusting hyperparameters in order to minimize the cost function by using one of the optimization techniques.
- Hyperparameter optimization – Hyperparameter optimization is simply a search to get the best set of hyperparameters that gives the best version of a model on a particular dataset.
- Bayesian optimization – Part of a class of sequential model-based optimization (SMBO) algorithms that use the results of previous experiments to decide what to try next.
- Hyperparameter sampling – Simply specifying the parameter sampling method to use over the hyperparameter space.
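The difference between a parameter and a hyperparameter is easier to see in code. Below is a toy sketch (my own example, not taken from any library) that fits `y = w * x` with gradient descent: `w` is the model parameter learned from data, while `learning_rate` and `n_epochs` are hyperparameters that I choose up front.

```python
def fit_slope(xs, ys, learning_rate=0.01, n_epochs=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # model parameter: learned from the data
    n = len(xs)
    for _ in range(n_epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # the true slope is 3

good_w = fit_slope(xs, ys, learning_rate=0.05, n_epochs=200)
bad_w = fit_slope(xs, ys, learning_rate=0.2, n_epochs=20)
print(good_w, bad_w)
```

With a sensible learning rate the learned slope settles near 3; with a learning rate that is too large the updates overshoot and the fit diverges. The data is identical in both calls, only the hyperparameter changed.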
I’m not against using GridSearchCV. It’s a good option, only that it’s really time-consuming and computationally expensive. If you’re like me, with a busy schedule, you’ll definitely find better options.
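To show why grid search gets expensive, here is a from-scratch sketch of the idea. It is not GridSearchCV itself, and `validation_score` is a hypothetical stand-in for "train a model and score it on the validation set".

```python
from itertools import product


def validation_score(learning_rate, max_depth):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return 1.0 - (learning_rate - 0.1) ** 2 - 0.01 * (max_depth - 5) ** 2


grid = {
    "learning_rate": [0.01, 0.05, 0.1, 0.5],
    "max_depth": [3, 5, 7, 9],
}


def grid_search(score_fn, grid):
    """Evaluate every combination in the grid and keep the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    n_evaluations = 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        n_evaluations += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluations


best_params, best_score, n_evaluations = grid_search(validation_score, grid)
print(best_params, best_score, n_evaluations)
```

Two hyperparameters with four values each already cost 16 full training runs. Add a third hyperparameter with four values and you are at 64, which is how I ended up waiting more than 3 hours.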
A better alternative is RandomizedSearchCV, which samples random hyperparameter values from the search space and keeps the best combination it finds. It’s way faster than GridSearchCV. The downside here is that since it takes random values, we can’t be so sure that those values are the best combination.
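The random approach can be sketched the same way. Again, this is a from-scratch illustration rather than scikit-learn's implementation, and `validation_score`, the ranges, and the trial counts are assumptions I made up for the example.

```python
import random


def validation_score(learning_rate, max_depth):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return 1.0 - (learning_rate - 0.1) ** 2 - 0.01 * (max_depth - 5) ** 2


def random_search(score_fn, n_trials, seed=0):
    """Sample n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            # Log-uniform sample between 0.001 and 1.
            "learning_rate": 10 ** rng.uniform(-3, 0),
            "max_depth": rng.randint(2, 10),
        }
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


params_10, score_10 = random_search(validation_score, n_trials=10)
params_50, score_50 = random_search(validation_score, n_trials=50)
print(score_10, score_50)
```

The cost is fixed by `n_trials` no matter how many hyperparameters you add, but nothing guarantees that the best sampled point is the true optimum.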
But really, when do I know I need to do hyperparameter optimization?
One of the mistakes we often make as data scientists is sticking with the default parameters of a model. With the defaults, you’re probably not getting the best version of your model.
Sometimes, when your model is overfitting (performing well on the training set and poorly on the test set), or underfitting (performing poorly on both the training and test sets), optimizing your hyperparameters can really help. A little tweak can make a large difference, from 60% accuracy to 80% accuracy, or even more!
Okay, let’s wrap up the introduction. By the end of this article, you’ll learn:
- The top hyperparameter tuning tools,
- The various open-source (free to use) and paid services,
- Their features and advantages,
- The frameworks they support,
- How to choose the best tool for your project,
- How you can add them to your project.
We’ll start with a TL;DR comparison of all the tools discussed below.
Comparing tools for model tuning and hyperparameter optimization
If you’re strapped for time, this table should help you pick a good tool to try in your use case. For detailed descriptions of each tool, keep reading below the table.
Moving on, I’ll start with some open-source tools. Each tool will be described in the following way:
- Brief introduction of the tool,
- Core features/ Advantages of the tool,
- Steps on how to use the tool,
- Additional links on how to use the tool in your project.
1. Ray Tune
Ray provides a simple, universal API for building distributed applications. Tune, one of Ray’s many packages, is a Python library for experiment execution and hyperparameter tuning at any scale. It speeds up hyperparameter tuning by leveraging cutting-edge optimization algorithms.
Why should you use Ray Tune?
Here are some features:
- It integrates easily with many optimization libraries, such as Ax/Botorch and HyperOpt.
- Scaling can be done without changing your code.
- Tune leverages a variety of cutting-edge optimization algorithms, such as Ax/Botorch, HyperOpt, and Bayesian Optimization, and lets you scale them transparently.
- Tune parallelizes across multiple GPUs and multiple nodes, so you don’t have to build your own distributed system to speed up training.
- You can visualise results automatically with tools like Tensorboard.
- It provides a flexible interface for optimization algorithms, so you can easily implement and scale new optimization algorithms with a few lines of code.
- It supports any machine learning framework, including Pytorch, Tensorflow, XGBoost, LightGBM, Scikit-Learn, and Keras.
Using it takes five simple steps (I’m supposing you already have your data preprocessed):
- Install tune
pip install ray[tune]
Whether you want to implement Ray Tune in your ML project using Tensorflow, Pytorch, or any other framework, a lot of tutorials are available. Here are some to check out:
- Machine learning and reinforcement learning projects from Ray.
- “Hyperparameter Tuning” to implement the steps listed above in Tensorflow.
- Hyperparameter tuning with Keras and Ray Tune.
You can learn more about configuring Ray Tune and its capabilities from this article: “Ray Tune: a hyperparameter library for fast hyperparameter tuning at any scale”.
2. Optuna
Optuna is designed specially for machine learning. It’s a black-box optimizer, so it needs an objective function. Inside this objective function you decide which hyperparameters to sample in each trial, and it returns a numerical value (the performance of those hyperparameters). Optuna uses different algorithms, such as Grid Search, Random Search, Bayesian and Evolutionary algorithms, to find the optimal hyperparameter values.
Some of the features are:
- Efficient sampling and pruning algorithms.
- Easy to install, needs few requirements.
- Easier to use than Hyperopt.
- Uses distributed optimization.
- You can define search spaces using Python syntax, including conditionals and loops.
- You can analyze optimization results visually.
- Easy scalability with little or no changes to the code.
Optuna uses pruning algorithms. This is not the same thing as decision-tree pruning, which reduces the size of a tree by removing sections that are non-critical or redundant for classifying instances.
Pruning in Optuna automatically stops unpromising trials at the early stages of training, which you can also call automated early stopping. Optuna provides the following pruning algorithms:
- Asynchronous Successive Halving algorithm.
- Hyperband algorithm.
- Median pruning algorithm which uses the median stopping rule.
- Threshold pruning algorithm, used to detect outlying metrics of the trials.
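To give a feel for how this kind of pruning saves compute, here is a simplified, synchronous sketch of successive halving written from scratch. It is not Optuna's implementation (Optuna's version is asynchronous), and `simulated_loss` is a hypothetical stand-in for training a model for a given number of epochs.

```python
def simulated_loss(learning_rate, epochs):
    """Hypothetical stand-in: validation loss after training for `epochs` epochs."""
    return (learning_rate - 0.1) ** 2 + 1.0 / epochs


def successive_halving(configs, loss_fn, n_rungs=3, reduction_factor=2):
    """Keep the best 1/reduction_factor of the trials after each rung."""
    survivors = list(configs)
    epochs = 1        # budget given to each surviving trial at this rung
    epochs_used = 0   # total training budget spent across all trials
    for _ in range(n_rungs):
        ranked = sorted(survivors, key=lambda config: loss_fn(config, epochs))
        epochs_used += epochs * len(survivors)
        keep = max(1, len(ranked) // reduction_factor)
        survivors = ranked[:keep]  # everything else is pruned
        epochs *= reduction_factor
    return survivors[0], epochs_used


learning_rates = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0]
best_lr, epochs_used = successive_halving(learning_rates, simulated_loss)
print(best_lr, epochs_used)
```

Eight trials go in, half are dropped at every rung, and only the most promising ones receive the larger budgets. Training all eight for the final budget of 4 epochs would cost 32 epochs in this toy setup, while the halving schedule spends 24.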
I’ll highlight the simple steps you need to use Optuna:
- First, install Optuna with `pip install optuna`, if it’s not already installed.
- Define your model.
- Choose parameters to optimize.
- Define the objective function.
- Create a study and run the optimization.
- Check trial results.
Tutorial and example codes to check out:
You can also check out this read: “Optuna Guides how to monitor hyperparameter optimization runs”, to better understand how Optuna optimizes your hyperparameters.
3. Hyperopt
From the official documentation, Hyperopt is a Python library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions.
Hyperopt uses Bayesian optimization algorithms for hyperparameter tuning, to choose the best parameters for a given model. It can optimize a large-scale model with hundreds of hyperparameters.
Hyperopt currently implements three algorithms:
- Random Search,
- Tree of Parzen Estimators,
- Adaptive TPE.
Hyperopt has been designed to accommodate Bayesian optimization algorithms based on Gaussian processes and regression trees, but unfortunately they’re not currently implemented.
Features of Hyperopt:
HyperOpt requires 4 essential components for the optimization of hyperparameters:
- the search space,
- the loss function,
- the optimization algorithm,
- a database for storing the history (score, configuration)
Steps to use Hyperopt in your project:
- Initialize the space over which to search.
- Define the objective function.
- Select the search algorithm to use.
- Run Hyperopt’s fmin function.
- Analyze the evaluation outputs stored in the trials object.
Here are some hands-on tutorials you can check out:
There’s also a good Kaggle notebook you can try out.
4. Scikit-Optimize
Scikit-Optimize is an open-source library for hyperparameter optimization in Python. It’s built on top of Scikit-learn and follows its conventions. It’s relatively easy to use compared to other hyperparameter optimization libraries.
It implements sequential model-based optimization, also known as Bayesian Hyperparameter Optimization (BHO). The advantage of BHO is that it usually finds better model settings than random search in fewer iterations.
What really is Bayesian optimization?
Bayesian optimization is a sequential design strategy for global optimization of black box functions that does not assume any functional forms. It’s usually used to optimize computationally expensive functions. At least that’s what Wikipedia says.
But, in plain English, BO evaluates the hyperparameters that appear more promising based on past results, so it finds better settings in fewer iterations than random search. The performance of past hyperparameters affects future decisions.
Features of Scikit-Optimize:
- Sequential model-based optimization,
- Built on NumPy, SciPy, and Scikit-Learn,
- Open source, commercially usable, BSD license.
In Scikit-Optimize, Bayesian optimization using a Gaussian process is provided by a function called gp_minimize. You can learn more about it here. If you’re interested in how to build your own Bayesian Optimizer from scratch, you can also check out this tutorial: “How to Implement Bayesian Optimization From Scratch in Python”.
Here are the simple steps you need to follow to use Scikit-Optimize:
- Start by installing skopt using `pip install scikit-optimize`, if it’s not already installed.
- Define the model.
- Decide the parameter to optimize.
- Define search space.
- Define the objective function.
- Run the optimization.
Here’s a list of tutorials you can follow to implement Scikit Optimize in your project:
For an in-depth explanation of Scikit-Optimize features, check out this article.
5. Microsoft’s NNI (Neural Network Intelligence)
NNI is a free, open-source AutoML toolkit developed by Microsoft. It’s used to automate feature engineering, model compression, neural architecture search, and hyper-parameter tuning.
How does it work?
The tool dispatches and runs trial jobs generated by tuning algorithms to search the best neural architecture and/or hyper-parameters in different environments, like local machine, remote servers, and cloud.
Microsoft’s NNI supports frameworks like Pytorch, Tensorflow, Keras, Theano, Caffe2, etc., and libraries like Scikit-learn, XGBoost, CatBoost, and LightGBM for now.
Here are some of its features:
- Many popular automatic tuning algorithms (like TPE, Random Search, GP Tuner, Metis Tuner, and so on) and early stop algorithms (Medianstop, Curvefitting assessors).
- NAS (Neural Architecture Search) framework for users to easily specify neural architectures they want to use.
- Support for NAS algorithms like ENAS (Efficient Neural Architecture Search) and DARTS (Differentiable Architecture Search) through NNI trial SDK.
- Automatic feature engineering through NNI trial SDK; you don’t have to create an NNI experiment, simply import a built-in auto-feature-engineering algorithm in your trial code and run!
- Command line tools and a web UI to manage training experiments.
- Extensible API to customize your AutoML models.
- Trials can run on your local machine, remote servers, Azure Machine Learning, and Kubernetes-based services like Kubeflow, AdaptDL, OpenPAI, etc.
- Its hyperparameter tuning methods include exhaustive search, heuristic search, Bayesian optimization, and RL-based methods.
- Some of its Bayesian optimization algorithms for hyperparameter tuning are TPE, GP Tuner, Metis Tuner, BOHB, and more.
Here are the steps you need to follow to use NNI:
- Install NNI on either Windows or Linux and verify the installation.
- Define and update the model.
- Enable NNI API.
- Define search space.
- Define your experiment.
- Prepare trial.
- Prepare tuner.
- Prepare config file.
- Run the experiment.
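As an illustration of the "define search space" step, here is a sketch that builds a search space in the `_type` / `_value` JSON format that NNI reads, to the best of my knowledge. The hyperparameter names and ranges are hypothetical, and the snippet only produces the JSON text; it does not start an experiment.

```python
import json

# Each entry names a hyperparameter, how to sample it, and its range or choices.
search_space = {
    "learning_rate": {"_type": "loguniform", "_value": [0.0001, 0.1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64, 128]},
    "dropout": {"_type": "uniform", "_value": [0.1, 0.5]},
}

# Serialize it so it can be saved as the experiment's search space file.
search_space_json = json.dumps(search_space, indent=2)
print(search_space_json)
```

Inside the trial script, the sampled values are then fetched with `nni.get_next_parameter()` and the final metric is sent back with `nni.report_final_result()`.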
Looking for how to implement this in your project? Check out this tutorial: “How to add Microsoft’s NNI to your Project”.
6. Google’s Vizier
AI Platform Vizier is a black-box optimization service for tuning hyperparameters in complex machine learning models.
It not only optimizes your model’s output by tuning the hyperparameters, it can also be used effectively to tune parameters in a function.
How does it work?
- Determine the study configuration by setting the result and the hyperparameters that affect it.
- Create a study from the configuration values already set, and use it to perform experiments that produce results.
Why should you use Vizier?
- It’s easy to use.
- Requires minimal user configuration and setup.
- Hosts state-of-the-art black-box optimization algorithms.
- High availability.
- Scalable to millions of trials per study, thousands of parallel trial evaluations per study, and billions of studies.
Follow these steps to use Vizier:
7. AWS SageMaker
AWS SageMaker is a fully-managed machine learning service. With SageMaker, you can build and train machine learning models quickly and with ease. You can directly deploy them into a production-ready hosted environment right after building, just like a complete package.
It also provides machine learning algorithms that are optimized to run efficiently against extremely large data in a distributed environment. SageMaker natively supports bring-your-own-algorithms and frameworks, also offering flexible distributed training options that adjust to your specific workflows.
SageMaker uses Random Search or Bayesian Search for model hyperparameter tuning. For Bayesian Search, it either improves performance with a combination of hyperparameter values close to the combination from the best previous training job, or it chooses a set of hyperparameter values far removed from those it has tried.
AWS SageMaker abstracts away a lot of the software development work needed to accomplish the task, while still being highly effective, flexible and cost-effective. You can focus on what’s more important, the core ML experiments, and SageMaker covers the rest with easy, abstracted tools similar to your existing workflow. All your tools are in one place, so you can move easily from data preprocessing, to model building and model deployment, all in one platform.
In a nutshell, you can use SageMaker’s automatic model tuning with built-in algorithms, custom algorithms, and SageMaker pre-built containers for machine learning frameworks.
Learn more in these tutorials:
8. Azure Machine Learning
Azure was created by Microsoft, leveraging its constantly-expanding worldwide network of data centers. Azure is a cloud platform for building, deploying, and managing services and applications, anywhere.
Azure Machine Learning is a separate and modernized service that delivers a complete data science platform. Complete in the sense that it’s from data preprocessing, to model building, to model deployment and maintenance, the whole data science journey on a single platform. It supports both code-first and low-code experiences. If you’re someone who likes little or no code, you should consider using Azure Machine Learning Studio.
Azure Machine Learning Studio is a web portal on Azure Machine Learning that contains low-code and no-code options (drag-and-drop) for project authoring and asset management.
Azure Machine Learning supports the following hyperparameter sampling methods:
- Random sampling is used to randomly select a value for each hyperparameter, which can be a mix of discrete and continuous values. It also supports early termination of low-performance runs, just like early stopping in tree-based models.
- Grid sampling can only be employed when all hyperparameters are discrete, and is used to try every possible combination of parameters in the search space.
- Bayesian Sampling chooses hyperparameter values based on the Bayesian optimization algorithm, which tries to select parameter combinations that will result in improved performance from the previous selection.
With a really large hyperparameter search space (hundreds of hyperparameters and more), it would take a lot of iterations to try out every single combination. To save time, you can stop experiments (iterations) early when their results are clearly poorer than those of earlier runs. Azure has early termination policies to help you with that:
- Bandit policy. You can use a bandit policy to stop a run (experiment or iteration) if the target performance metric underperforms the best run so far by a specified margin.
- Median stopping policy. Like the bandit policy, it abandons a run, in this case when the target performance metric is worse than the median of the running averages for all runs.
- Truncation selection policy. A truncation selection policy cancels the lowest-performing runs at each evaluation interval, based on the truncation percentage value you specified.
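The logic behind the three policies is simple enough to sketch in plain Python. These are simplified illustrations of the ideas, not Azure's implementation: the function names are my own, and I assume a metric where higher is better (like accuracy).

```python
from statistics import mean, median


def bandit_should_stop(metric, best_metric, slack_amount):
    """Stop a run that trails the best run so far by more than slack_amount."""
    return metric < best_metric - slack_amount


def median_should_stop(run_history, other_histories):
    """Stop a run whose best metric is below the median of the other runs' averages."""
    running_averages = [mean(history) for history in other_histories if history]
    if not running_averages:
        return False
    return max(run_history) < median(running_averages)


def truncation_select(latest_metrics, truncation_percentage):
    """Return the names of the lowest-performing runs to cancel."""
    n_cancel = int(len(latest_metrics) * truncation_percentage / 100)
    ranked = sorted(latest_metrics, key=latest_metrics.get)  # worst first
    return ranked[:n_cancel]


# Bandit: best accuracy so far is 0.85 and we allow a margin of 0.10.
print(bandit_should_stop(0.70, best_metric=0.85, slack_amount=0.10))

# Median stopping: compare one run against three others.
print(median_should_stop([0.50, 0.55], [[0.6, 0.7], [0.8, 0.9], [0.4, 0.5]]))

# Truncation selection: cancel the bottom 40% of five runs.
print(truncation_select({"a": 0.9, "b": 0.5, "c": 0.7, "d": 0.6, "e": 0.8}, 40))
```

In each case the policy is evaluated at regular intervals while runs are still training, so poor runs release their compute early.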
How can you start using Azure for hyperparameter tuning in your project?
- Define a search space.
- Configure sampling. You can choose from Grid sampling, Bayesian sampling, or Random sampling.
- Configure early termination. You can use either Bandit policy, Median stopping policy, or Truncation selection policy.
- Run a hypertuning training experiment.
Check out this tutorial module by Microsoft: “Tune Hyperparameters with Azure Machine Learning”.
I hope I was able to teach you one or two things about hyperparameter tools. Don’t just let it sit there in your head, try them out! And feel free to reach out to me, I’d love to learn your opinions and preferences. Thanks for reading!
Other resources to also check out:
- Top Hyperparameter Optimisation Tools
- Tuning ML Models: Scaling, Workflows, and Architecture
- What is the difference between parameter and hyperparameter?
- Hyperparameter Tuning in Python: a Complete Guide 2021
- Hyperparameter tuning with Keras and Ray Tune
- The Best Tools to Visualize Metrics and Hyperparameters of Machine Learning Experiments
- Hyperparameter tuning for Machine learning models