Hyperparameter Tuning in Python: a Complete Guide
Choosing the right hyperparameters for machine learning or deep learning models is one of the best ways to squeeze the last bit of performance out of them. In this article, I will show you some of the best ways to do hyperparameter tuning that are available today.
What is the difference between parameter and hyperparameter?
First, let’s understand the differences between a hyperparameter and a parameter in machine learning.
- Model parameters: These are the parameters that are estimated by the model from the given data. For example, the weights of a deep neural network.
- Model hyperparameters: These are the parameters that cannot be estimated by the model from the given data. These parameters are used to estimate the model parameters. For example, the learning rate in deep neural networks.
| Parameters | Hyperparameters |
| --- | --- |
| They are required for making predictions | They are required for estimating the model parameters |
| They are estimated by optimization algorithms (Gradient Descent, Adam, Adagrad) | They are estimated by hyperparameter tuning |
| They are not set manually | They are set manually |
| The final parameters found after training decide how the model will perform on unseen data | The choice of hyperparameters decides how efficient the training is. In gradient descent, the learning rate decides how efficient and accurate the optimization process is in estimating the parameters |

Model parameters vs model hyperparameters | Source: GeeksforGeeks
What is hyperparameter tuning and why is it important?
Hyperparameter tuning (or hyperparameter optimization) is the process of determining the combination of hyperparameters that maximizes model performance. It works by running multiple trials in a single training process. Each trial is a complete execution of your training application with values for your chosen hyperparameters, set within the limits you specify. Once finished, this process gives you the set of hyperparameter values that performed best across the trials.
Needless to say, it is an important step in any Machine Learning project, since a well-chosen set of hyperparameters can noticeably improve a model's results. If you wish to see it in action, here’s a research paper that talks about the importance of hyperparameter optimization by experimenting on datasets.
How to do hyperparameter tuning? How to find the best hyperparameters?
Choosing the right combination of hyperparameters requires an understanding of the hyperparameters and the business use-case. However, technically, there are two ways to set them.
Manual hyperparameter tuning
Manual hyperparameter tuning involves experimenting with different sets of hyperparameters by hand, i.e. you run each trial with a given set of hyperparameters yourself. This technique requires a robust experiment tracker that can track a variety of variables, from images and logs to system metrics.
There are a few experiment trackers that tick all the boxes. neptune.ai is one of them. It offers an intuitive interface and an open-source package neptune-client to facilitate logging into your code. You can easily log hyperparameters and see all types of data results like images, metrics, etc. Head over to the docs to see how you can log different metadata to Neptune.
Alternative solutions include W&B, Comet, or MLflow. Check more tools for experiment tracking & management here.
Advantages of manual hyperparameter optimization:
- Tuning hyperparameters manually means more control over the process.
- If you’re researching or studying tuning and how it affects the network weights then doing it manually would make sense.
Disadvantages of manual hyperparameter optimization:
- Manual tuning is a tedious process since there can be many trials and keeping track can prove costly and time-consuming.
- This isn’t a very practical approach when there are a lot of hyperparameters to consider.
Read about how to manually optimize Machine Learning model hyperparameters here.
Automated hyperparameter tuning
Automated hyperparameter tuning utilizes already existing algorithms to automate the process. The steps you follow are:
- First, specify a set of hyperparameters and limits to those hyperparameters’ values (note: every algorithm requires this set to be a specific data structure; dictionaries are a common choice).
- Then the algorithm does the heavy lifting for you. It runs those trials and fetches you the best set of hyperparameters that will give optimal results.
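The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the hyperparameter names, their limits, and the `run_trial` function (a stand-in for a full training run) are hypothetical, and the "algorithm" here is plain random sampling rather than any particular library's method.

```python
import random

# Step 1: the search space, here a dictionary mapping each hyperparameter
# name to its (low, high) limits. Names and ranges are illustrative only.
search_space = {
    "learning_rate": (0.001, 0.1),
    "dropout": (0.0, 0.5),
}


def run_trial(params):
    """Stand-in for one complete training run; returns a validation loss.

    A real trial would train and evaluate a model with `params`. This toy
    loss is smallest at learning_rate=0.01 and dropout=0.2.
    """
    return (params["learning_rate"] - 0.01) ** 2 + (params["dropout"] - 0.2) ** 2


def tune(space, n_trials, seed=0):
    """Step 2: run the trials and return the best (loss, params) pair."""
    rng = random.Random(seed)
    best_loss, best_params = float("inf"), None
    for _ in range(n_trials):
        # Draw one value per hyperparameter, inside its limits.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        loss = run_trial(params)
        if loss < best_loss:
            best_loss, best_params = loss, params
    return best_loss, best_params


best_loss, best_params = tune(search_space, n_trials=50)
print(best_loss, best_params)
```

The tools covered later in this article follow the same shape: you hand over a search space and an objective, and they decide which trials to run.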
In this blog, we will talk about some of the algorithms and tools you could use to achieve automated tuning. Let’s get to it.
Hyperparameter tuning methods
In this section, I will introduce all of the hyperparameter optimization methods that are popular today.
Random search
In the random search method, we create a grid of possible values for hyperparameters. Each iteration tries a random combination of hyperparameters from this grid, records the performance, and lastly returns the combination of hyperparameters that provided the best performance.
Grid search
In the grid search method, we create a grid of possible values for hyperparameters. Each iteration tries a combination of hyperparameters in a specific order. It fits the model on each and every combination of hyperparameters possible and records the model performance. Finally, it returns the best model with the best hyperparameters.
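To make the difference between grid search and random search concrete, here is a small sketch that runs both over the same grid. The grid values and the `score` function are made up for illustration; in practice `score` would fit a model and return a validation metric.

```python
import itertools
import random

# An illustrative grid of candidate values (assumed, not from any real model).
grid = {
    "max_depth": [2, 4, 8],
    "n_estimators": [50, 100, 200],
}


def score(params):
    """Hypothetical validation score; higher is better, best at (4, 100)."""
    return -abs(params["max_depth"] - 4) - abs(params["n_estimators"] - 100) / 50


names = list(grid)
all_combinations = [
    dict(zip(names, values)) for values in itertools.product(*grid.values())
]

# Grid search: evaluate every combination, in order.
best_grid = max(all_combinations, key=score)

# Random search: evaluate only a random subset of the combinations.
rng = random.Random(0)
sampled = rng.sample(all_combinations, k=4)
best_random = max(sampled, key=score)

print(len(all_combinations), best_grid)
print(len(sampled), best_random)
```

Grid search pays for all 9 evaluations and is guaranteed to find the best point of the grid. Random search pays for only 4 here, so it may or may not land on that point.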
Bayesian optimization
Tuning and finding the right hyperparameters for your model is an optimization problem. We want to minimize the loss function of our model by changing its hyperparameters. Bayesian optimization builds a probabilistic (surrogate) model of the loss from the trials evaluated so far and aims to find the minimum in as few steps as possible. It also uses an acquisition function that directs sampling to areas where an improvement over the current best observation is likely.
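One widely used acquisition function is expected improvement. The sketch below shows how it could be computed for a minimization problem, assuming the surrogate model gives a Gaussian prediction (mean `mu`, standard deviation `sigma`) for the loss at a candidate point. The candidate names and numbers are invented for illustration, and the surrogate itself is not shown.

```python
import math


def expected_improvement(mu, sigma, best_so_far):
    """Expected improvement over `best_so_far` when minimizing a loss.

    `mu` and `sigma` are the surrogate's predicted mean and standard
    deviation of the loss at one candidate point (assumed Gaussian).
    """
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(best_so_far - mu, 0.0)
    z = (best_so_far - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best_so_far - mu) * cdf + sigma * pdf


# Two hypothetical candidates: "a" is well explored, "b" is uncertain.
candidates = {"a": (0.50, 0.01), "b": (0.55, 0.20)}
best_so_far = 0.50
next_point = max(
    candidates, key=lambda k: expected_improvement(*candidates[k], best_so_far)
)
print(next_point)
```

Candidate "b" has a slightly worse predicted mean, but its larger uncertainty gives it the higher expected improvement, so it is sampled next. This is how the acquisition function balances exploring unknown regions against exploiting known good ones.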
Tree-structured Parzen estimators (TPE)
The idea of tree-structured Parzen estimators is similar to Bayesian optimization. Instead of modeling p(y|x), where y is the function to be minimized (e.g., validation loss) and x is the value of the hyperparameter, TPE models p(x|y) and p(y). One of the main drawbacks of tree-structured Parzen estimators is that they do not model interactions between the hyperparameters. That said, TPE works extremely well in practice and has been battle-tested across most domains.
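Here is a rough one-dimensional sketch of the idea behind modeling p(x|y). It splits past trials into a "good" and a "bad" group by loss, fits a density to each, and suggests the candidate where the good density most outweighs the bad one. The fixed-bandwidth Gaussian kernel, the `gamma` split fraction, and all function names are my simplifying assumptions. This is not Hyperopt's or Optuna's actual implementation, and it leaves out the tree-structured part entirely.

```python
import math
import random


def kde(points, bandwidth):
    """Return a Gaussian kernel density estimate built from `points`."""
    def density(x):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return total / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return density


def tpe_suggest(observations, gamma=0.25, bandwidth=0.5, n_candidates=24, seed=0):
    """Suggest the next x given past (x, loss) pairs (needs at least 2)."""
    rng = random.Random(seed)
    ranked = sorted(observations, key=lambda pair: pair[1])
    # The best `gamma` fraction of trials forms the "good" group.
    n_good = max(1, math.ceil(gamma * len(ranked)))
    n_good = min(n_good, len(ranked) - 1)  # keep the "bad" group non-empty
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    l, g = kde(good, bandwidth), kde(bad, bandwidth)
    # Sample candidates from the "good" density, keep the best l(x)/g(x).
    candidates = [rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)]
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))


# Toy history: the loss (x - 2)^2 evaluated at a few points.
history = [(x, (x - 2.0) ** 2) for x in [-4, -2, 0, 1, 2, 3, 4, 6]]
suggestion = tpe_suggest(history)
print(suggestion)
```

Because each hyperparameter gets its own pair of densities in this kind of scheme, you can also see where the drawback mentioned above comes from: nothing in the densities captures how two hyperparameters interact.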
Hyperparameter tuning algorithms
These are the algorithms developed specifically for doing hyperparameter tuning.
Hyperband
Hyperband is a variation of random search, but with some explore-exploit theory to find the best time allocation for each of the configurations. You can check this research paper for further references.
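The time-allocation idea can be illustrated with successive halving, the subroutine that Hyperband builds on. Note that this sketch is successive halving only: Hyperband itself runs several such rounds ("brackets") with different trade-offs between the number of configurations and the starting budget. The `evaluate` function and the learning-rate range are toy assumptions.

```python
import random


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configs each round, with eta times more budget."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Evaluate every surviving config with the current budget (lower is better).
        scored = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        survivors = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return survivors[0]


def evaluate(config, budget):
    """Toy loss (assumed): approaches |lr - 0.01| as the budget grows."""
    return abs(config["lr"] - 0.01) + 1.0 / budget


rng = random.Random(0)
# 27 random configurations, learning rates sampled on a log scale.
configs = [{"lr": 10 ** rng.uniform(-4, -1)} for _ in range(27)]
best = successive_halving(configs, evaluate)
print(best)
```

With 27 configurations and `eta=3`, the rounds keep 9, then 3, then 1 configuration, so most of the budget is spent on the few configurations that looked promising early on.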
Population-based training (PBT)
This technique is a hybrid of the two most commonly used search techniques: Random Search and manual tuning applied to Neural Network models.
PBT starts by training many neural networks in parallel with random hyperparameters. But these networks aren’t fully independent of each other.
It uses information from the rest of the population to refine the hyperparameters and determine the values of the hyperparameters to try next. You can check this article for more information on PBT.
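A minimal sketch of how the population shares information could look like the following. Weak members copy the weights and hyperparameters of strong members (exploit) and then perturb the copied hyperparameters (explore). The toy `train` step, the quarter-sized top and bottom groups, and the perturbation factors are assumptions chosen for illustration, not the settings of any specific PBT implementation.

```python
import random


def train(member):
    """Toy training step (assumed): move the weight toward 1.0 at rate `lr`."""
    member["weights"] += member["lr"] * (1.0 - member["weights"])
    member["score"] = -abs(1.0 - member["weights"])  # higher is better


def exploit_and_explore(population, rng, factors=(0.8, 1.2)):
    """Replace the worst quarter with perturbed copies of the best quarter."""
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    cutoff = max(1, len(ranked) // 4)
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for member in bottom:
        parent = rng.choice(top)
        # Exploit: copy the parent's weights (and its current score).
        member["weights"] = parent["weights"]
        member["score"] = parent["score"]
        # Explore: perturb the parent's hyperparameter.
        member["lr"] = parent["lr"] * rng.choice(factors)


rng = random.Random(0)
population = [
    {"lr": rng.uniform(0.01, 0.5), "weights": 0.0, "score": 0.0} for _ in range(8)
]
for _ in range(5):
    for member in population:
        train(member)
    exploit_and_explore(population, rng)

best = max(population, key=lambda m: m["score"])
print(best)
```

In a real setup each member would be a neural network trained in parallel, and the exploit step would copy a model checkpoint instead of a single number.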
BOHB
BOHB (Bayesian Optimization and HyperBand) mixes the Hyperband algorithm and Bayesian optimization. You can check this article for further reference.
See also: HyperBand and BOHB: Understanding State of the Art Hyperparameter Optimization Algorithms
Tools for hyperparameter optimization
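Now that you know the methods and algorithms, let’s talk about tools, and there are a lot of those out there.
Some of the best hyperparameter optimization libraries are:
- Scikit-learn
- Scikit-optimize
- Optuna
- Hyperopt
- Ray Tune
- Keras Tuner
- BayesianOptimization
- Metric Optimization Engine (MOE)
- Spearmint
- GPyOpt
- SigOpt
- Fabolas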
1. Scikit-learn
Scikit-learn has implementations for grid search and random search and is a good place to start if you are building models with sklearn.
For both of those methods, scikit-learn trains and evaluates a model in a k-fold cross-validation setting over various parameter choices and returns the best model.
- Random search: RandomizedSearchCV runs the search over some number of random parameter combinations
- Grid search: GridSearchCV runs the search over all parameter sets in the grid
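As a quick illustration of the two classes, here is a sketch on the Iris dataset. The decision tree and the parameter values are arbitrary choices for the example, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Illustrative search space: 4 x 3 = 12 combinations.
param_grid = {"max_depth": [2, 3, 4, 5], "min_samples_leaf": [1, 2, 4]}

# Grid search: every combination, each scored with 5-fold cross-validation.
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
grid.fit(X, y)

# Random search: only n_iter randomly chosen combinations from the same space.
rand = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=5,
    cv=5,
    random_state=0,
)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```

After fitting, `best_estimator_` on either object holds the model refit on the whole dataset with the best parameters found.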
Tuning models with scikit-learn is a good start but there are better options out there and they often have random search strategies anyway.
May be useful: check how you can keep track of your hyperparameter search when working with Scikit-learn.
2. Scikit-optimize
Scikit-optimize uses a sequential model-based optimization algorithm to find optimal solutions for hyperparameter search problems in less time.
Scikit-optimize provides many features other than hyperparameter optimization such as:
- store and load optimization results,
- convergence plots,
- comparing surrogate models
3. Optuna
Optuna uses a historical record of trial details to determine the promising areas of the search space, and hence finds good hyperparameters in less time.
It has a pruning feature that automatically stops unpromising trials in the early stages of training. Some of the key features provided by Optuna are:
- Lightweight, versatile, and platform-agnostic architecture
- Pythonic search spaces
- Efficient optimization algorithms
- Easy parallelization
- Quick visualization
You can refer to the official documentation for tutorials on how to start using optuna.
May be useful: check how you can keep track of your hyperparameter search when working with Optuna.
4. Hyperopt
Hyperopt is one of the most popular hyperparameter tuning packages available. Hyperopt allows the user to describe a search space in which the user expects the best results, allowing the algorithms in hyperopt to search more efficiently.
Currently, three algorithms are implemented in hyperopt: random search, Tree of Parzen Estimators (TPE), and adaptive TPE.
To use hyperopt, you should first describe:
- the objective function to minimize
- the space over which to search
- the database in which to store all the point evaluations of the search
- the search algorithm to use
This tutorial will walk you through how to structure the code and use the hyperopt package to get the best hyperparameters.
You can also read this article to learn more about how to use Hyperopt.
See also: Optuna vs Hyperopt: Which Hyperparameter Optimization Library Should You Choose?
5. Ray Tune
Ray Tune is a popular choice for experimentation and hyperparameter tuning at any scale. Ray uses the power of distributed computing to speed up hyperparameter optimization and has implementations of several state-of-the-art optimization algorithms at scale.
Some of the core features provided by ray tune are:
- Distributed asynchronous optimization out of the box by leveraging Ray.
- Easily scalable.
- Provides SOTA algorithms such as ASHA, BOHB, and Population-Based Training.
- Supports Tensorboard and MLflow.
- Supports a variety of frameworks such as Sklearn, XGBoost, TensorFlow, PyTorch, etc.
You can refer to this tutorial to learn how to implement ray tune for your problem.
6. Keras Tuner
Keras Tuner is a library that helps you pick the optimal set of hyperparameters for your TensorFlow program. When you build a model for hyperparameter tuning, you also define the hyperparameter search space in addition to the model architecture. The model you set up for hyperparameter tuning is called a hypermodel.
You can define a hypermodel through two approaches:
- By using a model builder function
- By subclassing the HyperModel class of the Keras Tuner API
You can also use two pre-defined HyperModel classes – HyperXception and HyperResNet for computer vision applications.
You can refer to this official tutorial for further implementation details.
7. BayesianOptimization
BayesianOptimization is a package designed to minimize the number of steps required to find a combination of parameters that is close to the optimal combination.
This method uses a proxy optimization problem (finding the maximum of the acquisition function) which, although still a hard problem, is computationally cheaper and can be tackled with common tools. Therefore, Bayesian optimization is best suited to situations where sampling the function to be optimized is very expensive.
Visit the GitHub repo here to see it in action.
8. Metric Optimization Engine
MOE (Metric Optimization Engine) is an efficient way to optimize a system’s parameters when evaluating parameters is time-consuming or expensive.
It is ideal for problems in which
- the optimization problem’s objective function is a black box, not necessarily convex or concave,
- derivatives are unavailable,
- and we seek a global optimum, rather than just a local one.
This ability to handle black-box objective functions allows us to use MOE to optimize nearly any system, without requiring any internal knowledge or access.
Visit the GitHub repo to read more about it.
9. Spearmint
Spearmint is a software package that also performs Bayesian optimization. The software is designed to automatically run experiments (thus the code name spearmint) in a manner that iteratively adjusts a number of parameters so as to minimize some objective in as few runs as possible.
Read about and experiment with Spearmint in this GitHub repo.
10. GPyOpt
GPyOpt is Gaussian process optimization using GPy. It performs global optimization with different acquisition functions.
Among other functionalities, it is possible to use GPyOpt to optimize physical experiments (sequentially or in batches) and tune the parameters of Machine Learning algorithms. It is able to handle large data sets via sparse Gaussian process models.
Unfortunately, the authors of the repo have stopped maintaining GPyOpt, but you can still use the package for your experiments.
Head over to its GitHub repo here.
11. SigOpt
SigOpt fully integrates automated hyperparameter tuning with training runs tracking to give you a sense of the bigger picture and the path to reach your best model.
With features like highly customizable search spaces and multimetric optimization, SigOpt can advance your model with a simple API for sophisticated hyperparameter tuning before taking it into production.
Visit the documentation here to learn more about SigOpt’s hyperparameter tuning.
12. Fabolas
While traditional Bayesian hyperparameter optimizers model the loss of machine learning algorithms on a given dataset as a black box function to be minimized, FAst Bayesian Optimization on LArge data Sets (FABOLAS) models loss and computational cost across dataset size and uses these models to carry out Bayesian optimization with an extra degree of freedom.
You can check out the function implementing fabolas here and the research paper here.
See also: Best Tools for Model Tuning and Hyperparameter Optimization
Hyperparameter tuning resources and examples
In this section, I will share some hyperparameter tuning examples implemented for different ML and DL frameworks.
Random forest hyperparameter tuning
- Understanding Random forest hyperparameters
- Bayesian hyperparameter tuning for random forest
- Random forest tuning using grid search
XGBoost hyperparameter tuning
- XGBoost hyperparameters tuning python
- XGBoost hyperparameters tuning in R
- XGBoost hyperparameter using hyperopt
- Optuna hyperparameter tuning example
LightGBM hyperparameter tuning
- Understanding LightGBM parameters
- LightGBM hyperparameter tuning example
- Optuna for LightGBM hyperparameter tuning
CatBoost hyperparameter tuning
Keras hyperparameter tuning
- Hyperparameter tuning using Keras Tuner example
- Keras CNN hyperparameter tuning
- How to use Keras models in scikit-learn grid search
- Keras Tuner: Lessons Learned From Tuning Hyperparameters of a Real-Life Deep Learning Model
PyTorch hyperparameter tuning
Congratulations, you’ve made it to the end! Hyperparameter tuning represents an integral part of any Machine Learning project, so it’s always worth digging into this topic. In this blog, we talked about different hyperparameter tuning algorithms and tools which are widely used and studied. But even though we covered a good chunk of techniques and tools, as a wise man once said, there’s no end to knowledge.
Here is some of the latest research in the area that might interest you:
- Improving Hyperparameter Optimization By Planning Ahead
- Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges
- Experimental Investigation And Evaluation Of Model-Based Hyperparameter Optimization
That’s it for now, stay tuned for more, adios!