Choosing the correct hyperparameters for machine learning or deep learning models is one of the best ways to squeeze the last bit of performance out of your models. In this article, I will show you some of the best ways to do hyperparameter tuning that are available today.
What is the difference between parameter and hyperparameter?
First, let’s understand the differences between a hyperparameter and a parameter in machine learning.
 Model parameters: These are the parameters that are estimated by the model from the given data. For example, the weights of a deep neural network.
 Model hyperparameters: These are the parameters that cannot be estimated by the model from the given data. These parameters are used to estimate the model parameters. For example, the learning rate in deep neural networks.
PARAMETERS | HYPERPARAMETERS
--- | ---
They are required for making predictions | They are required for estimating the model parameters
They are estimated by optimization algorithms (Gradient Descent, Adam, Adagrad) | They are estimated by hyperparameter tuning
They are not set manually | They are set manually
The final parameters found after training decide how the model will perform on unseen data | The choice of hyperparameters decides how efficient the training is. In gradient descent, the learning rate decides how efficient and accurate the optimization process is in estimating the parameters

Model parameters vs. model hyperparameters. Source: GeeksforGeeks
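To make the distinction concrete, here is a minimal sketch in plain Python. The function `fit_weight` and the toy data are made up for illustration: the weight `w` is a parameter learned from the data, while `learning_rate` and `n_steps` are hyperparameters that we choose before training.

```python
# Fit y = w * x with plain gradient descent.
# `w` is a model parameter (estimated from the data);
# `learning_rate` and `n_steps` are hyperparameters (chosen by us).

def fit_weight(xs, ys, learning_rate, n_steps):
    w = 0.0  # parameter: arbitrary starting value, updated from the data
    n = len(xs)
    for _ in range(n_steps):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated with the true weight w = 2

w_good = fit_weight(xs, ys, learning_rate=0.05, n_steps=200)
w_slow = fit_weight(xs, ys, learning_rate=0.0001, n_steps=200)
print(w_good, w_slow)
```

With the same data and the same number of steps, the larger learning rate recovers a weight very close to 2, while the tiny one is still far from it. The hyperparameter was never learned, yet it decided how good the learned parameter is.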
What is hyperparameter tuning and why is it important?
Hyperparameter tuning (or hyperparameter optimization) is the process of determining the right combination of hyperparameters that maximizes the model performance. It works by running multiple trials in a single training process. Each trial is a complete execution of your training application with values for your chosen hyperparameters, set within the limits you specify. Once finished, this process gives you the set of hyperparameter values that are best suited for the model to give optimal results.
Needless to say, it is an important step in any machine learning project since it leads to optimal results for a model. If you wish to see it in action, here’s a research paper that talks about the importance of hyperparameter optimization by experimenting on datasets.
How to do hyperparameter tuning? How to find the best hyperparameters?
Choosing the right combination of hyperparameters requires an understanding of the hyperparameters and the business use case. However, technically, there are two ways to set them.
Manual hyperparameter tuning
Manual hyperparameter tuning involves experimenting with different sets of hyperparameters manually, i.e., you perform each trial with a set of hyperparameters yourself. This technique requires a robust experiment tracker that can track a variety of variables, from images and logs to system metrics.
There are a few experiment trackers that tick all the boxes. neptune.ai is one of them. It offers an intuitive interface and an open-source package, neptune-client, to facilitate logging from your code. You can easily log hyperparameters and see all types of data results like images, metrics, etc. Head over to the docs to see how you can log different metadata to Neptune.
Alternative solutions include W&B, Comet, or MLflow. Check more tools for experiment tracking & management here.
Advantages of manual hyperparameter optimization:
 Tuning hyperparameters manually means more control over the process.
 If you’re researching or studying tuning and how it affects the network weights then doing it manually would make sense.
Disadvantages of manual hyperparameter optimization:
 Manual tuning is a tedious process since there can be many trials, and keeping track of them can prove costly and time-consuming.
 This isn’t a very practical approach when there are a lot of hyperparameters to consider.
Read about how to manually optimize Machine Learning model hyperparameters here.
Automated hyperparameter tuning
Automated hyperparameter tuning utilizes already existing algorithms to automate the process. The steps you follow are:
 First, specify a set of hyperparameters and limits to those hyperparameters’ values (note: every algorithm requires this set to be a specific data structure, e.g. dictionaries are common while working with algorithms).
 Then the algorithm does the heavy lifting for you. It runs those trials and fetches you the best set of hyperparameters that will give optimal results.
In this blog, we will talk about some of the algorithms and tools you could use to achieve automated tuning. Let’s get to it.
Hyperparameter tuning methods
In this section, I will introduce all of the hyperparameter optimization methods that are popular today.
Random Search
In the random search method, we create a grid of possible values for hyperparameters. Each iteration tries a random combination of hyperparameters from this grid and records the performance. At the end, the search returns the combination of hyperparameters that provided the best performance.
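The loop described above can be sketched in a few lines of standard-library Python. The `validation_loss` function is a hypothetical stand-in for "train the model and score it", and the hyperparameter names and values in `grid` are made up for illustration.

```python
import random

def validation_loss(config):
    # Hypothetical stand-in for training and scoring a model; lower is better.
    return (config["learning_rate"] - 0.01) ** 2 + (config["max_depth"] - 6) ** 2 * 1e-4

grid = {
    "learning_rate": [0.001, 0.003, 0.01, 0.03, 0.1],
    "max_depth": [2, 4, 6, 8, 10],
}

def random_search(grid, objective, n_iter, seed=0):
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently and uniformly
        config = {name: rng.choice(values) for name, values in grid.items()}
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

best_config, best_loss = random_search(grid, validation_loss, n_iter=10)
print(best_config, best_loss)
```

Note that only `n_iter` of the 25 possible combinations are evaluated, so the result is the best configuration seen, not necessarily the best one in the grid.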
Grid Search
In the grid search method, we create a grid of possible values for hyperparameters. Each iteration tries a combination of hyperparameters in a specific order. It fits the model on each and every combination of hyperparameters possible and records the model performance. Finally, it returns the best model with the best hyperparameters.
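As a rough illustration, an exhaustive grid search can be written with `itertools.product`. As before, `validation_loss` and the values in `grid` are hypothetical placeholders for a real training run and a real search space.

```python
from itertools import product

def validation_loss(config):
    # Hypothetical stand-in for training and scoring a model; lower is better.
    return (config["learning_rate"] - 0.01) ** 2 + (config["max_depth"] - 6) ** 2 * 1e-4

grid = {
    "learning_rate": [0.001, 0.003, 0.01, 0.03, 0.1],
    "max_depth": [2, 4, 6, 8, 10],
}

def grid_search(grid, objective):
    names = list(grid)
    results = []
    # Visit every combination of values, in a fixed order
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        results.append((objective(config), config))
    best_loss, best_config = min(results, key=lambda item: item[0])
    return best_config, best_loss, len(results)

best_config, best_loss, n_trials = grid_search(grid, validation_loss)
print(best_config, best_loss, n_trials)
```

The number of trials is the product of the list lengths (5 x 5 = 25 here), which is why grid search becomes expensive quickly as hyperparameters are added.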
Bayesian Optimization
Tuning and finding the right hyperparameters for your model is an optimization problem: we want to minimize the validation loss of our model by changing its hyperparameters. Bayesian optimization helps us find the minimum in as few steps as possible. It builds a probabilistic surrogate model of the objective from the trials evaluated so far and uses an acquisition function that directs sampling to areas where an improvement over the current best observation is likely.
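The following is a minimal sketch of that loop for a single hyperparameter, assuming a Gaussian process surrogate with an RBF kernel and expected improvement as the acquisition function. These are common choices, not the only ones. The `objective` function, the kernel length scale, and the candidate grid are all assumptions made for illustration, and NumPy is required.

```python
import math
import numpy as np

def objective(x):
    # Hypothetical expensive black box, e.g. validation loss vs. one hyperparameter
    return (x - 2.0) ** 2 + math.sin(3.0 * x)

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between two 1-D arrays
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    # Posterior mean and standard deviation of a zero-mean, unit-variance GP
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_oo, k_on)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_on * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best_y):
    # Expected improvement for a minimization problem
    z = (best_y - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best_y - mean) * cdf + std * pdf

candidates = np.linspace(0.0, 5.0, 201)
x_obs = np.array([0.5, 2.5, 4.5])  # a few initial trials
y_obs = np.array([objective(x) for x in x_obs])

for _ in range(10):
    # Standardize the losses so the zero-mean, unit-variance prior is reasonable
    y_norm = (y_obs - y_obs.mean()) / y_obs.std()
    mean, std = gp_posterior(x_obs, y_norm, candidates)
    ei = expected_improvement(mean, std, y_norm.min())
    x_next = candidates[np.argmax(ei)]  # most promising point to try next
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x, best_y = x_obs[np.argmin(y_obs)], y_obs.min()
print(best_x, best_y)
```

Each iteration spends a little computation on the surrogate in order to spend the expensive objective evaluations on points that are either predicted to be good or still very uncertain.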
Tree-structured Parzen estimators (TPE)
The idea of tree-structured Parzen estimators is similar to Bayesian optimization. Instead of modeling p(y|x), where y is the function to be minimized (e.g., validation loss) and x is the value of the hyperparameter, TPE models p(x|y) and p(y). One of the great drawbacks of tree-structured Parzen estimators is that they do not model interactions between the hyperparameters. That said, TPE works extremely well in practice and has been battle-tested across most domains.
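For readers who want the formulas, the usual way of writing this down is sketched below. The symbols are not defined elsewhere in this article: y* is a threshold chosen as a quantile of the losses observed so far, γ is that quantile level, ℓ(x) is a density fitted to the hyperparameter values of the "good" trials, and g(x) is a density fitted to the rest.

```latex
p(x \mid y) =
\begin{cases}
\ell(x) & \text{if } y < y^{*} \\
g(x)    & \text{if } y \ge y^{*}
\end{cases},
\qquad
\gamma = p(y < y^{*})

\mathrm{EI}_{y^{*}}(x) \;\propto\; \left( \gamma + \frac{g(x)}{\ell(x)} \, (1 - \gamma) \right)^{-1}
```

In words: the expected improvement is largest where ℓ(x)/g(x) is largest, so TPE proposes hyperparameter values that are likely under the "good" density and unlikely under the "bad" one.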
Hyperparameter tuning algorithms
These are the algorithms developed specifically for doing hyperparameter tuning.
Hyperband
Hyperband is a variation of random search, but with some explore-exploit theory to find the best time allocation for each of the configurations. You can check this research paper for further references.
Population-based training (PBT)
This technique is a hybrid of the two most commonly used search techniques, random search and manual tuning, applied to neural network models.
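The "time allocation" idea is easiest to see in successive halving, the subroutine that Hyperband runs repeatedly with different starting budgets. Below is a minimal sketch of one such run, not of full Hyperband. The `partial_loss` function is a hypothetical stand-in for "train this configuration for `budget` epochs and report the loss", and `eta` is the assumed reduction factor.

```python
import random

def partial_loss(config, budget):
    # Hypothetical stand-in: loss after training `config` for `budget` epochs.
    # Learning rates closer to 0.01 converge to a lower loss.
    floor = abs(config["learning_rate"] - 0.01)
    return floor + 1.0 / budget

def successive_halving(configs, min_budget=1, eta=3):
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Evaluate every surviving configuration with the current budget
        ranked = sorted(survivors, key=lambda c: partial_loss(c, budget))
        # Keep the best 1/eta of them and give those eta times more budget
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]

rng = random.Random(0)
# Random search part: sample 27 learning rates log-uniformly in [1e-4, 1e-1]
configs = [{"learning_rate": 10 ** rng.uniform(-4, -1)} for _ in range(27)]
best = successive_halving(configs)
print(best)
```

In this toy loss the ranking of configurations never changes with the budget, which real training does not guarantee. Hyperband hedges against that by running several such brackets that trade off the number of configurations against the starting budget.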
PBT starts by training many neural networks in parallel with random hyperparameters. But these networks aren’t fully independent of each other.
It uses information from the rest of the population to refine the hyperparameters and decide which hyperparameter values to try next. You can check this article for more information on PBT.
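A toy version of this exploit-and-explore loop is sketched below. It is an illustration under simplifying assumptions, not a faithful PBT implementation: each "network" is a single weight `w` trained on f(w) = w^2, members run sequentially rather than in parallel, and the population size, the quarter cutoff, and the 0.8/1.2 perturbation factors are arbitrary choices.

```python
import random

def train_step(member):
    # Hypothetical training step: one gradient step on f(w) = w^2
    # using this member's own learning rate
    member["w"] -= member["lr"] * 2 * member["w"]

def evaluate(member):
    return member["w"] ** 2  # lower is better

def pbt(pop_size=8, n_rounds=10, steps_per_round=5, seed=0):
    rng = random.Random(seed)
    # Start with random hyperparameters, as in random search
    population = [{"w": 5.0, "lr": 10 ** rng.uniform(-4, -1)} for _ in range(pop_size)]
    for _round in range(n_rounds):
        for member in population:
            for _step in range(steps_per_round):
                train_step(member)
        population.sort(key=evaluate)
        cutoff = pop_size // 4
        top, bottom = population[:cutoff], population[-cutoff:]
        for loser in bottom:
            winner = rng.choice(top)
            # Exploit: copy the weights of a better-performing member
            loser["w"] = winner["w"]
            # Explore: perturb the copied hyperparameter
            loser["lr"] = winner["lr"] * rng.choice([0.8, 1.2])
    return min(population, key=evaluate)

best = pbt()
print(best, evaluate(best))
```

Unlike the other methods in this section, the hyperparameters here change during training, so the output is a trained model together with the schedule of values it went through, not a single fixed configuration.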
BOHB
BOHB (Bayesian Optimization and HyperBand) mixes the Hyperband algorithm and Bayesian optimization. You can check this article for further reference.
Tools for hyperparameter optimization
Now that you know what the methods and algorithms are, let’s talk about tools, and there are a lot of those out there.
Some of the best hyperparameter optimization libraries are:
 Scikit-learn
 Scikit-Optimize
 Optuna
 Hyperopt
 Ray Tune
 Keras Tuner
 BayesianOptimization
 Metric Optimization Engine (MOE)
 Spearmint
 GPyOpt
 SigOpt
 Fabolas
1. Scikit-learn
Scikit-learn has implementations for grid search and random search and is a good place to start if you are building models with sklearn.
For both of those methods, scikit-learn trains and evaluates a model in a k-fold cross-validation setting over various parameter choices and returns the best model.
Specifically:
 Random search: RandomizedSearchCV runs the search over some number of random parameter combinations
 Grid search: GridSearchCV runs the search over all parameter sets in the grid
Tuning models with scikit-learn is a good start, but there are better options out there, and they often have random search strategies anyway.
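As a small sketch of both classes, the example below tunes a random forest on the Iris dataset that ships with scikit-learn. The model, the hyperparameter ranges, and the number of folds are arbitrary choices made for illustration, and SciPy is used for the sampling distribution.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0)

# Grid search: every combination in the grid is cross-validated (2 x 3 = 6 here)
grid_search = GridSearchCV(
    model,
    param_grid={"n_estimators": [10, 50], "max_depth": [2, 4, None]},
    cv=3,
)
grid_search.fit(X, y)

# Random search: n_iter combinations are sampled from the distributions or lists
random_search = RandomizedSearchCV(
    model,
    param_distributions={"n_estimators": randint(10, 100), "max_depth": [2, 4, None]},
    n_iter=5,
    cv=3,
    random_state=0,
)
random_search.fit(X, y)

print(grid_search.best_params_, grid_search.best_score_)
print(random_search.best_params_, random_search.best_score_)
```

By default both objects refit the best configuration on the full data, so after `fit` they can be used directly for prediction through `best_estimator_`.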
2. Scikit-optimize
Scikit-optimize uses a sequential model-based optimization algorithm to find optimal solutions for hyperparameter search problems in less time.
Scikitoptimize provides many features other than hyperparameter optimization such as:
 store and load optimization results,
 convergence plots,
 comparing surrogate models
3. Optuna
Optuna uses a historical record of trial details to determine the promising areas of the search space and hence finds the optimal hyperparameters in a minimum amount of time.
It has a pruning feature that automatically stops unpromising trials in the early stages of training. Some of the key features provided by Optuna are:
 Lightweight, versatile, and platform-agnostic architecture
 Pythonic search spaces
 Efficient optimization algorithms
 Easy parallelization
 Quick visualization
You can refer to the official documentation for tutorials on how to start using optuna.
4. Hyperopt
Hyperopt is one of the most popular hyperparameter tuning packages available. Hyperopt allows the user to describe a search space in which the user expects the best results, allowing the algorithms in hyperopt to search more efficiently.
Currently, three algorithms are implemented in hyperopt: random search, Tree of Parzen Estimators (TPE), and adaptive TPE.
To use hyperopt, you should first describe:
 the objective function to minimize
 the space over which to search
 the database in which to store all the point evaluations of the search
 the search algorithm to use
This tutorial will walk you through how to structure the code and use the hyperopt package to get the best hyperparameters.
You can also read this article to learn more about how to use Hyperopt.
5. Ray Tune
Ray Tune is a popular choice for experimentation and hyperparameter tuning at any scale. Ray uses the power of distributed computing to speed up hyperparameter optimization and has implementations of several state-of-the-art optimization algorithms at scale.
Some of the core features provided by Ray Tune are:
 Distributed asynchronous optimization out of the box by leveraging Ray.
 Easily scalable.
 Provides SOTA algorithms such as ASHA, BOHB, and Population-Based Training.
 Supports TensorBoard and MLflow.
 Supports a variety of frameworks such as scikit-learn, XGBoost, TensorFlow, PyTorch, etc.
You can refer to this tutorial to learn how to implement Ray Tune for your problem.
6. Keras Tuner
Keras Tuner is a library that helps you pick the optimal set of hyperparameters for your TensorFlow program. When you build a model for hyperparameter tuning, you also define the hyperparameter search space in addition to the model architecture. The model you set up for hyperparameter tuning is called a hypermodel.
You can define a hypermodel through two approaches:
 By using a model builder function
 By subclassing the HyperModel class of the Keras Tuner API
You can also use two predefined HyperModel classes – HyperXception and HyperResNet for computer vision applications.
You can refer to this official tutorial for further implementation details.
7. BayesianOptimization
BayesianOptimization is a package designed to minimize the number of steps required to find a combination of parameters that is close to the optimal combination.
This method uses a proxy optimization problem (finding the maximum of the acquisition function) that, although still hard, is computationally cheaper and can be tackled with common tools. Bayesian optimization is therefore best suited to situations where sampling the function to be optimized is very expensive.
Visit the GitHub repo here to see it in action.
8. Metric Optimization Engine
MOE (Metric Optimization Engine) is an efficient way to optimize a system’s parameters when evaluating parameters is time-consuming or expensive.
It is ideal for problems in which
 the optimization problem’s objective function is a black box, not necessarily convex or concave,
 derivatives are unavailable,
 and we seek a global optimum, rather than just a local one.
This ability to handle black-box objective functions allows us to use MOE to optimize nearly any system, without requiring any internal knowledge or access.
Visit the GitHub repo to read more about it.
9. Spearmint
Spearmint is a software package that also performs Bayesian optimization. The software is designed to automatically run experiments (thus the code name spearmint) in a manner that iteratively adjusts a number of parameters so as to minimize some objectives in as few runs as possible.
Read about and experiment with Spearmint in this GitHub repo.
10. GPyOpt
GPyOpt is Gaussian process optimization using GPy. It performs global optimization with different acquisition functions.
Among other functionalities, it is possible to use GPyOpt to optimize physical experiments (sequentially or in batches) and tune the parameters of Machine Learning algorithms. It is able to handle large data sets via sparse Gaussian process models.
Unfortunately, the authors of the repo have discontinued maintenance of GPyOpt, but you can still use the package for your experiments.
Head over to its GitHub repo here.
11. SigOpt
SigOpt fully integrates automated hyperparameter tuning with training runs tracking to give you a sense of the bigger picture and the path to reach your best model.
With features like highly customizable search spaces and multimetric optimization, SigOpt can advance your model with a simple API for sophisticated hyperparameter tuning before taking it into production.
Visit the documentation here to learn more about SigOpt’s hyperparameter tuning.
12. Fabolas
While traditional Bayesian hyperparameter optimizers model the loss of machine learning algorithms on a given dataset as a black box function to be minimized, FAst Bayesian Optimization on LArge data Sets (FABOLAS) models loss and computational cost across dataset size and uses these models to carry out Bayesian optimization with an extra degree of freedom.
You can check out the function implementing fabolas here and the research paper here.
Hyperparameter tuning resources and examples
In this section, I will share some hyperparameter tuning examples implemented for different ML and DL frameworks.
Random forest hyperparameter tuning
 Understanding Random forest hyperparameters
 Bayesian hyperparameter tuning for random forest
 Random forest tuning using grid search
XGBoost hyperparameter tuning
 XGBoost hyperparameters tuning python
 XGBoost hyperparameters tuning in R
 XGBoost hyperparameter tuning using Hyperopt
 Optuna hyperparameter tuning example
LightGBM hyperparameter tuning
 Understanding LightGBM parameters
 LightGBM hyperparameter tuning example
 Optuna for LightGBM hyperparameter tuning
CatBoost hyperparameter tuning
Keras hyperparameter tuning
 Hyperparameter tuning using Keras tuner example
 Keras CNN hyperparameter tuning
 How to use Keras models in scikit-learn grid search
 Keras Tuner: Lessons Learned From Tuning Hyperparameters of a Real-Life Deep Learning Model
PyTorch hyperparameter tuning
Final thoughts
Congratulations, you’ve made it to the end! Hyperparameter tuning represents an integral part of any machine learning project, so it’s always worth digging into this topic. In this blog, we talked about different hyperparameter tuning algorithms and tools that are widely used and studied. But even though we covered a good chunk of techniques and tools, as a wise man once said, there’s no end to knowledge.
Here is some of the latest research in the area that might interest you:
 Improving Hyperparameter Optimization By Planning Ahead
 Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges
 Experimental Investigation And Evaluation Of Model-Based Hyperparameter Optimization
That’s it for now, stay tuned for more, adios!