Choosing the correct hyperparameters for machine learning or deep learning models is one of the best ways to squeeze the last bit of performance out of them. In this article, I will show you some of the best approaches to hyperparameter tuning available today (in 2021).
What is the difference between parameter and hyperparameter?
First, let’s understand the differences between a hyperparameter and a parameter in machine learning.
- Model parameters: These are the parameters that the model estimates from the given data. For example, the weights of a deep neural network.
- Model hyperparameters: These are the parameters that the model cannot estimate from the given data. They are set before training and control how the model parameters are estimated. For example, the learning rate in deep neural networks.
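To make the distinction concrete, here is a minimal sketch with a toy one-weight model. The data, the `fit_weight` helper, and the two learning rates are all invented for illustration: the weight `w` is a parameter learned from the data, while `learning_rate` is a hyperparameter we pick by hand.

```python
# Toy data generated from y = 3x, so the "true" weight is 3.0 (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def fit_weight(learning_rate, n_steps=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # model parameter: estimated from the data
    for _ in range(n_steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad  # learning_rate is a hyperparameter: chosen by us
    return w


good_w = fit_weight(learning_rate=0.01)  # converges close to 3.0
bad_w = fit_weight(learning_rate=0.2)    # step too large: the weight diverges
```

The same data and the same model give a usable weight with one learning rate and a useless one with the other, which is why the hyperparameter has to be chosen with care.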
What is hyperparameter tuning and why is it important?
Hyperparameter tuning is the process of determining the combination of hyperparameters that maximizes model performance. Getting this combination right is often what it takes to extract the maximum performance out of a model.
How do I choose good hyperparameters?
Choosing the right combination of hyperparameters is not an easy task. There are two ways to set them.
- Manual hyperparameter tuning: In this method, different combinations of hyperparameters are set (and experimented with) manually. This is a tedious process and becomes impractical when there are many hyperparameters to try.
- Automated hyperparameter tuning: In this method, optimal hyperparameters are found using an algorithm that automates and optimizes the process.
Hyperparameter tuning methods
In this section, I will introduce the hyperparameter tuning methods that are popular today.
Random search
In the random search method, we create a grid of possible values for hyperparameters. Each iteration tries a random combination of hyperparameters from this grid and records the performance. At the end, the search returns the combination of hyperparameters that gave the best performance.
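Random search can be sketched in a few lines of plain Python. The `validation_loss` function below is a hypothetical stand-in for training a model and scoring it on validation data, and the search space values are made up for illustration.

```python
import random


def validation_loss(params):
    """Hypothetical stand-in for 'train a model, return its validation loss'."""
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["max_depth"] - 5) ** 2


search_space = {
    "learning_rate": [0.001, 0.01, 0.1, 0.3, 1.0],
    "max_depth": [2, 3, 5, 8, 13],
}


def random_search(space, objective, n_iter, seed=0):
    """Try n_iter random combinations and keep the best one seen."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently and at random.
        params = {name: rng.choice(values) for name, values in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(search_space, validation_loss, n_iter=10)
```

With 10 iterations over 25 possible combinations, the search is not guaranteed to hit the best combination. That is the trade-off: a fixed evaluation budget in exchange for no guarantee of optimality.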
Grid search
In the grid search method, we create a grid of possible values for hyperparameters. Each iteration tries a combination of hyperparameters in a specific order. The search fits the model on every possible combination of hyperparameters and records the model performance. Finally, it returns the best model with the best hyperparameters.
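A minimal grid search sketch looks almost the same as random search, except that it walks every combination in order. As before, `validation_loss` and the grid values are hypothetical placeholders rather than a real model.

```python
import itertools


def validation_loss(params):
    """Hypothetical stand-in for 'train a model, return its validation loss'."""
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["max_depth"] - 5) ** 2


param_grid = {
    "learning_rate": [0.001, 0.01, 0.1, 0.3, 1.0],
    "max_depth": [2, 3, 5, 8, 13],
}


def grid_search(grid, objective):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    best_params, best_loss, n_evaluated = None, float("inf"), 0
    # itertools.product enumerates the full Cartesian product in a fixed order.
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = objective(params)
        n_evaluated += 1
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss, n_evaluated


best_params, best_loss, n_evaluated = grid_search(param_grid, validation_loss)
```

The cost grows multiplicatively: 5 values for each of 2 hyperparameters means 25 model fits, and every extra hyperparameter multiplies that number again.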
Bayesian optimization
Tuning and finding the right hyperparameters for your model is an optimization problem: we want to minimize the loss of our model by changing its hyperparameters. Bayesian optimization tries to find the minimum in as few evaluations as possible. It builds a probabilistic (surrogate) model of the objective from the results observed so far and uses an acquisition function to direct sampling to areas where an improvement over the current best observation is likely.
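The loop below is a rough sketch of this idea for a single hyperparameter, assuming NumPy is available. The specific choices are mine, not the article's: a Gaussian process with an RBF kernel as the surrogate, expected improvement as the acquisition function, a fixed grid of candidate points, and a made-up `objective` standing in for the validation loss.

```python
import math

import numpy as np


def objective(x):
    """Hypothetical validation loss for one hyperparameter x in [0, 1]."""
    return (x - 0.3) ** 2 + 0.1 * math.sin(15 * x)


def rbf_kernel(a, b, length_scale=0.1):
    """Squared-exponential covariance between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_obs = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_cross = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_obs, k_cross)        # K^-1 k*
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_cross * solved, axis=0)    # prior variance is 1
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected improvement over the lowest loss observed so far."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (best - mean) * cdf + std * pdf


rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 201)
x_obs = rng.uniform(0.0, 1.0, size=3)               # a few random starting points
y_obs = np.array([objective(x) for x in x_obs])

for _ in range(12):
    offset = y_obs.mean()                           # center targets for the zero-mean GP
    mean, std = gp_posterior(x_obs, y_obs - offset, candidates)
    ei = expected_improvement(mean + offset, std, y_obs.min())
    x_next = candidates[np.argmax(ei)]              # most promising point to try next
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[np.argmin(y_obs)]
best_loss = y_obs.min()
```

Each iteration spends one real evaluation of `objective` on the point that the acquisition function ranks highest, which is how the method trades cheap surrogate computations for fewer expensive training runs.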
Tree-structured Parzen estimators (TPE)
The idea behind tree-structured Parzen estimators is similar to Bayesian optimization. Instead of modeling p(y|x), where y is the function to be minimized (e.g., validation loss) and x is the value of a hyperparameter, TPE models p(x|y) and p(y). One of the great drawbacks of tree-structured Parzen estimators is that they do not model interactions between hyperparameters. That said, TPE works extremely well in practice and has been battle-tested across many domains.
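Here is a simplified, one-dimensional sketch of the TPE idea, not the implementation used by any particular library. It splits past trials into a "good" group (the lowest losses) and a "bad" group, fits a Parzen (kernel density) estimate to each, and proposes the candidate where the good density most outweighs the bad one. The loss function, the quantile `gamma`, and the bandwidth are illustrative assumptions.

```python
import math
import random


def validation_loss(x):
    """Hypothetical loss over one hyperparameter x in [0, 1]."""
    return (x - 0.7) ** 2


def parzen_density(points, bandwidth):
    """Return a Gaussian kernel density estimate built from the given points."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def tpe_suggest(history, rng, gamma=0.25, n_candidates=24, bandwidth=0.1):
    """Suggest the next x by maximizing l(x) / g(x) over sampled candidates."""
    ranked = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, int(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]   # trials below the loss quantile
    bad = [x for x, _ in ranked[n_good:]]    # all remaining trials
    l = parzen_density(good, bandwidth)      # models p(x | y is good)
    g = parzen_density(bad, bandwidth)       # models p(x | y is bad)
    # Sample candidates from l(x): pick a good point, add noise, clip to [0, 1].
    candidates = [
        min(1.0, max(0.0, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))


rng = random.Random(0)
history = [(x, validation_loss(x)) for x in (rng.random() for _ in range(8))]
for _ in range(30):
    x_next = tpe_suggest(history, rng)
    history.append((x_next, validation_loss(x_next)))

best_x, best_loss = min(history, key=lambda pair: pair[1])
```

Because each hyperparameter gets its own pair of densities in this scheme, nothing in it captures how two hyperparameters interact, which is the drawback mentioned above.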
Hyperparameter tuning algorithms
These are the algorithms developed specifically for doing hyperparameter tuning.
Hyperband
Hyperband is a variation of random search that uses explore-exploit theory to find the best time allocation for each configuration. You can check this research paper for further details.
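The core building block of Hyperband is successive halving: start many random configurations on a small budget, keep only the best fraction, and give the survivors more budget. The sketch below shows a single such bracket rather than the full Hyperband schedule, which runs several brackets with different starting budgets. The `partial_loss` function and the configuration space are hypothetical.

```python
import random


def partial_loss(config, budget):
    """Hypothetical loss after training `config` for `budget` epochs."""
    floor = (config["learning_rate"] - 0.1) ** 2  # best loss this config can reach
    return floor + 1.0 / budget                   # more budget gets closer to the floor


def successive_halving(configs, min_budget=1, eta=3):
    """Repeatedly keep the best 1/eta of the configs and multiply the budget by eta."""
    rounds = []
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        rounds.append((len(survivors), budget))
        ranked = sorted(survivors, key=lambda c: partial_loss(c, budget))
        survivors = ranked[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0], rounds


rng = random.Random(0)
all_configs = [{"learning_rate": rng.uniform(0.0, 1.0)} for _ in range(27)]
winner, rounds = successive_halving(all_configs)
# rounds holds (number of configs, budget per config) for each round:
# 27 configs at budget 1, then 9 at budget 3, then 3 at budget 9.
```

Most of the total budget goes to the few configurations that looked good early, which is where the savings over plain random search come from.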
Population-based training (PBT)
This technique is a hybrid of the two most commonly used search techniques: random search and manual tuning, applied to neural network models.
PBT starts by training many neural networks in parallel with random hyperparameters. But these networks aren’t fully independent of each other.
It uses information from the rest of the population to refine the hyperparameters and decide which hyperparameter values to try next. You can check this article for more information on PBT.
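A toy version of this loop, under heavy simplification, might look like the following. Each "network" is reduced to a single weight, and `train_step`, `score`, the population size, and the perturbation factors are all invented for illustration. The exploit step copies the state of a strong member into a weak one, and the explore step perturbs the copied hyperparameter.

```python
import random


def train_step(member):
    """Hypothetical training step: the weight moves toward its optimum of 1.0."""
    member["weight"] += member["lr"] * (1.0 - member["weight"])


def score(member):
    """Higher is better: negative distance of the weight from its optimum."""
    return -abs(1.0 - member["weight"])


def pbt(population_size=8, n_rounds=10, steps_per_round=5, seed=0):
    rng = random.Random(seed)
    # Start with random hyperparameters (log-uniform learning rates).
    population = [
        {"lr": 10 ** rng.uniform(-3, 0), "weight": 0.0}
        for _ in range(population_size)
    ]
    n_replace = population_size // 4
    for _round in range(n_rounds):
        for member in population:
            for _step in range(steps_per_round):
                train_step(member)
        population.sort(key=score, reverse=True)
        top, bottom = population[:n_replace], population[-n_replace:]
        for loser in bottom:
            winner = rng.choice(top)
            loser["weight"] = winner["weight"]                   # exploit: copy state
            loser["lr"] = winner["lr"] * rng.choice([0.8, 1.2])  # explore: perturb
    return max(population, key=score)


best_member = pbt()
```

Unlike the other methods, the hyperparameters here change during training, so the result is a schedule of values rather than a single fixed setting.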
BOHB
BOHB (Bayesian Optimization and HyperBand) mixes the Hyperband algorithm and Bayesian optimization. You can check this article for further reference.
Tools for hyperparameter optimization
Now that you know the methods and algorithms, let’s talk about tools. There are a lot of them out there.
Some of the best hyperparameter optimization libraries are:
Scikit-learn
Scikit-learn has implementations of grid search and random search and is a good place to start if you are building models with sklearn.
For both of those methods, scikit-learn trains and evaluates a model in a k-fold cross-validation setting over various parameter choices and returns the best model.
Tuning models with scikit-learn is a good start, but there are better options out there, and they often offer a random search strategy anyway.
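As a minimal sketch of both scikit-learn search classes, assuming scikit-learn is installed: the dataset, the estimator, and the parameter values below are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Illustrative search space: 2 x 3 = 6 combinations.
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 4, None]}

# Grid search: fits every combination with 5-fold cross-validation.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
grid.fit(X, y)

# Random search: samples only n_iter combinations from the same space.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    n_iter=4,
    cv=5,
    random_state=0,
)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```

After fitting, `best_estimator_` holds the model refit on the full data with the best parameters, and `cv_results_` holds the scores for every combination tried.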
Hyperopt
Hyperopt is one of the most popular hyperparameter tuning packages available. Hyperopt allows the user to describe a search space in which the user expects the best results, allowing the algorithms in hyperopt to search more efficiently.
Currently, three algorithms are implemented in hyperopt: random search, tree of Parzen estimators (TPE), and adaptive TPE.
To use hyperopt, you should first describe:
- the objective function to minimize
- the space over which to search
- the database in which to store all the point evaluations of the search
- the search algorithm to use
This tutorial will walk you through how to structure the code and use the hyperopt package to get the best hyperparameters.
Scikit-optimize
Scikit-optimize uses a sequential model-based optimization (SMBO) algorithm to find good solutions to hyperparameter search problems in less time.
Scikit-optimize provides many features other than hyperparameter optimization such as:
- store and load optimization results,
- convergence plots,
- comparing surrogate models
Optuna
Optuna uses the historical record of trial details to determine promising areas of the search space, and so tends to find good hyperparameters in less time.
It has a pruning feature that automatically stops unpromising trials in the early stages of training. Some of the key features provided by Optuna are:
You can refer to the official documentation for tutorials on how to start using optuna.
Ray Tune
Ray Tune is a popular choice for experimentation and hyperparameter tuning at any scale. Ray uses the power of distributed computing to speed up hyperparameter optimization and has implementations of several state-of-the-art optimization algorithms at scale.
Some of the core features provided by Ray Tune are:
You can refer to this tutorial to learn how to implement Ray Tune for your problem.
Keras Tuner
The Keras Tuner is a library that helps you pick the optimal set of hyperparameters for your TensorFlow program. When you build a model for hyperparameter tuning, you also define the hyperparameter search space in addition to the model architecture. The model you set up for hyperparameter tuning is called a hypermodel.
You can define a hypermodel through two approaches:
- By using a model builder function
- By subclassing the HyperModel class of the Keras Tuner API
You can refer to this official tutorial for further implementation details.
Hyperparameter tuning resources and examples
In this section, I will share some hyperparameter tuning examples implemented for different ML and DL frameworks.
In this article, I shared with you different hyperparameter tuning algorithms and tools which are currently widely used. Hopefully, you will find them useful in your projects.