Keras Tuner: Lessons Learned From Tuning Hyperparameters of a Real-Life Deep Learning Model
The performance of your machine learning model depends on your configuration. Finding an optimal configuration, both for the model and for the training algorithm, is a big challenge for every machine learning engineer.
Model configuration can be defined as a set of hyperparameters that influence model architecture. In the case of deep learning, these can be things like the number of layers or the types of activation functions. Training algorithm configuration, on the other hand, influences the speed and quality of the training process. The learning rate is a good example of a parameter in the training configuration.
To select the right set of hyperparameters, we do hyperparameter tuning. Even though tuning might be time- and CPU-consuming, the end result pays off, unlocking the highest potential capacity for your model.
If, like me, you’re a deep learning engineer working with TensorFlow/Keras, then you should consider using Keras Tuner. It’s a great tool that helps with hyperparameter tuning in a smart and convenient way.
In this article, I’ll tell you how I like to implement Keras Tuner in deep learning projects.
Why real projects matter
Everything that I’ll be doing is based on a real project. It’s not a toy problem, which is important to mention because you’ve probably seen other articles that aren’t based on real projects. Well, not this one!
Why is it so important to work with a project that reflects real life? It’s simple: these projects are much more complex at the core. Tools that work well on a small synthetic problem can perform poorly on real-life challenges. So, today I’ll show you what real value you can expect from Keras Tuner, and how to implement it in your own deep learning project.
Project description & problem statement
We’ll be doing an image segmentation task. We’ll try to segment multiple objects of interest on an image of a paper scan. Our end-goal is to extract particular pieces of text using segmentation. The below paper scan is an example of what we’re going to work with:
We can approach this problem using recent advances in computer vision. For example, U-NET, first introduced by Ronneberger, Fischer and Brox in 2015 to segment medical images, is a deep learning neural net that we can employ for our purposes. U-NET’s output is a set of masks, where each mask contains a particular object of interest.
Basically, for an input image that contains some objects, our deep neural net, once trained, should segment all objects of interest and return a set of masks, where each mask corresponds to an object of a particular class. If you’re interested in U-NET and how it works, I highly recommend reading this research paper.
Model & metrics selection
We won’t create a model from scratch. Since U-NET was introduced back in 2015, there are multiple implementations already available for us. Let’s take advantage of that.
I already imported the model, and I’m going to initialize an object of the model’s class. Let’s look at the documentation to see the variable parameters:
    Signature:
    unet(
        input_size=(512, 512, 1),
        start_neurons=64,
        net_depth=4,
        output_classes=1,
        dropout=False,
        bn_after_act=False,
        activation='mish',
        pretrained_weights=None,
    )
    Docstring:
    Generates U-Net architecture model (refactored version)

    Parameters:
        input_size : tuple (h, w, ch) for the input dimentions
        start_neurons : number of conv units in the first layer
        net_depth : number of convolution cascades including the middle cascade
        dropout : True -> use dropouts
        bn_after_act : True -> BatchNormalizations is placed after activation layers.
                       Before otherwise
        activation : Type of activation layers:
                     'mish' - Mish-activation (default),
                     'elu' = ELU,
                     'lrelu' - LeakyReLU,
                     'relu' - ReLU
        pretrained_weights - None or path to weights-file

    Return: U-net model
    File:   ~/ml_basis/synthetic_items/models/unet.py
    Type:   function
Docstring for the U-NET class that shows a set of parameters for initialization
As we can see from the docstring, there are eight parameters that define our future model. Only five of them affect the model’s architecture. The other three, input_size, output_classes and pretrained_weights, let us define the size of the input image, the number of output classes, and a path to the weights of a previously trained model, respectively.
We’ll focus on the five parameters that shape the model’s architecture. This is where we’ll employ Keras Tuner to do hyperparameter tuning.
To find the best model architecture via hyperparameter tuning, we need to select a metric for model evaluation. To approach this question, let’s recap that U-NET performs binary classification for every image pixel, linking each pixel to a particular object class. Objects of interest in our problem domain are quite small compared to the image size. With that in mind, we should pick a metric that accounts for this imbalance in pixel classification. This is where the F1 score does particularly well: it ignores true negatives, so the abundant background pixels can’t inflate it.
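To make the imbalance argument concrete, here is a minimal sketch in plain Python. The pixel counts and the two "models" are hypothetical values chosen for illustration, not numbers from the project; the point is only that accuracy stays high for a model that finds nothing, while F1 does not.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def accuracy(y_true, y_pred):
    """Share of pixels whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); true negatives do not appear in the formula."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical flattened mask: 1000 pixels, only 20 belong to the object.
y_true = [1] * 20 + [0] * 980

# A useless model that predicts "background" for every pixel.
all_background = [0] * 1000

# A model that finds 15 of the 20 object pixels and raises 5 false alarms.
decent = [1] * 15 + [0] * 5 + [1] * 5 + [0] * 975

acc_bg = accuracy(y_true, all_background)
f1_bg = f1_score(y_true, all_background)
acc_decent = accuracy(y_true, decent)
f1_decent = f1_score(y_true, decent)

print(f"all background: accuracy={acc_bg:.2f}, F1={f1_bg:.2f}")
print(f"decent model:   accuracy={acc_decent:.2f}, F1={f1_decent:.2f}")
```

Accuracy differs by only one point between the two predictions, whereas F1 separates them clearly, which is the behavior we want from a tuning objective on small objects.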
Keras Tuner implementation
High-level overview of available tuners
How can we get the most out of our model using Keras Tuner? First of all, it’s important to say that Keras Tuner ships with multiple tuners. They use different algorithms for hyperparameter search. Here are the algorithms, with the corresponding tuner classes:
- kerastuner.tuners.hyperband.Hyperband for the HyperBand-based algorithm;
- kerastuner.tuners.bayesian.BayesianOptimization for the Gaussian process-based algorithm;
- kerastuner.tuners.randomsearch.RandomSearch for the random search tuner.
To give you an initial intuition for these methods: RandomSearch is the least efficient approach. It doesn’t learn from previously tested parameter combinations, and simply samples parameter combinations from the search space at random.
BayesianOptimization is similar to RandomSearch in that both sample a subset of hyperparameter combinations. The key difference is that BayesianOptimization doesn’t sample hyperparameter combinations randomly; it follows a probabilistic approach under the hood. This approach takes already tested combinations into account and uses that information to choose the next combination to test.
Hyperband is an optimized version of RandomSearch in terms of search time and, therefore, resource allocation. It still samples combinations at random, but trains them for only a few epochs at first and spends the remaining budget on the most promising ones.
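Hyperband’s savings come from a technique called successive halving: many randomly sampled configurations are trained briefly, only the best fraction survives, and the survivors get a larger epoch budget. The sketch below is a toy illustration of that idea only. It is not Keras Tuner’s implementation, and simulated_score is a hypothetical stand-in for "train for N epochs and return the validation score".

```python
import random


def simulated_score(config, epochs):
    """Hypothetical stand-in for training: the score approaches the
    configuration's quality ceiling as the number of epochs grows."""
    return config["quality"] * (1 - 0.5 ** epochs)


def successive_halving(configs, min_epochs=1, max_epochs=20, factor=3):
    """Keep the top 1/factor of configurations each round while growing the
    per-configuration epoch budget by `factor`.

    Returns the winning configuration and the total number of epochs spent
    (counted as if every round trained from scratch)."""
    survivors = list(configs)
    epochs = min_epochs
    total_epochs = 0
    while True:
        scored = [(simulated_score(c, epochs), c) for c in survivors]
        total_epochs += epochs * len(survivors)
        scored.sort(key=lambda pair: pair[0], reverse=True)
        if len(survivors) == 1 or epochs >= max_epochs:
            return scored[0][1], total_epochs
        keep = max(1, len(survivors) // factor)
        survivors = [c for _, c in scored[:keep]]
        epochs = min(max_epochs, epochs * factor)


rng = random.Random(0)
configs = [{"id": i, "quality": rng.random()} for i in range(27)]

best, spent = successive_halving(configs)
full_budget = 20 * len(configs)  # training every configuration for 20 epochs

print(f"best config id: {best['id']}, epochs spent: {spent} vs {full_budget}")
```

In this toy setting the early scores rank configurations perfectly, so nothing is lost by stopping early. Real training curves are noisier, which is why Hyperband runs several such brackets with different starting budgets.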
If you’re a curious person and want to learn more about Random Search, Bayesian Optimization and HyperBand, I definitely recommend this article.
Defining a search space and building a model
Keras Tuner provides an elegant way to define a model and a search space for the parameters that the tuner will use: you do it all by creating a model builder function. To show you how easy and convenient it is, here’s what the model builder function for our project looks like:
    from tensorflow.keras.optimizers import Adam

    # u (the U-Net builder), weighted_cross_entropy, f1, precision, recall and iou
    # are imported from the project's own modules

    # building a model using a model builder function
    def model_builder(hp):
        """
        Build model for hyperparameters tuning

        hp: HyperParameters class instance
        """
        # defining a set of hyperparameters for tuning and a range of values for each
        start_neurons = hp.Int(name='start_neurons', min_value=16, max_value=128, step=16)
        net_depth = hp.Int(name='net_depth', min_value=2, max_value=6)
        dropout = hp.Boolean(name='dropout', default=False)
        bn_after_act = hp.Boolean(name='bn_after_act', default=False)
        activation = hp.Choice(name='activation', values=['mish', 'elu', 'lrelu'], ordered=False)

        input_size = (544, 544, 3)
        target_labels = [str(i) for i in range(21)]

        # building a model
        model = u(input_size=input_size,
                  start_neurons=start_neurons,
                  net_depth=net_depth,
                  output_classes=len(target_labels),
                  dropout=dropout,
                  bn_after_act=bn_after_act,
                  activation=activation)

        # model compilation (learning_rate replaces the deprecated lr argument)
        model.compile(optimizer=Adam(learning_rate=1e-3),
                      loss=weighted_cross_entropy,
                      metrics=[f1, precision, recall, iou])

        return model
You might have noticed that within the model builder function, we used several methods to define a search space for hyperparameters: hp.Int, hp.Boolean and hp.Choice.
There’s nothing wild in how these methods operate; each is just a straightforward definition of the values that can be used in the parameter search space. What really matters is how we, as engineers, select the methods for each parameter, and define optimal ranges/options for the values to be sampled.
For example, it’s very important to carefully consider the parameters within hp.Int, giving it meaningful minimum and maximum values (min_value and max_value), and a proper step. Your goal here is to avoid an overwhelming number of options, which would cause the tuning process to take too much time and resources. It’s also crucial not to limit the search space in such a way that the tuner won’t even consider the best possible values. Given your expertise, knowledge and problem domain, think of what values might be the best to test out.
Besides hp.Int, hp.Boolean and hp.Choice, there are also a few other options available for defining values in the search space. You can get familiar with these methods via the documentation.
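To get a feel for how quickly a search space grows, it helps to count the combinations. The sketch below mirrors the ranges from the model builder function in plain Python; the dictionary layout is my own illustration, not a Keras Tuner structure. It also draws one combination at random, which is essentially what a random search does at each trial.

```python
import itertools
import random

# Values mirror the ranges used in the model builder function.
search_space = {
    "start_neurons": list(range(16, 128 + 1, 16)),  # hp.Int, step 16 -> 8 values
    "net_depth": list(range(2, 6 + 1)),             # hp.Int, default step 1 -> 5 values
    "dropout": [False, True],                       # hp.Boolean
    "bn_after_act": [False, True],                  # hp.Boolean
    "activation": ["mish", "elu", "lrelu"],         # hp.Choice
}

# Every possible combination of the five hyperparameters.
all_combinations = list(itertools.product(*search_space.values()))
total = len(all_combinations)

# Random search draws combinations without looking at earlier results.
rng = random.Random(42)
sample = dict(zip(search_space.keys(), rng.choice(all_combinations)))

print(f"{total} possible combinations")
print(f"one random draw: {sample}")
```

With 480 combinations, an exhaustive search is out of reach for a model that takes hours to train. Halving the step of start_neurons alone would add another 420 combinations, which is why ranges and steps deserve some thought.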
As a last step in creating a model builder function, we .compile our model before returning it.
By now you should already have a defined model builder function, and an idea of what algorithm you’d like to use for hyperparameter tuning. If you’re all set with the function and the algorithm, then you’re ready to initiate a tuner object.
For the image segmentation project we’re working on, I decided to stick with the Hyperband algorithm, so my initialization code looks like this:
    import kerastuner as kt  # the package is named keras_tuner in newer releases

    # tuner initialization
    tuner = kt.Hyperband(hypermodel=model_builder,
                         objective=kt.Objective("val_f1", direction="max"),
                         max_epochs=20,
                         project_name='hyperband_tuner')
Four parameters are used during initialization:
- hypermodel is the model builder function we defined previously;
- objective is the metric that we want to improve (maximize or minimize). As you can see in the code snippet above, I explicitly specify the name of the metric (val_f1, which stands for the F1 score on the validation dataset) and the direction it should move in (max);
- max_epochs defines the maximum number of epochs used to train a single model. The official documentation suggests that you “set this to a value slightly higher than the expected time to convergence for your largest Model”;
- project_name is the name of the folder where all tuning-related results will be stored.
Tuning process launch
Launching a hypertuning process is similar to fitting a model in Keras/TensorFlow, except that we use the .search method on a tuner object instead of the regular .fit. Here is how I kicked off the tuning job for the project:
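    from glob import glob

    tuner.search(x=train_dg,  # forwarded to model.fit, which takes the training data as x
                 steps_per_epoch=batches_per_epoch,
                 validation_data=valid_dg,
                 # integer division gives a whole number of validation batches
                 validation_steps=len(glob(img_dir + '/*')) // valid_batch_size,
                 epochs=50,  # Hyperband sets the per-trial epochs itself, up to max_epochs
                 shuffle=True,
                 verbose=1,
                 initial_epoch=0,
                 callbacks=[ClearTrainingOutput()],
                 use_multiprocessing=True,
                 workers=6)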
The .search method and all of the parameters used there should already be familiar to you. The only thing I want to point out is the ClearTrainingOutput() callback, which simply clears the notebook output whenever a training run ends, that is, once per trial. Here is the code for the callback:

    import IPython.display
    import tensorflow as tf

    # defining a callback that clears the output when a training run ends
    class ClearTrainingOutput(tf.keras.callbacks.Callback):
        def on_train_end(*args, **kwargs):
            IPython.display.clear_output(wait=True)
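If the timing of the callback hooks is unclear, the following toy loop shows the order in which they fire. It is a simplified stand-in written for illustration, not Keras code: Callback, Recorder and mock_fit are hypothetical names, and only the two hook signatures follow the Keras convention.

```python
class Callback:
    """Minimal stand-in for a Keras-style callback with two hooks."""

    def on_epoch_end(self, epoch, logs=None):
        pass

    def on_train_end(self, logs=None):
        pass


class Recorder(Callback):
    """Records which hooks were called, in order."""

    def __init__(self):
        self.events = []

    def on_epoch_end(self, epoch, logs=None):
        self.events.append(f"epoch_end:{epoch}")

    def on_train_end(self, logs=None):
        self.events.append("train_end")


def mock_fit(epochs, callbacks):
    """Hypothetical training loop that only demonstrates hook ordering."""
    for epoch in range(epochs):
        # ... one epoch of training would happen here ...
        for cb in callbacks:
            cb.on_epoch_end(epoch)
    # fires once, after the last epoch
    for cb in callbacks:
        cb.on_train_end()


recorder = Recorder()
mock_fit(epochs=3, callbacks=[recorder])
print(recorder.events)
```

A callback that overrides only on_train_end therefore runs once per call to fit, which during tuning means once per trial. To clear the output after every epoch instead, the same logic would go into on_epoch_end.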
Getting tuning results
Here comes the most exciting part. Let’s see how well the tuner did. We’re curious what kind of results we were able to achieve, and which model configuration led to such performance.
To check the summary for the hypertuning job, we simply call tuner.results_summary() on the tuner instance.
When run, the output for the tuner that we used within the project looks like this:
    Results summary:
    Results in hyperband_tuner
    Showing 10 best trials
    Objective(name='val_f1', direction='max')

    Trial summary
    Hyperparameters:
    start_neurons: 32
    net_depth: 5
    dropout: False
    bn_after_act: True
    activation: mish
    tuner/epochs: 15
    tuner/initial_epoch: 0
    tuner/bracket: 0
    tuner/round: 0
    Score: 0.9533569884300232

    Trial summary
    Hyperparameters:
    start_neurons: 80
    net_depth: 5
    dropout: False
    bn_after_act: True
    activation: elu
    tuner/epochs: 10
    tuner/initial_epoch: 0
    tuner/bracket: 1
    tuner/round: 0
    Score: 0.9258414387702942

    Trial summary
    Hyperparameters:
    start_neurons: 128
    net_depth: 3
    dropout: True
    bn_after_act: False
    activation: elu
    tuner/epochs: 18
    tuner/initial_epoch: 3
    tuner/bracket: 1
    tuner/round: 1
    tuner/trial_id: 5bc455f16fad434a9452c51a71c741b0
    Score: 0.9170311570167542

    ..............

    Trial summary
    Hyperparameters:
    start_neurons: 48
    net_depth: 3
    dropout: True
    bn_after_act: False
    activation: mish
    tuner/epochs: 3
    tuner/initial_epoch: 0
    tuner/bracket: 1
    tuner/round: 0
    Score: 0.5333467721939087

    Trial summary
    Hyperparameters:
    start_neurons: 96
    net_depth: 3
    dropout: True
    bn_after_act: True
    activation: mish
    tuner/epochs: 3
    tuner/initial_epoch: 0
    tuner/bracket: 1
    tuner/round: 0
    Score: 0.5279057025909424

    Trial summary
    Hyperparameters:
    start_neurons: 32
    net_depth: 3
    dropout: False
    bn_after_act: False
    activation: elu
    tuner/epochs: 3
    tuner/initial_epoch: 0
    tuner/bracket: 1
    tuner/round: 0
    Score: 0.5064906477928162
In the above output, I truncated the result of .results_summary() to only 6 trials, showing the top 3 and the bottom 3 models. You might note that the worst model only reached about 51 % for the metric of our choice (F1 score), whereas the best model pushed performance to a remarkable 95.3 %.
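The distance between the best and worst trials can be checked directly from the two scores reported in the summary. This is plain arithmetic on those values; the variable names are mine.

```python
# Scores copied from the first and last trials in the results summary.
best_score = 0.9533569884300232
worst_score = 0.5064906477928162

# Absolute gap in percentage points and relative improvement as a ratio.
gap_points = (best_score - worst_score) * 100
ratio = best_score / worst_score

print(f"gap: {gap_points:.1f} percentage points, ratio: {ratio:.2f}x")
```

Keep in mind that the bottom trials were stopped after 3 epochs while the top ones trained for 10 to 18, so part of this gap reflects training length rather than architecture alone.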
Results & business impact
The gap between the top and worst performer is almost 45 percentage points. From such a difference, we can conclude that:
- Keras Tuner did an incredible job finding the best set of model parameters, nearly doubling the metric compared with the worst configuration;
- We, as engineers, defined a proper search space to sample from;
- Keras Tuner works well not only for toy problems but, most importantly, for real-life projects.
Let me also share the business impact that we got. A model that has been better optimized for a particular problem domain through hyperparameter tuning gave our service more stable and accurate long-run performance. Compared to our other similar but non-optimized services, we’ve seen a 15 % increase in the acceptance rate for our service, as well as 25 % growth in classification confidence level.
Results tracking & sharing
Time for some MLOps. Great news: if you use Neptune, it now supports a complete integration with Keras Tuner.
With the Neptune integration, you can:
- see charts of logged metrics for every trial,
- see the parameters tried at every trial,
- see hardware consumption during search,
- log the best parameters after training,
- log hyperparameter search space,
- log Keras Tuner project directory with information for all the trials.
Pretty exciting! Let me show you how I integrated Neptune for tracking in my project, to store tuning results in the cloud.
I’ll skip the basic steps, like Neptune initialization and experiment creation, since they’re well described in the official documentation.
Instead, I’ll focus on the code changes required to bring Neptune into your project workflow. The only change is in the tuner initialization part. Here is what the initialization looks like with the changes made:
    import neptunecontrib.monitoring.kerastuner as npt_utils

    # tuner initialization
    tuner = kt.Hyperband(hypermodel=model_builder,
                         objective=kt.Objective("val_f1", direction="max"),
                         max_epochs=20,
                         project_name='hyperband_tuner',
                         logger=npt_utils.NeptuneLogger())
We added a logger to the tuner, so we can log the following after every trial:
- run parameters under ‘hyperparameters/values’ text log;
- loss and all the metrics defined when compiling Keras model;
- hardware consumption with CPU, GPU and memory during search.
In addition to that, I also take advantage of the .log_tuner_info() method to log more information from Keras Tuner objects to Neptune, by calling npt_utils.log_tuner_info(tuner) once the search has finished.
This will log pretty much everything that we want to keep track of:
- best score (‘best_score’ metric);
- best parameters (‘best_parameters’ property);
- score for every run (‘run_score’ metric);
- tuner project directory (‘TUNER_PROJECT_NAME’ artifact);
- parameter space (‘hyperparameters/space’ text log);
- name of the metric/loss used as objective (‘objective/name’ property);
- direction of the metric/loss used as objective (‘objective/direction’ property);
- tuner id (‘tuner_id’ property);
- best trial id (‘best_trial_id’ property).
Interested in learning more? Watch this guided video, where an end-to-end integration is shown using another project as an example. Another great source of inspiration is the official documentation.
In this article, we’ve gone over a complete Keras Tuner implementation for a real-life project, showing its potential to improve model performance by selecting the best set of parameters.
We learned the importance of search space definition, and now know how to set up our own tuner and kick off the tuning job.
Lastly, I showed you how you can integrate Neptune into your project and keep track of the results of your trials.
I hope that this article was helpful, and you now know how to start using Keras Tuner, and what it can do for you. Thanks for reading!