The performance of your machine learning model depends on your configuration. Finding an optimal configuration, both for the model and for the training algorithm, is a big challenge for every machine learning engineer.

Model configuration can be defined as a set of hyperparameters that influence the model architecture. In the case of deep learning, these can be things like the number of layers or the types of activation functions. Training algorithm configuration, on the other hand, influences the speed and quality of the training process. The learning rate is a good example of a training configuration parameter.

To select the right set of hyperparameters, we do hyperparameter tuning. Even though tuning can be time- and compute-consuming, the end result pays off by unlocking the full potential of your model.

If, like me, you’re a deep learning engineer working with TensorFlow/Keras, then you should consider using **Keras Tuner**. It’s a great tool that helps with hyperparameter tuning in a smart and convenient way.

In this article, I’ll tell you how I like to implement Keras Tuner in deep learning projects.

## Why real projects matter

Everything that I’ll be doing is based on a real project. It’s not a toy problem, which is important to mention because you’ve probably seen other articles that aren’t based on real projects. Well, not this one!

Why is it so important to work with a project that reflects real life? It’s simple: real projects are much more complex at the core. Tools that work well on a small synthetic problem can perform poorly on real-life challenges. So, today I’ll show you **what real value you can expect from Keras Tuner**, and how to implement it in your own deep learning project.

## Project description & problem statement

We’ll be doing an image segmentation task: segmenting multiple objects of interest in an image of a paper scan. Our end goal is to extract particular pieces of text using segmentation. The paper scan below is an example of what we’re going to work with:

We can approach this problem using recent advances in computer vision. For example, U-NET, first introduced by Ronneberger, Fischer and Brox in 2015 to segment biomedical images, is a convolutional neural network that we can employ for our purposes. U-NET’s output is a set of masks, where each mask contains a particular object of interest.

Basically, for an input image that contains some objects, our deep neural net, once trained, should segment all objects of interest and return a set of masks, where each mask corresponds to an object of a particular class. If you’re interested in U-NET and how it works, I highly recommend reading this research paper.

## Model & metrics selection

We won’t create a model from scratch. Since U-NET was introduced back in 2015, there are multiple implementations already available for us. Let’s take advantage of that.

I’ve already imported the model builder function, `unet`. Let’s look at its documentation to see which parameters we can vary:

```
Signature:
unet(
input_size=(512, 512, 1),
start_neurons=64,
net_depth=4,
output_classes=1,
dropout=False,
bn_after_act=False,
activation='mish',
pretrained_weights=None,
)
Docstring:
Generates U-Net architecture model (refactored version)
Parameters:
input_size : tuple (h, w, ch) for the input dimentions
start_neurons : number of conv units in the first layer
net_depth : number of convolution cascades including the middle cascade
dropout : True -> use dropouts
bn_after_act : True -> BatchNormalizations is placed after activation layers. Before otherwise
activation : Type of activation layers: 'mish' - Mish-activation (default), 'elu' = ELU, 'lrelu' - LeakyReLU, 'relu' - ReLU
pretrained_weights - None or path to weights-file
Return:
U-net model
File: ~/ml_basis/synthetic_items/models/unet.py
Type: function
```

*Docstring for the `unet` function, showing the parameters available for initialization*

As we can see from the docstring, there are eight parameters that define our future model. Only five of them affect the model’s architecture. The other three, `input_size`, `output_classes` and `pretrained_weights`, let us define the size of the input image, the number of output classes, and a path to the weights of a previously trained model, respectively.

We’ll focus on the five parameters that shape the model’s architecture. This is where we’ll employ Keras Tuner for hyperparameter tuning.

To find the best model architecture via hyperparameter tuning, we need to select a metric for model evaluation. To approach this question, let’s recap that U-NET performs binary classification for every image pixel, linking each pixel to a particular object class. Objects of interest in our problem domain are quite small compared to the image size. With that in mind, we need a metric that accounts for this imbalance in pixel classification. This is where the F1 score does particularly well: it is the harmonic mean of precision and recall on the object pixels, so a model cannot score well simply by predicting background everywhere.
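To make the imbalance argument concrete, here is a plain-Python sketch comparing pixel accuracy and F1 on a hypothetical 100-pixel mask in which only 5 pixels belong to an object. It is an illustration only, not the project’s `f1` metric, whose implementation isn’t shown in this article:

```python
def pixel_metrics(y_true, y_pred):
    """Return (accuracy, F1) for flat binary masks: 1 = object pixel, 0 = background."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # object pixels found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # background flagged as object
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # object pixels missed
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical 100-pixel mask: only 5 pixels belong to the object of interest
y_true = [1] * 5 + [0] * 95
all_background = [0] * 100  # trivial model that never predicts the object
partial = [1] * 3 + [0] * 2 + [1] * 2 + [0] * 93  # finds 3 of 5 object pixels, 2 false alarms

for name, y_pred in [("all_background", all_background), ("partial", partial)]:
    accuracy, f1 = pixel_metrics(y_true, y_pred)
    print(name, round(accuracy, 2), round(f1, 2))
# all_background 0.95 0.0
# partial 0.96 0.6
```

By accuracy the two models look almost identical (95 % vs 96 %), but only F1 separates the useless one from the useful one.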

## Keras Tuner implementation

### High-level overview of available tuners

How can we get the most out of our model using Keras Tuner? First of all, it’s important to say that Keras Tuner offers multiple tuners. They use different algorithms for hyperparameter search. Here are the algorithms, with the corresponding tuners:

- `kerastuner.tuners.hyperband.Hyperband` for the HyperBand-based algorithm;
- `kerastuner.tuners.bayesian.BayesianOptimization` for the Gaussian process-based algorithm;
- `kerastuner.tuners.randomsearch.RandomSearch` for the random search tuner.

To give you an initial intuition of these methods: `RandomSearch` is the least efficient approach. It doesn’t learn from previously tested parameter combinations; it simply samples combinations from the search space at random.
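As a rough illustration of why random search doesn’t learn, here is a minimal pure-Python sketch, not Keras Tuner’s implementation. The search space mirrors the `hp.*` definitions in the model builder function shown later in this article, and `toy_objective` is a hypothetical stand-in for training a model and returning its validation F1:

```python
import random

# Search space mirroring the hp.* definitions in the model builder function
SEARCH_SPACE = {
    "start_neurons": list(range(16, 129, 16)),
    "net_depth": list(range(2, 7)),
    "dropout": [False, True],
    "bn_after_act": [False, True],
    "activation": ["mish", "elu", "lrelu"],
}


def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials configurations uniformly at random and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Each draw ignores all previous results: nothing is learned between trials
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config):
    """Hypothetical stand-in for 'train the model and return val_f1'."""
    return config["net_depth"] / 6 - abs(config["start_neurons"] - 32) / 256


best, score = random_search(toy_objective, SEARCH_SPACE, n_trials=30)
print(best, round(score, 3))
```

`BayesianOptimization` differs exactly at the sampling line: instead of `rng.choice`, it picks the next configuration using a model fitted to the scores seen so far.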

`BayesianOptimization` is similar to `RandomSearch` in that they both sample a subset of hyperparameter combinations. The key difference is that `BayesianOptimization` doesn’t sample combinations randomly; it follows a probabilistic approach under the hood. It fits a probabilistic model (a Gaussian process) to the combinations tested so far and uses it to choose the next combination to test.

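`Hyperband` is an optimized version of `RandomSearch` in terms of search time and resource allocation. It trains many randomly sampled configurations for only a few epochs, discards the weakest ones, and gives the survivors progressively more epochs, so that most of the compute budget goes to promising configurations.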
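To build intuition for how Hyperband allocates epochs, here is a small sketch that computes the bracket schedule described in the original Hyperband paper (Li et al.), using `max_epochs = 20` as in the tuner configured later and a reduction `factor` of 3, which is Keras Tuner’s default. Keras Tuner’s own bookkeeping differs in details, so treat these numbers as illustrative, not as exactly what the library runs:

```python
def hyperband_schedule(max_epochs, factor=3):
    """Return Hyperband brackets as lists of (n_configs, epochs_per_config) rounds.

    Follows the schedule from the original Hyperband paper, not necessarily
    Keras Tuner's exact internal implementation.
    """
    # s_max: largest s with factor**s <= max_epochs (integer loop avoids float log errors)
    s_max = 0
    while factor ** (s_max + 1) <= max_epochs:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # Configurations sampled at the start of this bracket (integer ceil division)
        numerator = (s_max + 1) * factor ** s
        n = -(-numerator // (s + 1))
        rounds = []
        for i in range(s + 1):
            n_i = n // factor ** i  # survivors kept for round i
            epochs_i = max_epochs / factor ** (s - i)  # budget grows by `factor` each round
            rounds.append((n_i, round(epochs_i, 2)))
        brackets.append(rounds)
    return brackets


for rounds in hyperband_schedule(max_epochs=20, factor=3):
    print(rounds)
# [(9, 2.22), (3, 6.67), (1, 20.0)]
# [(5, 6.67), (1, 20.0)]
# [(3, 20.0)]
```

Reading the first bracket: nine configurations get about 2 epochs each, the best three continue to about 7 epochs, and a single survivor trains for the full 20. This successive halving is what saves time compared to training every sampled configuration for the full budget.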

If you’re a curious person and want to learn more about Random Search, Bayesian Optimization and HyperBand, I definitely recommend this article.

### Defining a search space and building a model

Keras Tuner provides an elegant way to define a model and the search space for the parameters that the tuner will use: you do it all by creating a **model builder function**. To show you how easy and convenient it is, here’s what the model builder function for our project looks like:

```
from tensorflow.keras.optimizers import Adam

# building a model using a model builder function
# `unet` is the project's U-Net builder documented above; weighted_cross_entropy,
# f1, precision, recall and iou are the project's own loss and metric functions
def model_builder(hp):
    """
    Build model for hyperparameter tuning
    hp: HyperParameters class instance
    """
    # defining a set of hyperparameters for tuning and a range of values for each
    start_neurons = hp.Int(name='start_neurons', min_value=16, max_value=128, step=16)
    net_depth = hp.Int(name='net_depth', min_value=2, max_value=6)
    dropout = hp.Boolean(name='dropout', default=False)
    bn_after_act = hp.Boolean(name='bn_after_act', default=False)
    activation = hp.Choice(name='activation', values=['mish', 'elu', 'lrelu'], ordered=False)
    # fixed (non-tuned) settings
    input_size = (544, 544, 3)
    target_labels = [str(i) for i in range(21)]
    # building a model
    model = unet(input_size=input_size,
                 start_neurons=start_neurons,
                 net_depth=net_depth,
                 output_classes=len(target_labels),
                 dropout=dropout,
                 bn_after_act=bn_after_act,
                 activation=activation)
    # model compilation (`learning_rate` replaces the deprecated `lr` argument)
    model.compile(optimizer=Adam(learning_rate=1e-3),
                  loss=weighted_cross_entropy,
                  metrics=[f1, precision, recall, iou])
    return model
```

You might have noticed that within the model builder function, we used several methods to define the search space for the hyperparameters: `hp.Int`, `hp.Boolean` and `hp.Choice`.

There’s nothing wild in how these methods operate; they simply define the values that can be used in the parameter search space. What really matters is how we, as engineers, select the method for each parameter and define sensible ranges/options for the values to be sampled.

For example, it’s very important to carefully consider the arguments of `hp.Int`, giving it meaningful minimum and maximum values (`min_value` and `max_value`) and a proper `step`. The goal is to avoid an overwhelming number of options, which would make the tuning process take too much time and too many resources. It’s equally crucial not to restrict the search space so much that the tuner never even considers the best possible values. Given your expertise, knowledge and problem domain, think about which values are most worth testing.
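A quick way to keep the search space in check is to count how many combinations it contains. The sketch below does this for the ranges used in our model builder function, assuming that `hp.Int` includes `max_value` when it falls on a step; `int_range_size` is a hypothetical helper, not a Keras Tuner API:

```python
from math import prod


def int_range_size(min_value, max_value, step=1):
    """Count the values of an inclusive integer range min_value..max_value with a given step."""
    return (max_value - min_value) // step + 1


sizes = {
    "start_neurons": int_range_size(16, 128, step=16),  # 16, 32, ..., 128 -> 8 values
    "net_depth": int_range_size(2, 6),  # 2, 3, 4, 5, 6 -> 5 values
    "dropout": 2,  # hp.Boolean -> False/True
    "bn_after_act": 2,  # hp.Boolean -> False/True
    "activation": 3,  # 'mish', 'elu', 'lrelu'
}
total = prod(sizes.values())
print(total)  # 480

# Halving the start_neurons step to 8 nearly doubles the number of combinations
finer = total // sizes["start_neurons"] * int_range_size(16, 128, step=8)
print(finer)  # 900
```

The tuners only sample part of this space, but the count shows how quickly each extra option multiplies the work.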

Besides `hp.Int`, `hp.Boolean` and `hp.Choice`, there are a few other methods for defining values in the search space, such as `hp.Float` and `hp.Fixed`. You can get familiar with them via the documentation.

As the last step in the model builder function, we `.compile()` the model before returning it.

### Tuner initialization

By now you should have a model builder function defined and an idea of which algorithm you’d like to use for hyperparameter tuning. If you’re all set with both, you’re ready to instantiate a tuner object.

For the image segmentation project we’re working on, I decided to stick with the `Hyperband` algorithm, so my initialization code looks like this:

```
import kerastuner as kt  # the package is imported as keras_tuner in newer releases

# tuner initialization
tuner = kt.Hyperband(hypermodel=model_builder,
                     objective=kt.Objective("val_f1", direction="max"),
                     max_epochs=20,
                     project_name='hyperband_tuner')
```

Four parameters are used during initialization:

- `hypermodel` is the model builder function we defined previously;
- `objective` is the metric that our model is trying to improve (maximize or minimize). As you might note from the above code snippet, I explicitly specify the name of the metric of my choice (`val_f1`, the F1 score on the validation dataset) and the direction it must go (`max`);
- `max_epochs` defines the maximum number of epochs used to train a single model. The official documentation suggests that you “set this to a value slightly higher than the expected time to convergence for your largest Model”;
- `project_name` is the name of the project folder, inside the tuner’s working `directory`, where all tuning-related results are placed and stored.

### Tuning process launch

Launching a hypertuning process is similar to fitting a model in Keras/TensorFlow, except that we call the `.search()` method on the tuner object instead of the regular `.fit()`. Here is how I kicked off the tuning job for the project:

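```
from glob import glob

# .search() forwards these arguments to model.fit(); Hyperband overrides
# `epochs` and `initial_epoch` for each trial according to its own schedule
tuner.search(x=train_dg,
             steps_per_epoch=batches_per_epoch,
             validation_data=valid_dg,
             # validation_steps must be an integer number of batches
             validation_steps=len(glob(img_dir + '/*')) // valid_batch_size,
             epochs=50,
             shuffle=True,
             verbose=1,
             initial_epoch=0,
             callbacks=[ClearTrainingOutput()],
             use_multiprocessing=True,
             workers=6)
```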

The `.search()` method accepts the same arguments as `.fit()`, so they should already be familiar to you. The only thing I want to point out is the `ClearTrainingOutput()` callback, which clears the notebook output at the end of each training run, i.e. after every trial. Here is the code for the `ClearTrainingOutput` callback:

```
import tensorflow as tf
from IPython.display import clear_output

# defining a callback that clears the notebook output at the end of every training run (trial)
class ClearTrainingOutput(tf.keras.callbacks.Callback):
    def on_train_end(self, logs=None):
        clear_output(wait=True)
```

### Getting tuning results

Here comes the most exciting part. Let’s see how well the tuner did. We’re curious what kind of results we were able to achieve, and which model configuration led to such performance.

To check the summary of the hypertuning job, we simply call `.results_summary()` on the `tuner` instance. Here is the complete code:

```
tuner.results_summary()
```

When run, the output for the tuner that we used within the project looks like this:

```
Results summary:
Results in hyperband_tuner
Showing 10 best trials
Objective(name='val_f1', direction='max')
Trial summary
Hyperparameters:
start_neurons: 32
net_depth: 5
dropout: False
bn_after_act: True
activation: mish
tuner/epochs: 15
tuner/initial_epoch: 0
tuner/bracket: 0
tuner/round: 0
Score: 0.9533569884300232
Trial summary
Hyperparameters:
start_neurons: 80
net_depth: 5
dropout: False
bn_after_act: True
activation: elu
tuner/epochs: 10
tuner/initial_epoch: 0
tuner/bracket: 1
tuner/round: 0
Score: 0.9258414387702942
Trial summary
Hyperparameters:
start_neurons: 128
net_depth: 3
dropout: True
bn_after_act: False
activation: elu
tuner/epochs: 18
tuner/initial_epoch: 3
tuner/bracket: 1
tuner/round: 1
tuner/trial_id: 5bc455f16fad434a9452c51a71c741b0
Score: 0.9170311570167542
..............
Trial summary
Hyperparameters:
start_neurons: 48
net_depth: 3
dropout: True
bn_after_act: False
activation: mish
tuner/epochs: 3
tuner/initial_epoch: 0
tuner/bracket: 1
tuner/round: 0
Score: 0.5333467721939087
Trial summary
Hyperparameters:
start_neurons: 96
net_depth: 3
dropout: True
bn_after_act: True
activation: mish
tuner/epochs: 3
tuner/initial_epoch: 0
tuner/bracket: 1
tuner/round: 0
Score: 0.5279057025909424
Trial summary
Hyperparameters:
start_neurons: 32
net_depth: 3
dropout: False
bn_after_act: False
activation: elu
tuner/epochs: 3
tuner/initial_epoch: 0
tuner/bracket: 1
tuner/round: 0
Score: 0.5064906477928162
```

In the above output, I truncated the `.results_summary()` listing to only six trials, showing the top three and the bottom three models. You might note that the worst model reached only 50.6 % on the metric of our choice (F1 score), whereas the best model achieved a remarkable 95.3 %.
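The percentages quoted in this section can be recomputed from the six scores in the truncated summary. The sketch below also groups the trials by `tuner/epochs`: the bottom three trials were trained for only 3 epochs, so part of the gap may reflect Hyperband’s epoch budget rather than the architecture alone:

```python
# (tuner/epochs, val_f1) pairs copied from the truncated results summary above
trials = [
    (15, 0.9533569884300232),
    (10, 0.9258414387702942),
    (18, 0.9170311570167542),
    (3, 0.5333467721939087),
    (3, 0.5279057025909424),
    (3, 0.5064906477928162),
]

scores = [score for _, score in trials]
best, worst = max(scores), min(scores)
gap_points = (best - worst) * 100  # difference in percentage points
ratio = best / worst  # how many times higher the best score is

print(round(gap_points, 1), round(ratio, 2))  # 44.7 1.88

# Mean score by training budget: short (3-epoch) trials vs longer ones
short = [s for epochs, s in trials if epochs <= 3]
longer = [s for epochs, s in trials if epochs > 3]
print(round(sum(short) / len(short), 3), round(sum(longer) / len(longer), 3))  # 0.523 0.932
```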

### Results & business impact

The gap between the best and the worst performer is almost 45 percentage points. From such a difference, we can conclude that:

- Keras Tuner did an incredible job finding the best set of model parameters, nearly doubling the F1 score of the worst trial;
- We, as engineers, defined a proper search space to sample from;
- Keras Tuner works well not only for toy problems but, most importantly, for real-life projects.

Let me also share the business impact we got. The model, better optimized for our particular problem domain through hyperparameter tuning, gave our service more stable and accurate long-run performance. Compared to our other similar but non-optimized services, we’ve seen a 15 % increase in the acceptance rate of our service, as well as 25 % growth in classification confidence level.

## Results tracking & sharing

Time for some MLOps. Great news: if you use Neptune, it now supports a complete integration with Keras Tuner.

Please note that due to the recent API update, this post needs some changes as well – we’re working on it! In the meantime, please check the Neptune documentation, where everything is up to date! 🥳

With the Neptune integration, you can:

- see charts of logged metrics for every trial,
- see the parameters tried at every trial,
- see hardware consumption during search,
- log the best parameters after training,
- log hyperparameter search space,
- log Keras Tuner project directory with information for all the trials

Pretty exciting! Let me show you how I integrated Neptune for tracking in my project, to store tuning results in the cloud.

I’ll skip the basic steps, like Neptune initialization and experiment creation, since it’s well described in the official documentation.

Instead, I’ll focus on the code changes required to bring Neptune into your project workflow. The only change is in the tuner initialization. Here is what the initialization looks like with the changes made:

```
import kerastuner as kt
import neptunecontrib.monitoring.kerastuner as npt_utils

# tuner initialization with the Neptune logger attached
tuner = kt.Hyperband(hypermodel=model_builder,
                     objective=kt.Objective("val_f1", direction="max"),
                     max_epochs=20,
                     project_name='hyperband_tuner',
                     logger=npt_utils.NeptuneLogger())
```

We added a logger to the tuner, so we can log the following after every trial:

- run parameters under ‘hyperparameters/values’ text log;
- loss and all the metrics defined when compiling Keras model;
- hardware consumption with CPU, GPU and memory during search.

In addition to that, I also take advantage of the `log_tuner_info()` function to log more information from Keras Tuner objects to Neptune. Here’s how I do it:

```
npt_utils.log_tuner_info(tuner)
```

This will log pretty much everything that we want to keep track of:

- best score (‘best_score’ metric);
- best parameters (‘best_parameters’ property);
- score for every run (‘run_score’ metric);
- tuner project directory (‘TUNER_PROJECT_NAME’ artifact);
- parameter space (‘hyperparameters/space’ text log);
- name of the metric/loss used as objective (‘objective/name’ property);
- direction of the metric/loss used as objective (‘objective/direction’ property);
- tuner id (‘tuner_id’ property);
- best trial id (‘best_trial_id’ property).

Interested in learning more? Watch this guided video, which shows an end-to-end integration using another project as an example. Another great source of inspiration is the official documentation.

## Final remarks

In this article, we’ve gone over a complete Keras Tuner implementation for a real-life project, showing its potential to improve model performance by selecting the best set of parameters.

We learned the importance of search space definition, and now know how to set up our own tuner and kick off the tuning job.

Lastly, I showed you how you can integrate Neptune into your project and keep track of the results of your trials.

I hope that this article was helpful, and you now know how to start using Keras Tuner, and what it can do for you. Thanks for reading!
