MLOps Blog

How to Manage, Track, and Visualize Hyperparameters of Machine Learning Models?

8 min
Kamil Kaczmarek
11th May, 2023

Machine learning algorithms are tunable by multiple knobs called hyperparameters. Recent deep learning models are tunable by tens of hyperparameters, which, together with data augmentation parameters and training procedure parameters, create quite a complex space. In the reinforcement learning domain, you should also count environment parameters.

Data scientists should keep good control over this hyperparameter space in order to make progress.

Here, we will show you recent practices, tips & tricks, examples, and tools to manage, track, and visualize hyperparameters efficiently and with minimal overhead. You will find yourself in control of the most complex deep learning experiments!

Learn more

The Best Tools to Visualize Metrics and Hyperparameters of Machine Learning Experiments

Hyperparameter Tuning in Python: a Complete Guide

Visualizing Machine Learning Models: Guide and Tools

 

Why should I track my hyperparameters? a.k.a. Why is that important?

Almost every deep learning experimentation guideline, like this deep learning book, advises you on how to tune hyperparameters to make models work as expected. In the experiment-analyze-learn loop, data scientists must control what changes are being made, so that the “learn” part of the loop is working.

Oh, and we forgot to say that the random seed is a hyperparameter as well (especially in the RL domain: check this Reddit thread, for example).

What is the current practice in hyperparameter management and tracking?

Let’s review common practices for managing hyperparameters one by one. We focus on how to build, keep, and pass hyperparameters to your ML scripts.

Python dictionary

Very basic, very useful. Simply collect your hyperparameters in a Python dictionary, like in this simple example:

PARAMS = {'epoch_nr': 5,
          'batch_size': 64,
          'dense': 256,
          'optimizer': 'sgd',
          'metrics': ['accuracy', 'binary_accuracy'],
          'activation': 'elu'}

Thanks to this approach, you keep all hyperparameters in a single Python object that you can easily use across your training scripts. To make sure that you track those parameters in your machine learning project, it’s recommended to simply version control the file where this dictionary is created.

You can check the entire example here.
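For illustration, here is a minimal sketch of how the PARAMS dict might be consumed and then persisted for tracking. The Keras-style binary classifier below is an assumption made for this example, not the linked code:

import json
from tensorflow import keras

# build a model from the shared PARAMS object
model = keras.Sequential([
    keras.layers.Dense(PARAMS['dense'], activation=PARAMS['activation']),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=PARAMS['optimizer'],
              loss='binary_crossentropy',
              metrics=PARAMS['metrics'])

# persist the exact setup next to the run artifacts
with open('params.json', 'w') as f:
    json.dump(PARAMS, f, indent=2)

Dumping the dictionary to JSON (or any other text format) also keeps it readable outside of Python, unlike a pickle file.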

Pros

  1. Simple and straightforward because you already know the tool.
  2. Easy to make a hierarchical structure with nested dictionaries.
  3. Almost no overhead in the code.
  4. Easy to merge multiple configuration files into a single dictionary.
  5. Can be saved in a pickle file for future use.

Cons

  1. Hyperparameters are part of the codebase, while they should be separate – remember to distinguish between the logic and its parametrization.
  2. Saving params to disk is not obvious.
  3. You may not notice that you overwrite some values, which makes it difficult to learn how a particular setup performs.
  4. Saved pickle files aren’t human-readable and can only be loaded back from Python code.

Configuration file

Configuration files are regular text files with a predefined structure and standard libraries to parse them, like the built-in JSON encoder and decoder, or PyYAML. Common formats are JSON, YAML, and cfg files.

Below is an example YAML file that presents multiple hyperparameters for a random forest, along with more general info like the project and experiment name.

project: ORGANIZATION/home-credit
name: home-credit-default-risk

parameters:
# Data preparation
  n_cv_splits: 5
  validation_size: 0.2
  stratified_cv: True
  shuffle: 1

# Random forest
  rf__n_estimators: 2000
  rf__criterion: gini
  rf__max_features: 0.2
  rf__max_depth: 40
  rf__min_samples_split: 50
  rf__min_samples_leaf: 20
  rf__max_leaf_nodes: 60
  rf__class_weight: balanced

# Post Processing
  aggregation_method: rank_mean

Similarly to the dictionary-based style, you just need to version control this file to keep track of hyperparameters.

You can read the YAML file and access its elements by simply using yaml.safe_load() like this:

import yaml

# config_path points to the YAML file shown above
with open(config_path) as f:
    config = yaml.safe_load(f)  # config is a dict

print(config['parameters']['n_cv_splits'])  # 5

With the AttrDict package, we can modify this snippet and access the n_cv_splits value in a more elegant way:

import yaml
from attrdict import AttrDict

with open(config_path) as f:
    config = yaml.safe_load(f)  # config is a dict
    cfg = AttrDict(config)

print(cfg.parameters.n_cv_splits)  # 5

Here is an example of a large YAML file used for storing feature selection, model parameters, and much more.
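If you tweak values in code for a particular run, it also helps to keep a snapshot of the exact configuration that was used. A minimal sketch, reusing the config dict loaded above (the snapshot file name is arbitrary):

import yaml

# e.g., an ad-hoc tweak made for this particular run
config['parameters']['rf__max_depth'] = 30

# write the effective configuration next to the run artifacts
with open('run_config_snapshot.yaml', 'w') as f:
    yaml.safe_dump(config, f, default_flow_style=False)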

Pros

  1. Everything is located in a single place.
  2. Easy to re-use saved configuration files.
  3. Nice separation of script logic and its parametrization.
  4. Enhanced readability of the code.

Cons

  1. It requires some programming discipline to put hyperparameters in the config file.
  2. If the codebase changes rapidly (new features, new models, and at the same time dropping older versions of the code), maintaining proper config files is an additional overhead.
  3. For large codebases, you may end up with several config files, which can make things more complex and tedious to maintain.

YAML files are also a standard format for writing configuration when deploying to AWS and other cloud platforms, so getting your hands dirty with YAML will pay off again at deployment time.

Argparse

When experimenting, you usually go through multiple trials (or experiments) in order to understand the relationships between hyperparameters and the score, and to obtain the best-performing model (we leave the discussion of what it means for a model to perform well for another post).

Might interest you

Performance Metrics in Machine Learning [Complete Guide]

In such a situation, it comes in handy to start new experiments from the command line and specify parameter values directly in the CLI. Argparse is a Python module that makes it easy to write user-friendly command-line interfaces.

I think that an easy way to understand argparse is to simply analyze an example. Below is a simple Python program that takes three optional arguments and prints them.

import argparse

parser = argparse.ArgumentParser(description='Process hyper-parameters')

parser.add_argument('--lr',       type=float, default=0.001, help='learning rate')
parser.add_argument('--dropout',  type=float, default=0.0,   help='dropout ratio')
parser.add_argument('--data_dir', type=str,   default='/neptune/is/the/best/data/', help='data directory for training')

args = parser.parse_args()

# Here is how to access passed values
print(args.lr)
print(args.dropout)
print(args.data_dir)

If you run this program without any arguments, the defaults will be used:

python main.py

Output is:

0.001
0.0
/neptune/is/the/best/data/

If you specify parameters, then they are parsed, so that you can use them in your training script:

python main.py --lr 0.005 --dropout 0.5

Output is:

0.005
0.5
/neptune/is/the/best/data/

One important note about tracking: be advised that argparse does not save or log parameters passed on the command line. You have to save the parameter values yourself, for example by dumping them to a file as in the sketch below.
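A minimal sketch of doing that, reusing the args namespace from the example above (the output file name is arbitrary):

import json

# argparse keeps the parsed values only in memory, so persist them explicitly
with open('run_params.json', 'w') as f:
    json.dump(vars(args), f, indent=2)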

Pros

  1. Conveniently start new experiments.
  2. Decide on the hyperparameters’ values on the fly.
  3. Easy to add new arguments to argparse.

Cons

  1. Requires extra effort (not large though) to keep track of hyperparameters’ values across long experimentation-based projects. Argparse does not save values anywhere.
  2. Similarly to configuration files, if your project grows rapidly you may find it difficult to maintain CLI parameters.
  3. If you pass parameters in a few places in the code, it’s not obvious how to use argparse efficiently. The same is true if you build or merge parameters from multiple places.

Note: Did you know about Click?

As mentioned in this post, there are a few alternatives to argparse. One notable option is Click.

It is a Python package for creating command-line interfaces in a composable way with minimal additional code. With Click, you just decorate some functions, like in this example, where the hello function is decorated:

import click

@click.command()
@click.option("--count", default=1, help="Number of greetings.")
@click.option("--name", prompt="Your name",
              help="The person to greet.")
def hello(count, name):
    """Simple program that greets NAME for a total of COUNT times."""
    for _ in range(count):
        click.echo("Hello, %s!" % name)

if __name__ == '__main__':
    hello()

Then run like any other CLI command:

python hello.py --count=3

Hydra

Hydra is a new project from Facebook AI that simplifies the configuration of more complex machine learning experiments.

The key ideas behind it are:

  • Dynamically create a hierarchical configuration by composition,
  • Override it when needed through the command line,
  • Pass new parameters (not present in the config) via CLI – they will be handled for you

Hydra gives you the ability to prepare and override complex configuration setups (including config groups and hierarchies) while keeping track of any overridden values.

Similarly to argparse, the best way to understand it (and how simple it is to work with hydra) is to analyze an example.

Let’s consider a simplified config YAML file from the section about configuration files:

project: ORGANIZATION/home-credit
name: home-credit-default-risk

parameters:
# Data preparation
  n_cv_splits: 5
  validation_size: 0.2
  stratified_cv: True
  shuffle: 1

# Random forest
  rf__n_estimators: 2000
  rf__criterion: gini
  rf__max_depth: 40
  rf__class_weight: balanced

Here is a minimalist Hydra example:

import hydra
from omegaconf import DictConfig

@hydra.main(config_path='hydra-config.yaml')
def train(cfg: DictConfig):
    print(cfg.pretty())  # this prints the config in a reader-friendly way
    print(cfg.parameters.rf__n_estimators)  # this is how to access a single value from the config

if __name__ == "__main__":
    train()

When you run it, you should see this:

name: home-credit-default-risk
parameters:
  n_cv_splits: 5
  rf__class_weight: balanced
  rf__criterion: gini
  rf__max_depth: 40
  rf__n_estimators: 2000
  shuffle: 1
  stratified_cv: true
  validation_size: 0.2
project: ORGANIZATION/home-credit

2000

What is convenient in Hydra is that you can override any value in the config from the CLI like this:

python hydra-main.py parameters.n_cv_splits=12 parameters.stratified_cv=False name=entirely-new-name

As a result, you have new values in the config:

name: entirely-new-name
parameters:
  n_cv_splits: 12
  rf__class_weight: balanced
  rf__criterion: gini
  rf__max_depth: 40
  rf__n_estimators: 2000
  shuffle: 1
  stratified_cv: false
  validation_size: 0.2
project: ORGANIZATION/home-credit

2000

Another feature that provides nice flexibility is an option to pass new, previously unseen parameters right from the command line.

To enable this feature, simply turn off strict mode in Hydra:

@hydra.main(config_path='hydra-config.yaml', strict=False)

In the command below, I’m adding rf__max_features to the config and at the same time changing rf__n_estimators to 1500. Note that the config file is the same as in the previous examples; in the code, we only turned off strict mode:

python hydra-main.py parameters.rf__n_estimators=1500 parameters.rf__max_features=0.2

Output changed accordingly:

name: home-credit-default-risk
parameters:
  n_cv_splits: 5
  rf__class_weight: balanced
  rf__criterion: gini
  rf__max_depth: 40
  rf__max_features: 0.2
  rf__n_estimators: 1500
  shuffle: 1
  stratified_cv: true
  validation_size: 0.2
project: ORGANIZATION/home-credit

1500

The Hydra project is actively developed, so make sure to check its tutorials from time to time to see new features.

Pros

  1. Composable configurations.
  2. Ability to override values very easily and still keep track of them.
  3. Config groups that bring organization to larger experiments.

Cons

  1. Hydra shines in larger experiments, measured by the number of hyperparameters and their hierarchy. For smaller ones, other methods will do just fine.
  2. You need to be careful to avoid accidental override of important parameters’ values.
  3. In order to track hyperparameters across experiments, you need to save the config object (cfg in the examples above) manually – see the sketch below.
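A minimal sketch of that last point: dump the composed config (including any CLI overrides) with OmegaConf inside the decorated function. The snapshot file name is arbitrary:

import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path='hydra-config.yaml')
def train(cfg: DictConfig):
    # snapshot the fully-composed config (including CLI overrides) for later comparison
    OmegaConf.save(config=cfg, f='config_snapshot.yaml')
    # ... training code ...

if __name__ == "__main__":
    train()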

PyTorch Lightning LightningModule Hyperparameters

Note: This method only works if you’re using PyTorch Lightning as your framework of choice

PyTorch Lightning has an implicit way of tracking hyperparameters for you in checkpoints and YAML files (the same format we talked about above). This enhances reproducibility and makes the tracking process clean and efficient in the code. There’s no need to import an extra module or load any file in the codebase.

All the heavy lifting is done by the LightningModule; the user just uses the inherited methods directly. Let’s see an example in action:

from pytorch_lightning import LightningModule


class LitMNIST(LightningModule):
    def __init__(self, layer_1_dim=128, learning_rate=1e-2):
        super().__init__()
        # call this to save (layer_1_dim=128, learning_rate=1e-2) to the checkpoint
        self.save_hyperparameters()

        # equivalent, listing the arguments explicitly:
        # self.save_hyperparameters("layer_1_dim", "learning_rate")

        # now layer_1_dim is accessible from hparams
        self.hparams.layer_1_dim

The save_hyperparameters() method will save all the hyperparameters that are present in the object into a YAML file. Head over to the documentation to learn more about saving and loading hyperparameters just like a model checkpoint file.
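As a quick illustration (a sketch with a placeholder checkpoint path), hyperparameters saved this way come back automatically when you load the checkpoint, and individual values can even be overridden at load time:

# "path/to/checkpoint.ckpt" is a placeholder path
model = LitMNIST.load_from_checkpoint("path/to/checkpoint.ckpt")
print(model.hparams.layer_1_dim)    # 128
print(model.hparams.learning_rate)  # 0.01

# individual hyperparameters can be overridden when loading
model = LitMNIST.load_from_checkpoint("path/to/checkpoint.ckpt", layer_1_dim=64)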

Pros:

  1. No extra dependency on any module or file loading
  2. Fast and efficient save and load functions
  3. Being implicit in nature keeps the codebase clean and explainable

Cons:

  1. These methods are only limited to the PyTorch Lightning module
  2. Unfortunately, it’s plagued by the YAML file cons too.
    • It requires knowledge of code and programming to handle things well
    • Maintaining the config file is an additional overhead if the configurations are changing rapidly. This can, however, be fixed by using version control.
  3. This technique offers less flexibility. For instance, if you want to save different hyperparameters in different config files, you have to go back to creating them explicitly.

TensorFlow 2 TensorBoard HParams

Note: This method only works if you’re using TensorFlow as your framework of choice

For hyperparameter tuning, the best practice is to have the hyperparameters and the results in a single place so that you can make a good decision. The HParams dashboard in TensorBoard provides several tools to help with this process of identifying the best experiment or the most promising sets of hyperparameters.

Logging hyperparameters using the HParams API lets you visualize them in TensorBoard along with a plethora of other handy information. Once you log and save hyperparameters using the TF summary writer in a TensorBoard-compatible format, you can import and reuse them in different experiments. Let’s see an example of how to log hyperparameters and use them while fitting a model.

import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))

METRIC_ACCURACY = 'accuracy'

with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
  hp.hparams_config(
    hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER],
    metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
  )

You can declare the hyperparameters you wish to track using the hp.HParam class and save them using tf.summary.create_file_writer, which writes them in a format TensorBoard can read.

def train_test_model(hparams):
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
    tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax),
  ])
  model.compile(
      optimizer=hparams[HP_OPTIMIZER],
      loss='sparse_categorical_crossentropy',
      metrics=['accuracy'],
  )

  model.fit(x_train, y_train, epochs=1) # Run with 1 epoch to speed things up for demo purposes
  _, accuracy = model.evaluate(x_test, y_test)
  return accuracy

You can use the hyperparameters declared above directly in your models, as train_test_model does. Head over to the documentation to learn more about it.
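To actually record each trial, the pattern from the TensorBoard HParams tutorial is to log the chosen values with hp.hparams() next to the resulting metric. Below is a sketch of that pattern; it assumes train_test_model and the HParam objects defined above, with x_train/y_train/x_test/y_test already loaded:

def run(run_dir, hparams):
  with tf.summary.create_file_writer(run_dir).as_default():
    hp.hparams(hparams)  # record the hyperparameter values used in this trial
    accuracy = train_test_model(hparams)
    tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)

session_num = 0
for num_units in HP_NUM_UNITS.domain.values:
  for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
    for optimizer in HP_OPTIMIZER.domain.values:
      hparams = {
          HP_NUM_UNITS: num_units,
          HP_DROPOUT: dropout_rate,
          HP_OPTIMIZER: optimizer,
      }
      run('logs/hparam_tuning/run-%d' % session_num, hparams)
      session_num += 1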

Pros:

  1. No extra dependency on any module or file loading
  2. Fast and easy visualization process
  3. Being implicit in nature keeps the codebase clean and explainable
  4. The directory structure will remain clean and organized

Cons:

  1. These methods are only limited to the TensorBoard module
  2. It requires knowledge of code and programming to handle things well
  3. Maintaining the config file is an additional overhead if the configurations are changing rapidly. This can, however, be fixed by using version control.
  4. This technique also offers less flexibility. For instance, if you want to save different hyperparameters in different config files, you have to go back to creating them explicitly.

How to use experiment tracking tools like Neptune to further increase efficiency and control?

One step further in managing hyperparameters is to use them in a broader context of experiment management. Here is an example of how Neptune handles parametrization of ML experiments:

import neptune

# define parameters
PARAMS = {'batch_size': 64,
          'n_epochs': 100,
          'shuffle': True,
          'activation': 'elu',
          'dense_units': 128,
          'dropout': 0.2,
          'learning_rate': 0.001,
          'early_stopping': 10,
          'optimizer': 'Adam',
          }

# create a run (assumes the Neptune project and API token are configured, e.g., via environment variables)
run = neptune.init_run()

# log the parameters
run["parameters"] = PARAMS

# run training/validation code

In this way, each experiment has its own params setup saved to Neptune for further analysis and comparison across experiments. The main advantage of this approach is that you associate parameters with other experiment-related data/metadata like evaluation metrics or resulting models.
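To make later analysis easier, you can log evaluation metrics to the same run object, so parameters and results live side by side. A minimal sketch (the field names and values below are arbitrary):

# log metrics next to the parameters (field names are up to you)
for epoch in range(PARAMS['n_epochs']):
    # ... training step ...
    run["train/loss"].append(1.0 / (epoch + 1))

run["eval/accuracy"] = 0.92  # a single final value

run.stop()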

Head over to the documentation to find out how you can seamlessly integrate Neptune without changing your code much.

Parameters’ values are displayed for each experiment, allowing you to visually inspect and analyze runs.
You can also compare parameters across multiple runs in Neptune.

Experiment tracking tools – like Neptune – display params in multiple different places, so that you can:

  • Compare selected experiments in greater detail while having all different params highlighted (example).
  • Search for experiments with particular values of the parameter at hand (example where we display only experiments with positive “timeseries_factor”).

Read also

Switching From Spreadsheets to Neptune.ai and How It Pushed My Model Building Process to the Next Level

How to visualize hyperparameters?

If you are a heavy experimenter, you have probably come across the need to efficiently compare hundreds of runs and visualize the relationships between hyperparameters and the score.

Parallel coordinate plot

One way to do it is to prepare a parallel coordinate plot, like the one below:

Parallel coordinates plot built with HiPlot.

Each vertical axis is one parameter, and the score is the right-most (vertical) axis. Such a visualization gives immediate insight into the ranges of parameters that yield the best score. In principle, it should be interactive to allow users to explore the data freely and perform their own reasoning and interpretation.

One great tool for building parallel coordinate plots is HiPlot, developed at Facebook AI Research (FAIR). Take a closer look at how you can compare runs using parallel coordinates in Neptune.
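If you want to try it on your own runs, a minimal HiPlot sketch looks roughly like this (the run values below are made up for illustration):

import hiplot as hip

# each dict is one run: hyperparameters plus the resulting score
runs = [
    {'lr': 0.001, 'dropout': 0.2, 'batch_size': 64,  'score': 0.83},
    {'lr': 0.010, 'dropout': 0.5, 'batch_size': 128, 'score': 0.79},
    {'lr': 0.005, 'dropout': 0.3, 'batch_size': 64,  'score': 0.86},
]

exp = hip.Experiment.from_iterable(runs)
exp.to_html('hiplot_runs.html')  # or exp.display() inside a Jupyter notebook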

By the way, such a large number of runs (as depicted above) usually comes from hyperparameter optimization jobs, or HPO for short. Python’s open-source landscape has a lot to offer in that matter. Check this comparison of two popular HPO libraries: Optuna and Hyperopt.

scikit-optimize

Another approach to inspecting and understanding HPO results is proposed by the creators of scikit-optimize. Each HPO job produces diagnostic charts that visualize the relationships between hyperparameters and the score.

Here is an example:

Skopt visualization from an example optimization job.
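To produce this kind of chart yourself, a minimal scikit-optimize sketch could look like the following; the objective below is a stand-in for training and scoring a real model:

from skopt import gp_minimize
from skopt.plots import plot_objective
from skopt.space import Integer, Real

# stand-in objective: in practice, train a model here and return e.g. 1 - accuracy
def objective(params):
    lr, n_estimators = params
    return (lr - 0.01) ** 2 + ((n_estimators - 500) ** 2) * 1e-7

space = [Real(1e-4, 1e-1, prior='log-uniform', name='lr'),
         Integer(100, 2000, name='n_estimators')]

result = gp_minimize(objective, space, n_calls=30, random_state=0)
_ = plot_objective(result)  # partial-dependence style diagnostic plots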

Optuna

Optuna is a hyperparameter optimization framework that automates hyperparameter search. It offers its own suite of visualizations for the hyperparameters optimized in a given job.

Let’s study one example of an Optuna HPO job:

Optuna diagnostics chart

Similarly to the previous example, a major goal of visualization is to help understand how hyperparameters relate to the score that is being optimized.
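A minimal Optuna sketch that produces such charts could look like this; the objective is again a stand-in for real training and evaluation:

import optuna

# stand-in objective: in practice, train and evaluate a model here
def objective(trial):
    lr = trial.suggest_float('lr', 1e-4, 1e-1, log=True)
    n_layers = trial.suggest_int('n_layers', 1, 5)
    return (lr - 0.01) ** 2 + 0.01 * n_layers

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)

# built-in charts relating hyperparameters to the objective value
fig = optuna.visualization.plot_parallel_coordinate(study)
fig.show()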

If this sounds relevant to you, take a closer look at the Neptune-Optuna integration in the docs.

Final thoughts

Hyperparameters are a central piece of the larger picture, which is experiment management. In this post, we showed the current state of practice in hyperparameter tracking. With Neptune, you can take it one level up by making hyperparameters easily accessible, comparable, and shareable across your team.