How to Manage, Track, and Visualize Hyperparameters of Machine Learning Models?
Machine learning algorithms are tunable via multiple knobs called hyperparameters. Recent deep learning models are tunable by tens of hyperparameters, which, together with data augmentation parameters and training procedure parameters, create quite a complex search space. In the reinforcement learning domain, you should also count environment parameters.
Data scientists need good control over this hyperparameter space in order to make progress.
Here, we will show you recent practices, tips & tricks, examples, and tools to manage, track, and visualize hyperparameters efficiently and with minimal overhead. You will find yourself in control of the most complex deep learning experiments!
Why should I track my hyperparameters? a.k.a. Why is that important?
Almost every deep learning experimentation guideline, like this deep learning book, advises you on how to tune hyperparameters to make models work as expected. In the experiment-analyze-learn loop, data scientists must control what changes are being made, so that the “learn” part of the loop is working.
Oh, and the random seed is a hyperparameter as well (especially in the RL domain: check this Reddit thread, for example).
What is the current practice in hyperparameter management and tracking?
Let’s review common practices for managing hyperparameters one by one. We focus on how to build, keep, and pass hyperparameters to your ML scripts.
Python dictionary
Very basic, very useful. Simply collect your hyperparameters in a Python dictionary, like in this simple example:
PARAMS = {'epoch_nr': 5,
          'batch_size': 64,
          'dense': 256,
          'optimizer': 'sgd',
          'metrics': ['accuracy', 'binary_accuracy'],
          'activation': 'elu'}
Thanks to this approach, you keep all hyperparameters in a single Python object, and you can easily use it across your training scripts. To make sure you track those parameters in the machine learning project, it’s recommended to simply version control the file where this dictionary is defined.
You can check the entire example here.
Pros
- Simple and straightforward because you already know the tool.
- Easy to make a hierarchical structure with nested dictionaries.
- Almost no overhead in the code.
- Easy to merge multiple configuration files into a single dictionary.
- Can be saved in a pickle file for future use.
Cons
- Hyperparameters are part of the codebase, while they should be separate – remember to distinguish between the logic and its parametrization.
- There is no obvious, standard way of saving the params to disk.
- It’s easy to overwrite some values without noticing, which later makes it difficult to tell how a particular setup performed.
- Saved pickle files are not human-readable outside of Python code (a JSON sketch below shows one workaround).
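One simple way to address the last two points (a minimal sketch, not part of the original example) is to dump the dictionary to a JSON file, which stays human-readable and can be versioned or archived alongside the run’s results:
import json

# save the PARAMS dictionary next to the run's outputs ('params.json' is just an example name)
with open('params.json', 'w') as f:
    json.dump(PARAMS, f, indent=2)

# ...and load it back later to reproduce the setup
with open('params.json') as f:
    PARAMS = json.load(f)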
Configuration file
Configuration files are regular text files with a predefined structure and standard libraries to parse them, such as Python’s built-in JSON encoder and decoder, or PyYAML. Common formats are JSON, YAML, and cfg files.
Below is an example YAML file that defines multiple hyperparameters for a random forest, along with more general info like the project and experiment name.
project: ORGANIZATION/home-credit
name: home-credit-default-risk
parameters:
  # Data preparation
  n_cv_splits: 5
  validation_size: 0.2
  stratified_cv: True
  shuffle: 1
  # Random forest
  rf__n_estimators: 2000
  rf__criterion: gini
  rf__max_features: 0.2
  rf__max_depth: 40
  rf__min_samples_split: 50
  rf__min_samples_leaf: 20
  rf__max_leaf_nodes: 60
  rf__class_weight: balanced
  # Post Processing
  aggregation_method: rank_mean
Similarly to the dictionary-based style, you just need to version control this file to keep track of hyperparameters.
You can read the YAML file and access its elements by simply using yaml.load() like this:
import yaml

with open(config_path) as f:
    config = yaml.load(f, Loader=yaml.BaseLoader)  # config is a dict

print(config['parameters']['n_cv_splits'])  # 5
Using the AttrDict package, we can modify this snippet and access the n_cv_splits value in a more elegant way:
import yaml
from attrdict import AttrDict

with open(config_path) as f:
    config = yaml.load(f, Loader=yaml.BaseLoader)  # config is a dict

cfg = AttrDict(config)
print(cfg.parameters.n_cv_splits)  # 5
Here is an example of a large yaml file used for storing feature selection, model parameters and much more.
Pros
- Everything is located in a single place.
- Easy to re-use saved configuration files.
- Nice separation of script logic and its parametrization.
- Enhanced readability of the code.
Cons
- It requires some programming discipline to put hyperparameters in the config file.
- If the codebase changes rapidly (new features, new models, and at the same time dropping older versions of the code), maintaining proper config files is an additional overhead.
- For large codebases, you may land with several config files, which can make things more complex and tedious to maintain.
YAML is also a standard format for writing configuration when deploying to AWS and other cloud platforms, so getting your hands dirty with YAML now will pay off later in deployment.
Argparse
When experimenting, you usually go through multiple trials (or experiments) in order to understand the relationships between hyperparameters and score and to obtain the best-performing model (we leave the discussion on what it means that the model performs well for another post).
In such a situation it comes in handy to start new experiments from the command line and specify values of parameters directly in the CLI. Argparse is a Python module that makes it easy to write user-friendly command-line interfaces.
I think an easy way to understand argparse is to simply analyze an example. Below is a simple Python program that takes three optional arguments and prints them.
import argparse
parser = argparse.ArgumentParser(description='Process hyper-parameters')
parser.add_argument('--lr', type=float, default=0.001, help='learning rate')
parser.add_argument('--dropout', type=float, default=0.0, help='dropout ratio')
parser.add_argument('--data_dir', type=str, default='/neptune/is/the/best/data/', help='data directory for training')
args = parser.parse_args()
# Here is how to access passed values
print(args.lr)
print(args.dropout)
print(args.data_dir)
If you run this program without any arguments, the defaults will be used:
python main.py
Output is:
0.001
0.0
/neptune/is/the/best/data/
If you specify parameters, then they are parsed, so that you can use them in your training script:
python main.py --lr 0.005 --dropout 0.5
Output is:
0.005
0.5
/neptune/is/the/best/data/
One important note about tracking: argparse does not save or log the parameters passed on the command line. You have to save the parameter values yourself.
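A minimal sketch of that (using JSON and the args object parsed above) is to dump the namespace to a file at the start of every run:
import json

# vars(args) turns the argparse.Namespace into a plain dict ('run_params.json' is just an example name)
with open('run_params.json', 'w') as f:
    json.dump(vars(args), f, indent=2)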
Pros
- Conveniently start new experiments.
- Decide on the hyperparameters’ values on the fly.
- Easy to add new arguments to argparse.
Cons
- Requires extra effort (not large though) to keep track of hyperparameters’ values across long experimentation-based projects. Argparse does not save values anywhere.
- Similarly to configuration files, if your project grows rapidly you may find it difficult to maintain CLI parameters.
- If you pass parameters in a few places in the code, it’s not obvious how to use argparse efficiently. The same is true if you build or merge parameters from multiple places.
Note: Did you know about Click?
It is a Python package for creating CLIs in a composable way with minimal additional code. With Click you just decorate some functions, like in this example, where the hello function is decorated:
import click

@click.command()
@click.option("--count", default=1, help="Number of greetings.")
@click.option("--name", prompt="Your name",
              help="The person to greet.")
def hello(count, name):
    """Simple program that greets NAME for a total of COUNT times."""
    for _ in range(count):
        click.echo("Hello, %s!" % name)

if __name__ == '__main__':
    hello()
Then run like any other CLI command:
python hello.py --count=3
Hydra
Hydra is a new project from Facebook AI that simplifies the configuration of more complex machine learning experiments.
The key ideas behind it are:
- Dynamically create a hierarchical configuration by composition,
- Override it when needed through the command line,
- Pass new parameters (not present in the config) via CLI – they will be handled for you
Hydra gives you the ability to prepare and override complex configuration setups (including config groups and hierarchies) while keeping track of any overridden values.
Similarly to argparse, the best way to understand it (and how simple it is to work with hydra) is to analyze an example.
Let’s consider a simplified config YAML file from the section about configuration files:
project: ORGANIZATION/home-credit
name: home-credit-default-risk
parameters:
  # Data preparation
  n_cv_splits: 5
  validation_size: 0.2
  stratified_cv: True
  shuffle: 1
  # Random forest
  rf__n_estimators: 2000
  rf__criterion: gini
  rf__max_depth: 40
  rf__class_weight: balanced
Here is a minimalist hydra example:
import hydra
from omegaconf import DictConfig

@hydra.main(config_path='hydra-config.yaml')
def train(cfg):
    print(cfg.pretty())  # this prints the config in a reader-friendly way
    print(cfg.parameters.rf__n_estimators)  # this is how to access a single value from the config

if __name__ == "__main__":
    train()
When you run it, you should see this:
name: home-credit-default-risk
parameters:
  n_cv_splits: 5
  rf__class_weight: balanced
  rf__criterion: gini
  rf__max_depth: 40
  rf__n_estimators: 2000
  shuffle: 1
  stratified_cv: true
  validation_size: 0.2
project: ORGANIZATION/home-credit

2000
What is convenient in hydra is that you can override any value in the config from the CLI like this:
python hydra-main.py parameters.n_cv_splits=12 parameters.stratified_cv=False name=entirely-new-name
As a result you have new values in the config:
name: entirely-new-name
parameters:
  n_cv_splits: 12
  rf__class_weight: balanced
  rf__criterion: gini
  rf__max_depth: 40
  rf__n_estimators: 2000
  shuffle: 1
  stratified_cv: false
  validation_size: 0.2
project: ORGANIZATION/home-credit

2000
Another feature that provides nice flexibility is an option to pass new, previously unseen parameters right from the command line.
To enable this feature simply turn off strict mode in hydra.
@hydra.main(config_path='config.yaml', strict=False)
In the command below I’m adding rf__max_features to the config and at the same time changing rf__n_estimators to 1500. Note that the config is the same as in the previous examples; in the code we only turned off strict mode:
python hydra-main.py parameters.rf__n_estimators=1500 parameters.rf__max_features=0.2
Output changed accordingly:
name: home-credit-default-risk
parameters:
  n_cv_splits: 5
  rf__class_weight: balanced
  rf__criterion: gini
  rf__max_depth: 40
  rf__max_features: 0.2
  rf__n_estimators: 1500
  shuffle: 1
  stratified_cv: true
  validation_size: 0.2
project: ORGANIZATION/home-credit

1500
The hydra project is being actively developed, so make sure to check their tutorials from time to time to see new features.
Pros
- Composable configurations.
- Ability to override values very easily and still keep track of them.
- Config groups that bring organization to larger experiments.
Cons
- Hydra shines in larger experiments, measured by the number of hyperparameters and their hierarchy. For smaller ones, other methods will do just fine.
- You need to be careful to avoid accidental override of important parameters’ values.
- In order to track hyperparameters across experiments, you need to save the config object (cfg in the examples above) manually; a minimal sketch follows this list.
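A sketch of that manual step, using OmegaConf (the library hydra’s configs are built on), could look like this inside the decorated train function:
from omegaconf import OmegaConf

# inside train(cfg): snapshot the (possibly overridden) config next to the run's outputs
# 'config_snapshot.yaml' is just an example file name
OmegaConf.save(config=cfg, f='config_snapshot.yaml')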
PyTorch Lightning LightningModule Hyperparameters
Note: This method only works if you’re using PyTorch Lightning as your framework of choice
PyTorch Lightning has an implicit way of tracking hyperparameters for you in checkpoints and YAML files (the same format we talked about above). This enhances reproducibility and keeps the tracking process clean and efficient in the code. There’s no need to import an extra module or load any file in the codebase.
All the heavy lifting is done by the LightningModule; the user just has to call the inherited methods. Let’s see an example in action:
class LitMNIST(LightningModule):
    def __init__(self, layer_1_dim=128, learning_rate=1e-2):
        super().__init__()
        # call this to save (layer_1_dim=128, learning_rate=1e-2) to the checkpoint
        self.save_hyperparameters()
        # equivalent
        self.save_hyperparameters("layer_1_dim", "learning_rate")
        # now it's possible to access layer_1_dim from hparams
        self.hparams.layer_1_dim
The save_hyperparameters() method will save all the hyperparameters that are present in the object into a YAML file. Head over to the documentation to learn more about saving and loading hyperparameters just like a model checkpoint file.
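For instance, because the hyperparameters travel with the checkpoint, restoring them later takes roughly one line (the checkpoint path below is just a placeholder):
# hyperparameters saved via save_hyperparameters() are restored together with the model
model = LitMNIST.load_from_checkpoint("path/to/checkpoint.ckpt")  # placeholder path
print(model.hparams.layer_1_dim)  # 128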
Pros:
- No extra dependency on any module or file loading
- Fast and efficient save and load functions
- Being implicit in nature keeps the codebase clean and explainable
Cons:
- These methods are limited to the PyTorch Lightning module
- Unfortunately, it’s plagued by the YAML file cons too.
- It requires knowledge of code and programming to handle things well
- Maintaining the config file is an additional overhead if the configurations are changing rapidly. This can, however, be mitigated by using version control.
- This technique offers less flexibility. For instance, if you want to save different hyperparameters in different config files, you have to go back to creating them explicitly.
TensorFlow 2 TensorBoard HParams
Note: This method only works if you’re using TensorFlow as your framework of choice
For hyperparameter tuning, the best practice is to keep the hyperparameters and the corresponding results in a single place so you can make good decisions. The HParams dashboard in TensorBoard provides several tools to help with this process of identifying the best experiment or the most promising sets of hyperparameters.
Logging hyperparameters with the HParams API lets you visualize them in TensorBoard along with a plethora of other handy information. Once you log and save hyperparameters using the TF summary writer in a TensorBoard-compatible format, you can import and reuse them in different experiments. Let’s see an example of how to log them and use them while fitting a model.
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))
METRIC_ACCURACY = 'accuracy'

with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
    hp.hparams_config(
        hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER],
        metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
    )
You declare the hyperparameters you need with hp.HParam and register the experiment configuration with tf.summary.create_file_writer, which writes it in a format TensorBoard can read.
def train_test_model(hparams):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax),
    ])
    model.compile(
        optimizer=hparams[HP_OPTIMIZER],
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'],
    )
    model.fit(x_train, y_train, epochs=1)  # run with 1 epoch to speed things up for demo purposes
    _, accuracy = model.evaluate(x_test, y_test)
    return accuracy
You can use the hyperparameters declared above directly in your models. Head over to the documentation to learn more about it.
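To actually record each trial, the official TensorBoard HParams tutorial wraps train_test_model in a small run function that logs the hyperparameter values and the resulting metric; a condensed sketch (reusing the names defined above) looks roughly like this:
def run(run_dir, hparams):
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record the hyperparameter values used in this trial
        accuracy = train_test_model(hparams)
        tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)

run('logs/hparam_tuning/run-0',
    {HP_NUM_UNITS: 16, HP_DROPOUT: 0.1, HP_OPTIMIZER: 'adam'})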
Pros:
- No extra dependency on any module or file loading
- Fast and easy visualization process
- Being implicit in nature keeps the codebase clean and explainable
- The directory structure will remain clean and organized
Cons:
- These methods are limited to TensorFlow and the TensorBoard module
- It requires knowledge of code and programming to handle things well
- Maintaining the config file is an additional overhead if the configurations are changing rapidly. This can, however, be mitigated by using version control.
- This technique also offers less flexibility. For instance, if you want to save different hyperparameters in different config files, you have to go back to creating them explicitly.
How to use experiment tracking tools like Neptune to further increase efficiency and control?
One step further in managing hyperparameters is to use them in a broader context of experiment management. Here is an example of how Neptune handles parametrization of ML experiments:
import neptune

# define parameters
PARAMS = {'batch_size': 64,
          'n_epochs': 100,
          'shuffle': True,
          'activation': 'elu',
          'dense_units': 128,
          'dropout': 0.2,
          'learning_rate': 0.001,
          'early_stopping': 10,
          'optimizer': 'Adam',
          }

# create a run and log the parameters
run = neptune.init_run()
run["parameters"] = PARAMS

# run training/validation code
In this way, each experiment has its own params setup saved to Neptune for further analysis and comparison across experiments. The main advantage of this approach is that you associate parameters with other experiment-related data/metadata like evaluation metrics or resulting models.
Head over to the documentation to find out how you can seamlessly integrate Neptune without changing your code much.

Experiment tracking tools – like Neptune – display params in multiple different places, so that you can:
- Compare selected experiments in greater detail while having all different params highlighted (example).
- Search for experiments with particular values of the parameter at hand (example where we display only experiments with positive “timeseries_factor”).
How to visualize hyperparameters?
If you are a heavy experimenter, you have probably come across the need to efficiently compare hundreds of runs and visualize the relationships between hyperparameters and the score.
Parallel coordinate plot
One way to do it is to prepare a parallel coordinate plot, like the one below:
Each vertical axis is one parameter, and the score is the right-most (vertical) axis. Such a visualization gives immediate insight into the ranges of parameters that yield the best score. In principle, it should be interactive to allow users to explore the data freely and perform their own reasoning and interpretation.
One great tool for building parallel coordinate plots is HiPlot, developed at Facebook AI Research (FAIR). Take a closer look at how you can compare runs using parallel coordinates in Neptune.
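As a quick illustration (a minimal sketch with made-up values, not data from any runs above), HiPlot can render such a plot from a plain list of dictionaries:
import hiplot as hip

# each dict is one run: hyperparameters plus the resulting score (values are made up)
runs = [
    {'lr': 0.001, 'dropout': 0.1, 'optimizer': 'adam', 'score': 0.91},
    {'lr': 0.010, 'dropout': 0.2, 'optimizer': 'sgd', 'score': 0.84},
    {'lr': 0.005, 'dropout': 0.0, 'optimizer': 'adam', 'score': 0.89},
]
hip.Experiment.from_iterable(runs).display()  # interactive parallel coordinate plot, e.g. in a notebook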
By the way, such a large number of runs (as depicted above) usually comes from hyperparameter optimization jobs, or HPO for short. Python’s open-source landscape has a lot to offer in that area. Check this comparison of two popular HPO libraries: optuna and hyperopt.
scikit-optimize
Another approach to inspecting and understanding HPO results is proposed by the creators of scikit-optimize. Each HPO job produces diagnostic charts that visualize the relationships between hyperparameters and the score.
Here is an example:
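To produce similar diagnostics yourself, a minimal scikit-optimize sketch (with a toy, made-up objective and search space) could look like this:
from skopt import gp_minimize
from skopt.plots import plot_evaluations, plot_objective

# a toy objective over two hyperparameters (made up for illustration)
def objective(params):
    lr, dropout = params
    return (lr - 0.01) ** 2 + (dropout - 0.2) ** 2

result = gp_minimize(objective, dimensions=[(1e-4, 1e-1), (0.0, 0.5)], n_calls=30)

plot_evaluations(result)  # where the HPO job sampled the space
plot_objective(result)    # estimated relationship between hyperparameters and the objective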
Optuna
Optuna is a hyperparameter optimization framework that automates hyperparameter search. It offers its own suite of visualizations of the hyperparameters optimized in a given job.
Let’s study one example of an optuna HPO job:

Similarly to the previous example, a major goal of visualization is to help understand how hyperparameters relate to the score that is being optimized.
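To give a flavor of the API (a toy, made-up objective, just a sketch), a study together with two of its built-in plots looks roughly like this:
import optuna

# a toy objective (made up) over two hyperparameters
def objective(trial):
    lr = trial.suggest_float('lr', 1e-4, 1e-1, log=True)
    dropout = trial.suggest_float('dropout', 0.0, 0.5)
    return (lr - 0.01) ** 2 + (dropout - 0.2) ** 2

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)

# built-in visualizations of how hyperparameters relate to the optimized value
optuna.visualization.plot_parallel_coordinate(study)
optuna.visualization.plot_param_importances(study)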
If it sounds relevant to you, take a closer look at neptune-optuna integration in the docs.
Final thoughts
Hyperparameters are a central piece of the larger picture, which is experiment management. In this post, we showed the current state of the practice in hyperparameter tracking. With Neptune, you can take it one level up by making hyperparameters easily accessible, comparable, and shareable within the team, instantaneously.