How to Track Hyperparameters of Machine Learning Models?

Posted July 1, 2020

Machine learning algorithms are tunable by multiple knobs called hyperparameters. Recent deep learning models come with tens of hyperparameters, which, together with data augmentation and training procedure parameters, create quite a complex space. In the reinforcement learning domain, you should also count environment parameters.

Data scientists should keep this hyperparameter space well under control in order to make progress.

Here, we will show you recent practices, tips & tricks, and tools to track hyperparameters efficiently and with minimal overhead. You will find yourself in control of even the most complex deep learning experiments!

Why should I track my hyperparameters? a.k.a. Why is that important?

Almost every deep learning experimentation guideline, like this deep learning book, advises you on how to tune hyperparameters to make models work as expected. In the experiment-analyze-learn loop, data scientists must control what changes are being made, so that the “learn” part of the loop keeps working.

Oh, and don’t forget that the random seed is a hyperparameter as well (especially in the RL domain: check this reddit thread for example).

What are the current practices in hyperparameter tracking?

Let’s review common practices for managing hyperparameters one by one. We focus on how to build, keep, and pass hyperparameters to your ML scripts.

Python dictionary

Very basic, very useful. Simply collect your hyperparameters in a Python dictionary, like in this simple example:

PARAMS = {'batch_size': 64,
          'n_epochs': 1000,
          'shuffle': True,
          'activation': 'elu',
          'dense_units': 128,
          'dropout': 0.2,
          'learning_rate': 0.001,
          'early_stopping': 20,
          'optimizer': 'Adam',
          }

Thanks to this approach you keep all hyperparameters in a single Python object and can easily use it across your training scripts. To make sure that you track those parameters in your machine learning project, it’s recommended to simply version-control the file where this dictionary is created.
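
For example, a minimal sketch of this pattern, with hypothetical file names params.py and train.py (these names are an assumption, not part of the linked example):

# params.py – version-control this file to track changes to hyperparameters
PARAMS = {'batch_size': 64,
          'n_epochs': 1000,
          'learning_rate': 0.001,
          }

# train.py – import the same dictionary wherever it is needed
from params import PARAMS

def train():
    print(f"Training for {PARAMS['n_epochs']} epochs "
          f"with batch size {PARAMS['batch_size']}")

if __name__ == '__main__':
    train()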

You can check the entire example here.

Pros:

  1. Simple and straightforward because you already know the tool.
  2. Easy to make hierarchical structure with nested dictionaries.
  3. Almost no overhead in the code.
  4. Easy to merge multiple configuration files into a single dictionary.

Cons:

  1. Hyperparameters are part of the codebase, while they should be separate – remember to distinguish between the logic and its parametrization.
  2. Saving params to disk after each run is not obvious – see the sketch after this list for one minimal workaround.
  3. You may not notice that you have overwritten some values along the way, which makes it difficult to learn how a particular setup performed.
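
One minimal workaround for the saving issue is to dump the dictionary to a timestamped JSON file at the start of each run (the file naming scheme below is just an assumption):

import json
import time

# save a snapshot of this run's hyperparameters next to other outputs
run_id = time.strftime('%Y%m%d-%H%M%S')
with open(f'params-{run_id}.json', 'w') as f:
    json.dump(PARAMS, f, indent=2)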

Note: Did you know about AttrDict?

It is a Python library that allows you to access dictionary elements both as keys and as attributes. It’s really convenient to use attribute syntax.

Here is an example with nested dicts:

config = {'neptune': {'project': 'kamil/analysis',
                      'tags': ['xgb-tune']},
          'booster': {'max_depth': 10,
                      'eta': 0.01,
                      'gamma': 0.001,
                      'silent': 1,
                      'subsample': 1,
                      'lambda': 1,
                      'alpha': 0.05,
                      'objective': 'reg:linear',
                      'verbosity': 0,
                      'eval_metric': 'rmse',
                      },
          'num_round': 20,
          }

You can access eta like this:

config['booster']['eta']

With attrdict, you can do it in a more elegant way:

from attrdict import AttrDict

cfg = AttrDict(config)
cfg.booster.eta

Configuration file

These are regular text files with some predefined structure, plus standard libraries to parse them, like the JSON encoder and decoder, or PyYAML. Common formats are JSON, YAML, or cfg files.

Below is an example YAML file that defines multiple hyperparameters for a random forest, along with more general info like the project and experiment name.

project: ORGANIZATION/home-credit
name: home-credit-default-risk

parameters:
# Data preparation
  n_cv_splits: 5
  validation_size: 0.2
  stratified_cv: True
  shuffle: 1

# Random forest
  rf__n_estimators: 2000
  rf__criterion: gini
  rf__max_features: 0.2
  rf__max_depth: 40
  rf__min_samples_split: 50
  rf__min_samples_leaf: 20
  rf__max_leaf_nodes: 60
  rf__class_weight: balanced

# Post Processing
  aggregation_method: rank_mean

Similarly to the dictionary-based style, you just need to version control this file to keep track of hyperparameters.

You can read a YAML file and access its elements simply by using yaml.load(), like this:

import yaml

with open(config_path) as f:
    config = yaml.load(f, Loader=yaml.BaseLoader)  # config is dict

print(config['parameters']['n_cv_splits'])  # 5

Since AttrDict was just introduced, let’s modify this snippet and access the n_cv_splits value in a more elegant way:

import yaml
from attrdict import AttrDict

with open(config_path) as f:
    config = yaml.load(f, Loader=yaml.BaseLoader)  # config is dict
    cfg = AttrDict(config)

print(cfg.parameters.n_cv_splits)  # 5

Here is an example of a large YAML file used for storing feature selection, model parameters, and much more. The entire project is also publicly available.

Pros

  1. Everything is located in a single place.
  2. Easy to re-use saved configuration files.
  3. Nice separation of script logic and its parametrization.
  4. Enhanced readability of the code.

Cons

  1. It requires some programming discipline to put hyperparameters in the config file.
  2. If the codebase changes rapidly (new features and new models appear while older versions of the code are dropped), maintaining proper config files is an additional overhead.
  3. For large codebases, you may end up with several config files, which can make things more complex and tedious to maintain – one lightweight remedy is to merge them into a single dictionary, as sketched below.
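
If you do end up with several files, a minimal sketch of merging them into one dictionary (the file names data.yaml and model.yaml are hypothetical) could look like this:

import yaml

config = {}
for path in ['data.yaml', 'model.yaml']:
    with open(path) as f:
        # keys from later files overwrite earlier ones on collision
        config.update(yaml.load(f, Loader=yaml.BaseLoader))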

Argparse

When experimenting, you usually go through multiple trials (or experiments) in order to understand the relationships between hyperparameters and the score, and to obtain the best-performing model (we leave the discussion of what it means for a model to perform well for another post).

In such a situation it comes in handy to start new experiments from the command line and specify parameter values directly in the CLI. Argparse is a Python module that makes it easy to write user-friendly command-line interfaces.

I think that an easy way to understand argparse is to simply analyze an example. Below is a simple Python program that takes three optional arguments and prints them.

import argparse

parser = argparse.ArgumentParser(description='Process hyper-parameters')

parser.add_argument('--lr',       type=float, default=0.001, help='learning rate')
parser.add_argument('--dropout',  type=float, default=0.0,   help='dropout ratio')
parser.add_argument('--data_dir', type=str,   default='/neptune/is/the/best/data/', help='data directory for training')

args = parser.parse_args()

# Here is how to access passed values
print(args.lr)
print(args.dropout)
print(args.data_dir)

If you run this program without any arguments, the defaults will be used:

python main.py

Output is:

0.001
0.0
/neptune/is/the/best/data/

If you specify parameters, they are parsed so that you can use them in your training script:

python main.py --lr 0.005 --dropout 0.5

Output is:

0.005
0.5
/neptune/is/the/best/data/

One important note about tracking: be advised that argparse does not save or log parameters passed on the command line. You have to save the parameter values yourself.
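
A minimal sketch of doing that, assuming a plain JSON dump is enough for you (the file name args.json is arbitrary):

import json

# argparse.Namespace -> plain dict -> JSON snapshot of this run's parameters
with open('args.json', 'w') as f:
    json.dump(vars(args), f, indent=2)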

Pros

  1. Conveniently start new experiments.
  2. Decide on the hyperparameters’ values on the fly.
  3. Easy to add new arguments to argparse.

Cons

  1. Requires extra effort (not large though) to keep track of hyperparameters’ values across long experimentation-based projects. Argparse does not save values anywhere.
  2. Similarly to configuration files, if your project grows rapidly you may find it difficult to maintain CLI parameters.
  3. If you pass parameters in a few places in the code, it becomes less obvious how to use argparse efficiently. The same is true if you build/merge parameters from multiple places.

Note: Did you know about Click?

As mentioned in this post, there are a few alternatives to argparse. One notable option is Click.

It is a Python package for creating CLIs in a composable way with minimal additional code. With Click you just decorate some functions, like in this example, where the hello function is decorated:

import click

@click.command()
@click.option("--count", default=1, help="Number of greetings.")
@click.option("--name", prompt="Your name",
              help="The person to greet.")
def hello(count, name):
    """Simple program that greets NAME for a total of COUNT times."""
    for _ in range(count):
        click.echo("Hello, %s!" % name)

if __name__ == '__main__':
    hello()

Then run like any other CLI command:

python hello.py --count=3

Here is an example image segmentation project that uses click extensively. Take a look at the main.py and check it in detail.

Hydra

Hydra is a new project from Facebook AI that simplifies the configuration of more complex machine learning experiments.

The key ideas behind it are:

  • dynamically create a hierarchical configuration by composition,
  • override it when needed through the command line,
  • pass new parameters (not present in the config) via the CLI – they will be handled for you.

Hydra gives you the ability to prepare and override complex configuration setups (including config groups and hierarchies), while keeping track of any overridden values.

Similarly to argparse, the best way to understand hydra (and to see how simple it is to work with) is to analyze an example.

Let’s consider a simplified config YAML file from the section about configuration files:

project: ORGANIZATION/home-credit
name: home-credit-default-risk

parameters:
# Data preparation
  n_cv_splits: 5
  validation_size: 0.2
  stratified_cv: True
  shuffle: 1

# Random forest
  rf__n_estimators: 2000
  rf__criterion: gini
  rf__max_depth: 40
  rf__class_weight: balanced

Here is a minimalist hydra example:

import hydra
from omegaconf import DictConfig

@hydra.main(config_path='hydra-config.yaml')
def train(cfg):
    print(cfg.pretty())  # this prints config in a reader friendly way
    print(cfg.parameters.rf__n_estimators)  # this is how to access single value from the config


if __name__ == "__main__":
    train()

When you run it, you should see this:

name: home-credit-default-risk
parameters:
  n_cv_splits: 5
  rf__class_weight: balanced
  rf__criterion: gini
  rf__max_depth: 40
  rf__n_estimators: 2000
  shuffle: 1
  stratified_cv: true
  validation_size: 0.2
project: ORGANIZATION/home-credit

2000

What is convenient in hydra is that you can override any value in the config from the CLI like this:

python hydra-main.py parameters.n_cv_splits=12 \
    parameters.stratified_cv=False \
    name=entirely-new-name

As a result you have new values in the config:

name: entirely-new-name
parameters:
  n_cv_splits: 12
  rf__class_weight: balanced
  rf__criterion: gini
  rf__max_depth: 40
  rf__n_estimators: 2000
  shuffle: 1
  stratified_cv: false
  validation_size: 0.2
project: ORGANIZATION/home-credit

2000

Another feature that provides nice flexibility is an option to pass new, previously unseen parameters right from the command line.

To enable this feature, simply turn off strict mode in hydra:

@hydra.main(config_path='config.yaml', strict=False)

In the command below I’m adding rf__max_features to the config and, at the same time, changing rf__n_estimators to 1500. Note that the config file is the same as in the previous examples; in the code we only turned off strict mode:

python hydra-main.py parameters.rf__n_estimators=1500 \
    parameters.rf__max_features=0.2

Output changed accordingly:

name: home-credit-default-risk
parameters:
  n_cv_splits: 5
  rf__class_weight: balanced
  rf__criterion: gini
  rf__max_depth: 40
  rf__max_features: 0.2
  rf__n_estimators: 1500
  shuffle: 1
  stratified_cv: true
  validation_size: 0.2
project: ORGANIZATION/home-credit

1500

The hydra project is being actively developed, so make sure to check their tutorials from time to time to see new features.

Pros

  1. Composable configurations.
  2. Ability to override values very easily and still keep track of them.
  3. Config groups that bring organization to larger experiments.

Cons

  1. Hydra shines in larger experiments, measured by the number of hyperparameters and their hierarchy. For smaller ones, other methods will do just fine.
  2. You need to be careful to avoid accidentally overriding important parameter values.
  3. In order to track hyperparameters across experiments, you need to save the config object (cfg in the examples above) manually – one minimal way to do that is sketched below.
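
A minimal sketch of such a manual save, reusing cfg.pretty() from the examples above (the snapshot file name is arbitrary; note that hydra runs in its own output directory, so that is where the file will land):

import hydra
from omegaconf import DictConfig

@hydra.main(config_path='hydra-config.yaml')
def train(cfg):
    # dump the resolved config, including any CLI overrides, for this run
    with open('config-snapshot.yaml', 'w') as f:
        f.write(cfg.pretty())
    # ... training code goes here

if __name__ == "__main__":
    train()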

How to use experiment tracking tools like Neptune to further increase efficiency and control?

One step further in managing hyperparameters is to use them in the broader context of experiment management. Here is an example of how Neptune handles the parametrization of ML experiments:

import neptune

# define parameters
PARAMS = {'batch_size': 64,
          'n_epochs': 100,
          'shuffle': True,
          'activation': 'elu',
          'dense_units': 128,
          'dropout': 0.2,
          'learning_rate': 0.001,
          'early_stopping': 10,
          'optimizer': 'Adam',
          }

# create experiment (assumes you have already connected to your project, e.g. with neptune.init())
neptune.create_experiment(params=PARAMS)

# run training/validation code

In this way, each experiment has its own params setup saved to Neptune for further analysis and comparison across experiments. The main advantage of this approach is that you associate parameters with other experiment-related data/metadata like evaluation metrics or resulting models.

Parameters’ values are displayed for each experiment, allowing you to visually inspect and analyze multiple runs.

Experiment tracking tools – like Neptune – display params in multiple different places, so that you can:

  • Compare selected experiments in greater detail while having all different params highlighted (example).
  • Search for experiments with particular values of the parameter at hand (example, where we display only experiments with a positive “timeseries_factor”).

How to visualize hyperparameters?

If you are a heavy experimenter, you have probably come across the need to efficiently compare hundreds of runs and visualize the relationships between hyperparameters and the score. One way to do it is to prepare a parallel coordinates plot, like the one below:

Parallel coordinates plot built with HiPlot.

Each vertical axis is one parameter, and the score is the right-most (vertical) axis. Such a visualization gives immediate insight into the ranges of parameters that yield the best score. In principle, it should be interactive to allow users to explore the data freely and perform their own reasoning and interpretation.

Note:

Neptune is integrated with one great tool for building parallel coordinates plots: HiPlot, developed at Facebook AI Research (FAIR). Take a closer look here.
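
If you want to try HiPlot on its own first, here is a minimal sketch based on its basic notebook usage (the run values below are made up for illustration):

import hiplot as hip

# each dict is one run: hyperparameter values plus the resulting score
runs = [
    {'lr': 0.001, 'dropout': 0.2, 'optimizer': 'Adam', 'score': 0.91},
    {'lr': 0.010, 'dropout': 0.5, 'optimizer': 'SGD',  'score': 0.84},
    {'lr': 0.005, 'dropout': 0.3, 'optimizer': 'Adam', 'score': 0.89},
]

hip.Experiment.from_iterable(runs).display()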

Such a large number of runs (as depicted above) usually comes from hyperparameter optimization jobs, or HPO for short. Python’s open-source landscape has a lot to offer in that area. Here is one comparison of two popular HPO libs: optuna and hyperopt.

scikit-optimize

Another approach to inspecting and understanding HPO results is proposed by the creators of scikit-optimize. Each HPO job produces diagnostic charts that visualize the relationships between hyperparameters and the score.

Here is an example:

Skopt visualization from an example optimization job.
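
For reference, a minimal sketch of such an optimization job, with a toy objective standing in for real model training (the search space is an assumption for illustration):

from skopt import gp_minimize
from skopt.plots import plot_objective

def objective(params):
    lr, dropout = params
    # in a real project you would train a model here and return a validation loss
    return (lr - 0.01) ** 2 + (dropout - 0.3) ** 2

result = gp_minimize(objective,
                     dimensions=[(1e-4, 1e-1, 'log-uniform'),  # learning rate
                                 (0.0, 0.5)],                  # dropout
                     n_calls=30)

# diagnostic chart showing how the objective depends on each hyperparameter
plot_objective(result)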

optuna

Optuna is a hyperparameter optimization framework that automates hyperparameter search. It offers its own suite of visualizations for the hyperparameters optimized in a given job.

Let’s study one example of an optuna HPO job:

Optuna diagnostics chart.

Similarly to the previous example, a major goal of visualization is to help understand how hyperparameters relate to the score that is being optimized.
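
A minimal sketch of such a study, again with a toy objective in place of real training:

import optuna

def objective(trial):
    lr = trial.suggest_loguniform('lr', 1e-5, 1e-1)
    dropout = trial.suggest_uniform('dropout', 0.0, 0.5)
    # in a real project you would train a model here and return a validation metric
    return (lr - 0.01) ** 2 + (dropout - 0.3) ** 2

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)

# built-in chart relating hyperparameter values to the objective
optuna.visualization.plot_parallel_coordinate(study)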

If it sounds relevant to you, take a closer look at neptune-optuna integration, here in the docs.

Final thoughts

Hyperparameters are a central piece of the larger picture, which is experiment management. In this post, we showed the current state of practice in hyperparameter tracking. With Neptune, you can take it one level up by making hyperparameters easily accessible, comparable, and shareable within your team instantaneously.

Kamil Kaczmarek
AI Research Advocate