
How to Do Hyperparameter Tuning on Any Python Script in 3 Easy Steps

You wrote a Python script that trains and evaluates your machine learning model. Now you would like to automatically tune its hyperparameters to improve performance?

I got you!

In this article, I will show you how to convert your script into an objective function that can be optimized with any hyperparameter optimization library.  


It will take just 3 steps, and you will be tuning model parameters like there is no tomorrow.

Ready? 

Let’s go!

Let's assume your main.py script looks something like this:

import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split

data = pd.read_csv('data/train.csv', nrows=10000)
X = data.drop(['ID_code', 'target'], axis=1)
y = data['target']
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=1234)

train_data = lgb.Dataset(X_train, label=y_train)
valid_data = lgb.Dataset(X_valid, label=y_valid, reference=train_data)

params = {'objective': 'binary',
          'metric': 'auc',
          'learning_rate': 0.4,
          'max_depth': 15,
          'num_leaves': 20,
          'feature_fraction': 0.8,
          'subsample': 0.2}

model = lgb.train(params, train_data,
                  num_boost_round=300,
                  early_stopping_rounds=30,
                  valid_sets=[valid_data],
                  valid_names=['valid'])

score = model.best_score['valid']['auc']
print('validation AUC:', score)

Step 1: Decouple search parameters from code

Take the parameters that you want to tune and put them in a dictionary at the top of your script. By doing that, you effectively decouple the search parameters from the rest of the code.

import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split

SEARCH_PARAMS = {'learning_rate': 0.4,
                 'max_depth': 15,
                 'num_leaves': 20,
                 'feature_fraction': 0.8,
                 'subsample': 0.2}

data = pd.read_csv('../data/train.csv', nrows=10000)
X = data.drop(['ID_code', 'target'], axis=1)
y = data['target']
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=1234)

train_data = lgb.Dataset(X_train, label=y_train)
valid_data = lgb.Dataset(X_valid, label=y_valid, reference=train_data)

params = {'objective': 'binary',
          'metric': 'auc',
          **SEARCH_PARAMS}

model = lgb.train(params, train_data,
                  num_boost_round=300,
                  early_stopping_rounds=30,
                  valid_sets=[valid_data],
                  valid_names=['valid'])

score = model.best_score['valid']['auc']
print('validation AUC:', score)

Step 2: Wrap training and evaluation into a function

Now, you can put the entire training and evaluation logic inside a train_evaluate function. This function takes parameters as input and outputs the validation score.

import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split

SEARCH_PARAMS = {'learning_rate': 0.4,
                 'max_depth': 15,
                 'num_leaves': 20,
                 'feature_fraction': 0.8,
                 'subsample': 0.2}


def train_evaluate(search_params):
    data = pd.read_csv('../data/train.csv', nrows=10000)
    X = data.drop(['ID_code', 'target'], axis=1)
    y = data['target']
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=1234)

    train_data = lgb.Dataset(X_train, label=y_train)
    valid_data = lgb.Dataset(X_valid, label=y_valid, reference=train_data)

    params = {'objective': 'binary',
              'metric': 'auc',
              **search_params}

    model = lgb.train(params, train_data,
                      num_boost_round=300,
                      early_stopping_rounds=30,
                      valid_sets=[valid_data],
                      valid_names=['valid'])

    score = model.best_score['valid']['auc']
    return score


if __name__ == '__main__':
    score = train_evaluate(SEARCH_PARAMS)
    print('validation AUC:', score)

Step 3: Run the hyperparameter tuning script

We are almost there.

All you need to do now is to use this train_evaluate function as an objective for the black-box optimization library of your choice. 
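To see how little glue is needed, here is a minimal sketch that drives train_evaluate with a plain random search. The sampling ranges are my own illustrative choices; script_step2 is the file from step 2.

# Minimal sketch: plain random search driving train_evaluate.
# The sampling ranges are illustrative assumptions, not tuned recommendations.
import random

from script_step2 import train_evaluate

best_score, best_params = float('-inf'), None
for _ in range(20):
    candidate = {'learning_rate': 10 ** random.uniform(-2, -0.3),  # roughly 0.01 to 0.5
                 'max_depth': random.randint(1, 30),
                 'num_leaves': random.randint(2, 100),
                 'feature_fraction': random.uniform(0.1, 1.0),
                 'subsample': random.uniform(0.1, 1.0)}
    score = train_evaluate(candidate)
    if score > best_score:
        best_score, best_params = score, candidate

print('best random-search AUC:', best_score)
print('best parameters:', best_params)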

I will use Scikit-Optimize, which I have described in great detail in another article, but you can use any hyperparameter optimization library out there.


In a nutshell I:

  • define the search SPACE,
  • create the objective function that will be minimized,
  • run the optimization via the skopt.forest_minimize function.

In this example, I will try 100 different configurations starting with 10 randomly chosen parameter sets.

import skopt

from script_step2 import train_evaluate

SPACE = [
    skopt.space.Real(0.01, 0.5, name='learning_rate', prior='log-uniform'),
    skopt.space.Integer(1, 30, name='max_depth'),
    skopt.space.Integer(2, 100, name='num_leaves'),
    skopt.space.Real(0.1, 1.0, name='feature_fraction', prior='uniform'),
    skopt.space.Real(0.1, 1.0, name='subsample', prior='uniform')]


@skopt.utils.use_named_args(SPACE)
def objective(**params):
    # forest_minimize looks for a minimum, so return the negative AUC
    return -1.0 * train_evaluate(params)


results = skopt.forest_minimize(objective, SPACE, n_calls=100, n_random_starts=10)
best_auc = -1.0 * results.fun
best_params = results.x

print('best result: ', best_auc)
print('best parameters: ', best_params)

This is it.

The results object contains information about the best score and parameters that produced it.
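Since results.x is just a list of values in the order the dimensions appear in SPACE, it can be handy to map it back to parameter names and to save the results object for later. A small sketch (the hpo_results.pkl filename is my own choice):

# Map the best values back to the dimension names defined in SPACE
best_named_params = {dim.name: value for dim, value in zip(SPACE, results.x)}
print('best parameters by name:', best_named_params)

# Persist the results object; store_objective=False skips pickling the
# decorated objective function, which cannot always be pickled.
skopt.dump(results, 'hpo_results.pkl', store_objective=False)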

Note:

If you want to visualize your training and save diagnostic charts after it finishes, you can add one callback and one function call to log every hyperparameter search to Neptune. Just use the helpers from the neptune-contrib library.

import neptune
import neptunecontrib.monitoring.skopt as sk_utils
import skopt

from script_step2 import train_evaluate

neptune.init('jakub-czakon/blog-hpo')
neptune.create_experiment('hpo-on-any-script', upload_source_files=['*.py'])

SPACE = [
    skopt.space.Real(0.01, 0.5, name='learning_rate', prior='log-uniform'),
    skopt.space.Integer(1, 30, name='max_depth'),
    skopt.space.Integer(2, 100, name='num_leaves'),
    skopt.space.Real(0.1, 1.0, name='feature_fraction', prior='uniform'),
    skopt.space.Real(0.1, 1.0, name='subsample', prior='uniform')]


@skopt.utils.use_named_args(SPACE)
def objective(**params):
    return -1.0 * train_evaluate(params)


monitor = sk_utils.NeptuneMonitor()
results = skopt.forest_minimize(objective, SPACE, n_calls=100, n_random_starts=10, callback=[monitor])
sk_utils.log_results(results)

neptune.stop()

Now, when you run your parameter sweep, you will see the following:

[Screenshot: the hyperparameter sweep monitored live in Neptune]

Check out the skopt hyperparameter sweep experiment with all the code, charts and results.
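If you prefer to keep things local, skopt also ships diagnostic plots that you can render from the same results object and save yourself. A minimal sketch (the file names are my own choices):

import matplotlib.pyplot as plt
import skopt.plots

# Best score found so far vs. number of calls
skopt.plots.plot_convergence(results)
plt.savefig('convergence.png')

# Distribution of sampled points across the search space
skopt.plots.plot_evaluations(results)
plt.savefig('evaluations.png')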


Final thoughts

In this article, you’ve learned how to optimize hyperparameters of pretty much any Python script in just 3 steps. 

Hopefully, with this knowledge, you will build better machine learning models with less effort.

Happy training!

