There are four optimization algorithms to try.

**dummy_minimize**

You can run a simple random search over the parameters. Nothing fancy here, but it is useful to have this option within the same API so you can compare against the model-based methods if needed.

**forest_minimize and gbrt_minimize**

Both of these methods, as well as the one in the next section, are examples of Bayesian hyperparameter optimization, also known as Sequential Model-Based Optimization (SMBO). The idea behind this approach is to **estimate** the user-defined **objective function** **with** a random forest, extra trees, or gradient boosted trees **regressor**.

After each run of hyperparameters on the objective function, the algorithm makes an **educated guess about which set of hyperparameters is most likely to improve the score** and should be tried in the next run. This is done by getting regressor predictions on many points (hyperparameter sets) and choosing the point that is the best guess according to the so-called acquisition function.

There are quite a few acquisition function options to choose from:

**EI and PI**: Negative expected improvement and negative probability of improvement. If you choose one of these, you should also tweak the *xi* parameter. Basically, when the algorithm is looking for the next set of hyperparameters, you decide how small an expected improvement you are willing to try on the actual objective function. The higher the value, the bigger the improvement (or probability of improvement) your regressor has to expect.

**LCB**: Lower confidence bound. In this case, you want to choose your next point carefully, limiting the downside risk. You can decide how much risk to take at each run. Making the *kappa* parameter small leans toward **exploitation** of what you already know; making it larger leans toward **exploration** of the search space.

There are also the *EIPS* and *PIPS* options, which take into account both the score produced by the objective function and the execution time, but I haven't tried them.

**gp_minimize**

Instead of using tree regressors, the **objective function is approximated by a Gaussian process**.

From a user perspective, the added value of this method is that instead of deciding on one of the acquisition functions beforehand, you can let the algorithm pick the best of EI, PI, and LCB at every iteration. Just set the acquisition function to *gp_hedge* and try it out.

One more thing to consider is the **optimization method used at each iteration**, *sampling* or *lbfgs*. For both of them, the acquisition function is calculated at a number of randomly selected points (*n_points*) in the search space. If you go with *sampling*, the point with the lowest acquisition value is simply selected. If you choose *lbfgs*, the algorithm takes some number (*n_restarts_optimizer*) of the best randomly tried points and runs *lbfgs* optimization starting from each of them. So the *lbfgs* method is basically just an improvement over the *sampling* method if you don't care about execution time.