# Train PyTorch Models Using Genetic Algorithm with PyGAD

PyGAD is a genetic algorithm Python 3 library for solving optimization problems. One of these problems is training machine learning algorithms.

PyGAD has a module called pygad.kerasga, which trains Keras models using the genetic algorithm. PyGAD 2.10.0, released on January 3rd, 2021, brought a new module called pygad.torchga to train PyTorch models. It’s very easy to use, but there are a few tricky steps.

So, in this tutorial, we’ll explore how to use PyGAD to train PyTorch models.

Let’s get started.

## Install PyGAD

PyGAD is a Python 3 library, available at PyPI (Python Package Index). So, you can install it simply using this pip command:

`pip install "pygad>=2.10.0"`

Make sure you’re getting at least version 2.10.0, because earlier versions don’t support the pygad.torchga module.
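To confirm that the installed version meets the minimum, a small check like the following can help. This is a sketch only: the helper name `meets_minimum` is hypothetical, and the naive parsing assumes purely numeric version strings such as 2.10.0.

```python
from importlib.metadata import PackageNotFoundError, version


def meets_minimum(installed, minimum="2.10.0"):
    """Compare dotted numeric version strings component by component."""
    def to_tuple(text):
        return tuple(int(part) for part in text.split("."))
    return to_tuple(installed) >= to_tuple(minimum)


try:
    installed_version = version("pygad")
    status = "OK" if meets_minimum(installed_version) else "is too old"
    print("PyGAD", installed_version, status)
except PackageNotFoundError:
    print("PyGAD is not installed")
```

Comparing tuples of integers rather than raw strings matters here, because a plain string comparison would rank 2.9.1 above 2.10.0.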

You can also download the wheel distribution file of PyGAD 2.10.0 from this link, and install it with the following command (make sure the current directory is set to the directory with the .whl file).

`pip install pygad-2.10.0-py3-none-any.whl`

After PyGAD is installed, it’s time to start with the pygad.torchga module.

To learn more about PyGAD, please read its documentation at Read the Docs. You can also access the documentation of the pygad.torchga module directly through this link.

## pygad.torchga module

PyGAD 2.10.0 lets us train PyTorch models using the genetic algorithm (GA). The problem of training a PyTorch model is formulated for the GA as an optimization problem, where all the parameters in the model (e.g. weights and biases) are represented as a single vector (i.e. chromosome).

The pygad.torchga module (torchga is short for Torch Genetic Algorithm) helps us formulate the PyTorch model training problem the way PyGAD expects it. The module has 1 class and 2 functions:

1. TorchGA: A class for creating a population of solutions (i.e. chromosomes) for the PyTorch model. Each solution/chromosome holds a set of all the parameters of the model.
2. model_weights_as_vector(): A function that accepts an argument called model representing the PyTorch model, and returns its parameters as a vector (i.e. chromosome).
3. model_weights_as_dict(): A function that accepts 2 arguments. The first one, model, is the PyTorch model. The second one, weights_vector, is the vector representing all model parameters. This function returns a dictionary of the PyTorch model parameters, which is ready to be passed to the PyTorch method called load_state_dict() to set the model weights.
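To make the two helper functions concrete, here is a rough sketch of the same idea in plain PyTorch. It is not the PyGAD source code: the helper names `weights_as_vector` and `vector_as_state_dict` are hypothetical, and the sketch assumes every entry of the model's `state_dict()` is a floating-point tensor.

```python
import numpy
import torch


def weights_as_vector(model):
    """Flatten every tensor in the model's state_dict into one 1D NumPy vector."""
    pieces = [tensor.detach().cpu().numpy().ravel()
              for tensor in model.state_dict().values()]
    return numpy.concatenate(pieces)


def vector_as_state_dict(model, weights_vector):
    """Slice a 1D vector back into tensors shaped like the model's state_dict."""
    new_state = {}
    start = 0
    for name, tensor in model.state_dict().items():
        size = tensor.numel()
        chunk = numpy.asarray(weights_vector[start:start + size],
                              dtype=numpy.float32)
        new_state[name] = torch.from_numpy(chunk).reshape(tensor.shape)
        start += size
    return new_state


model = torch.nn.Sequential(torch.nn.Linear(3, 2),
                            torch.nn.ReLU(),
                            torch.nn.Linear(2, 1))

vector = weights_as_vector(model)
print(vector.shape)

# Round trip: load an all-zero vector and the model's parameters become zeros.
model.load_state_dict(vector_as_state_dict(model, numpy.zeros_like(vector)))
```

The dictionary keys come from the model itself, which is why the restoring function needs the model as well as the vector.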

The source code of the pygad.torchga module is available at the ahmedfgad/TorchGA GitHub project.

The constructor of the TorchGA class accepts the following 2 arguments:

1. model: PyTorch model.
2. num_solutions: Number of solutions in the population. Each solution has a different set of parameters of the PyTorch model.

Each of these arguments is used as an attribute in the instances of the pygad.torchga.TorchGA class. This means you can access the model by using the model attribute as follows:

```python
torch_ga = pygad.torchga.TorchGA(model=..., num_solutions=...)
torch_ga.model
```

There is a third attribute called population_weights, which is a 2D list of all solutions in the population. Remember that each solution is a 1D list holding the model’s parameters.

Here’s an example of creating an instance of the TorchGA class. The model argument can be assigned to any PyTorch model. The value passed to the num_solutions argument is 10, which means there are 10 solutions in the population.

```python
import pygad.torchga

torch_ga = pygad.torchga.TorchGA(model=...,
                                 num_solutions=10)

initial_population = torch_ga.population_weights
```

The constructor of the TorchGA class calls a method called create_population() which creates and returns a population of solutions to the PyTorch model. At first, the model_weights_as_vector() function is called to return model parameters as a vector.

This vector is used to create the solutions in the population. To make the solutions differ from one another, random values are added to the vector.

Assuming that the model has 30 parameters, then the shape of the population_weights array is 10×30.
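The shape claim can be checked with a small NumPy sketch of the idea. It does not reproduce PyGAD's create_population() code: the helper name, the uniform noise, and its range of -1 to 1 are all assumptions made for illustration.

```python
import numpy


def make_population(base_vector, num_solutions, rng):
    """Build a population by adding random noise to copies of one weight vector."""
    base_vector = numpy.asarray(base_vector, dtype=float)
    # Assumed noise distribution: uniform in [-1, 1), one draw per gene.
    noise = rng.uniform(low=-1.0, high=1.0,
                        size=(num_solutions, base_vector.size))
    return base_vector + noise  # broadcasting adds the base vector to every row


rng = numpy.random.default_rng(seed=0)
base_vector = numpy.zeros(30)  # stand-in for a model with 30 parameters
population = make_population(base_vector, num_solutions=10, rng=rng)
print(population.shape)  # (10, 30)
```

Each row is one chromosome, so the first dimension equals num_solutions and the second equals the number of model parameters.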

Now, let’s go over the steps needed to train a PyTorch model using PyGAD.

## Train PyTorch models using PyGAD

To train a PyTorch model using PyGAD, we need to go through these steps:

• Classification or Regression?
• Create a PyTorch Model
• Create an Instance of the pygad.torchga.TorchGA Class
• Prepare the Training Data
• Decide the Loss Function
• Build the Fitness Function
• Generation Callback Function (Optional)
• Create an Instance of the pygad.GA Class
• Run the Genetic Algorithm

We’ll discuss each step in detail.

### Classification or regression?

It’s important to decide whether the type of problem being solved by the PyTorch model is classification or regression. This will help us prepare:

1. The loss function of the model (which is used to build the fitness function),
2. The activation function in the output layer of the model,
3. The training data.

For the loss functions offered by PyTorch, check this link. Examples of loss functions for regression problems include mean absolute error (nn.L1Loss) and mean square error (nn.MSELoss).

For a classification problem, some examples are binary cross-entropy (nn.BCELoss) for binary classification and cross-entropy (nn.CrossEntropyLoss) for multi-class problems.
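As a quick illustration of the two regression losses named above, the following sketch evaluates nn.L1Loss and nn.MSELoss on two made-up predictions. The tensor values are arbitrary and chosen only so that the results are easy to verify by hand.

```python
import torch

predictions = torch.tensor([[1.0], [2.0]])
targets = torch.tensor([[1.5], [3.0]])

# The absolute errors are 0.5 and 1.0.
mae = torch.nn.L1Loss()(predictions, targets)   # (0.5 + 1.0) / 2 = 0.75
mse = torch.nn.MSELoss()(predictions, targets)  # (0.25 + 1.0) / 2 = 0.625

print(mae.item(), mse.item())
```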

Based on whether the problem is classification or regression, we can decide the activation function in the output layer. For example, softmax is for classification, linear is for regression.

The training data also depends on the problem type. If the problem is classification, then the output comes from a set of finite discrete values. If the problem is regression, then the output comes from a set of infinite continuous values.

### Create a PyTorch model

We’ll do an example of building a PyTorch model, using the torch.nn module, to solve a simple regression problem. The model has 3 layers:

1. A Linear layer as the input layer with 3 inputs and 2 outputs,
2. A ReLU activation layer,
3. Another Linear layer as the output layer with 2 inputs and 1 output.

If the problem is classification, we must add an appropriate output layer, like Softmax.

Finally, the model is created as an instance of the torch.nn.Sequential class, which accepts all the layers previously created in order.

```python
import torch.nn

input_layer = torch.nn.Linear(3, 2)
relu_layer = torch.nn.ReLU()
output_layer = torch.nn.Linear(2, 1)

model = torch.nn.Sequential(input_layer,
                            relu_layer,
                            output_layer)
```
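Since the genetic algorithm treats all parameters as one chromosome, it helps to know how long that chromosome is. Here is a minimal check for the model above, rebuilt so that the snippet is self-contained:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(3, 2),
                            torch.nn.ReLU(),
                            torch.nn.Linear(2, 1))

# Linear(3, 2): 3*2 weights + 2 biases = 8
# Linear(2, 1): 2*1 weights + 1 bias   = 3
num_parameters = sum(p.numel() for p in model.parameters())
print(num_parameters)  # 11
```

The ReLU layer has no trainable parameters, so it contributes nothing to the count.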

We won’t go in-depth about how to build PyTorch models. For more details, you can check the PyTorch documentation.

Now, we’ll create an initial population of PyTorch model’s parameters using the pygad.torchga.TorchGA class.

### Create an instance of the pygad.torchga.TorchGA class

Using the TorchGA class, PyGAD offers a simple interface to create an initial population of solutions to the PyTorch model. Just create an instance of pygad.torchga.TorchGA class, and an initial population will be created automatically.

Here is an example that passes the previously created model to the constructor of the TorchGA class.

```python
import pygad.torchga

torch_ga = pygad.torchga.TorchGA(model=model,
                                 num_solutions=10)
```

Now let’s create random training data to train the model.

### Prepare the training data

Based on whether the problem is classification or regression, we prepare the training data accordingly.

Here are 4 random samples, where each sample has 3 inputs and 1 output.

```python
import numpy

# Data inputs
data_inputs = numpy.array([[0.02, 0.1, 0.15],
                           [0.7, 0.6, 0.8],
                           [1.5, 1.2, 1.7],
                           [3.2, 2.9, 3.1]])

# Data outputs
data_outputs = numpy.array([[0.1],
                            [0.6],
                            [1.3],
                            [2.5]])
```

If we’re solving a binary classification problem like XOR, then its data is given below, where there are 4 samples with 2 inputs each. The outputs are one-hot encoded, so each sample has 2 output values, one per class.

```python
import numpy

# XOR problem inputs
data_inputs = numpy.array([[0, 0],
                           [0, 1],
                           [1, 0],
                           [1, 1]])

# XOR problem outputs
data_outputs = numpy.array([[1, 0],
                            [0, 1],
                            [0, 1],
                            [1, 0]])
```
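PyTorch layers operate on tensors rather than NumPy arrays, so arrays like the ones above need to be converted before they are passed to the model or to a loss function. One way to do that, shown here for the XOR data, is torch.from_numpy() followed by float(). The variable names ending in `_np` are introduced for this sketch only.

```python
import numpy
import torch

data_inputs_np = numpy.array([[0, 0], [0, 1], [1, 0], [1, 1]])
data_outputs_np = numpy.array([[1, 0], [0, 1], [0, 1], [1, 0]])

# The integer arrays become integer tensors; float() casts them to float32,
# the dtype used by the weights of torch.nn.Linear and expected by BCELoss.
data_inputs = torch.from_numpy(data_inputs_np).float()
data_outputs = torch.from_numpy(data_outputs_np).float()

print(data_inputs.dtype, data_inputs.shape)
```

Alternatively, the data can be created as tensors from the start with torch.tensor(), which is what the complete regression example later in this tutorial does.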

Time for the loss function for regression and classification problems.

### Decide the loss function

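#### Regression

For a regression problem, loss functions include:

• Mean absolute error (torch.nn.L1Loss)
• Mean square error (torch.nn.MSELoss)

#### Classification

For a classification problem, the loss functions include:

• Binary cross-entropy (torch.nn.BCELoss) for binary classification
• Cross-entropy (torch.nn.CrossEntropyLoss) for multi-class problems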

Check this page for more information about loss functions in PyTorch.

Here’s an example of calculating binary cross-entropy using the torch.nn.BCELoss class. The detach() method is called to detach the tensor from the graph, in order to return its value. Check this link for more information about the detach() method.

```python
loss_function = torch.nn.BCELoss()

loss = loss_function(predictions, data_outputs).detach().numpy()
```
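The snippet above assumes that predictions and data_outputs already exist. A self-contained sketch with made-up values is shown below. It assumes the predictions are probabilities between 0 and 1, which torch.nn.BCELoss requires, for example because the model ends with a Sigmoid layer.

```python
import torch

# Made-up probabilities, as if produced by a model ending in torch.nn.Sigmoid().
predictions = torch.tensor([[0.5], [0.5]], requires_grad=True)
data_outputs = torch.tensor([[1.0], [0.0]])

loss_function = torch.nn.BCELoss()
loss_tensor = loss_function(predictions, data_outputs)

# detach() drops the autograd graph so the value can be exported to NumPy.
loss = loss_tensor.detach().numpy()
print(loss)  # -ln(0.5), about 0.6931
```

Without the detach() call, converting loss_tensor to NumPy raises an error here because the tensor still requires gradients.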

The fitness function is then computed based on the calculated loss.

### Build the fitness function

The genetic algorithm expects the fitness function to be a maximization one, where the higher its output, the better the result. However, calculating the loss for machine learning models is based on a minimization loss function. The lower the loss, the better the result.

If the fitness is set equal to the loss, then the genetic algorithm will search in the direction that makes the fitness increase, which is also the direction that increases the loss. This is why the fitness is calculated as the reciprocal of the loss, according to the next line.

The small value 0.00000001 is added to avoid dividing by zero when loss=0.0.

```python
fitness_value = (1.0 / (loss + 0.00000001))
```
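A few sample values make the mapping concrete. The helper name `loss_to_fitness` is hypothetical and simply wraps the formula above.

```python
def loss_to_fitness(loss, epsilon=0.00000001):
    """Map a non-negative loss to a fitness value where higher is better."""
    return 1.0 / (loss + epsilon)


for loss in (2.0, 0.5, 0.0):
    print(loss, "->", loss_to_fitness(loss))

# Smaller losses give larger fitness values. A loss of 0.0 gives a fitness of
# roughly 1e8 instead of raising ZeroDivisionError.
```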

When training PyTorch models using PyGAD, there are multiple solutions and each solution is a vector that holds all the parameters of the model.

To build the fitness function, follow these steps:

1. Restore model parameters from the 1D vector.
2. Set model parameters.
3. Make predictions.
4. Calculate loss value.
5. Calculate fitness value.
6. Return fitness value.

Next, we’ll build the fitness function for regression and binary classification problems.

#### Fitness function for regression

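The fitness function in PyGAD is built as a regular Python function. In the PyGAD 2.10.0 release covered here, it must accept 2 arguments representing:

1. The solution to calculate its fitness value,
2. The index of the solution within the population.

Note that PyGAD 2.20.0 and later pass the pygad.GA instance as an additional first argument, so with those releases the function is defined as fitness_func(ga_instance, solution, sol_idx).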

The solution passed to the fitness function is a 1D vector. This vector can’t be used directly for the parameters of the PyTorch model, as the model expects parameters in the form of a dictionary. So, before calculating the loss, we need to convert the vector into a dictionary. We can use the model_weights_as_dict() function in the pygad.torchga module, as follows:

```python
model_weights_dict = pygad.torchga.model_weights_as_dict(model=model,
                                                         weights_vector=solution)
```

Once the dictionary of parameters is created, then the load_state_dict() method is called to use the parameters in this dictionary as the current parameters of the model.

```python
model.load_state_dict(model_weights_dict)
```

According to the current parameters, the model makes predictions to the training data.

`predictions = model(data_inputs)`

The model’s predictions are passed to the loss function to calculate the solution’s loss. The mean absolute error is used as the loss function.

```python
loss_function = torch.nn.L1Loss()

solution_fitness = 1.0 / (loss_function(predictions, data_outputs).detach().numpy() + 0.00000001)
```

Finally, the fitness value is returned.

```python
loss_function = torch.nn.L1Loss()

def fitness_func(solution, sol_idx):
    global data_inputs, data_outputs, torch_ga, model, loss_function

    # Convert the 1D solution vector into a state dictionary.
    model_weights_dict = pygad.torchga.model_weights_as_dict(model=model,
                                                             weights_vector=solution)

    # Use the current solution as the model parameters.
    model.load_state_dict(model_weights_dict)

    predictions = model(data_inputs)

    solution_fitness = 1.0 / (loss_function(predictions, data_outputs).detach().numpy() + 0.00000001)

    return solution_fitness
```

#### Fitness for binary classification

Here is the fitness function for a binary classification problem. The loss function used is binary cross-entropy.

```python
loss_function = torch.nn.BCELoss()

def fitness_func(solution, sol_idx):
    global data_inputs, data_outputs, torch_ga, model, loss_function

    # Convert the 1D solution vector into a state dictionary.
    model_weights_dict = pygad.torchga.model_weights_as_dict(model=model,
                                                             weights_vector=solution)

    # Use the current solution as the model parameters.
    model.load_state_dict(model_weights_dict)

    predictions = model(data_inputs)

    solution_fitness = 1.0 / (loss_function(predictions, data_outputs).detach().numpy() + 0.00000001)

    return solution_fitness
```

The created fitness function should be assigned to the fitness_func argument in the pygad.GA class’s constructor.

Next, we’ll build a callback function executed at the end of each generation.

### Generation callback function (optional)

According to the PyGAD lifecycle shown in the figure below, there’s a callback function that’s called after each generation. This function could be implemented and used to print some debugging information, like the best fitness value in each generation, and the number of completed generations. Note that this step is optional and for debugging purposes only.

All you need to do is to implement the callback function, and then assign it to the on_generation argument in the constructor of the pygad.GA class. Here is the callback function which accepts a single argument representing the instance of the pygad.GA class.

Using this instance, the generations_completed attribute is read, which holds the number of completed generations. The best_solution() method is also called; it returns information about the best solution in the current generation.

```python
def callback_generation(ga_instance):
    print("Generation = {generation}".format(generation=ga_instance.generations_completed))
    # best_solution() returns (solution, fitness, index); index 1 is the fitness.
    print("Fitness    = {fitness}".format(fitness=ga_instance.best_solution()[1]))
```

The next step is creating an instance of the pygad.GA class, responsible for running the genetic algorithm to train the PyTorch model.

### Create an instance of the pygad.GA class

The constructor of the pygad.GA class accepts many arguments that can be explored in the documentation. Using just some of those arguments, the next code creates an instance of the pygad.GA class and saves it in the ga_instance variable:

• num_generations: Number of generations.
• num_parents_mating: Number of parents to mate.
• initial_population: The initial population of PyTorch model’s parameters.
• fitness_func: The fitness function.
• on_generation: The generation callback function.

```python
num_generations = 250
num_parents_mating = 5
initial_population = torch_ga.population_weights

ga_instance = pygad.GA(num_generations=num_generations,
                       num_parents_mating=num_parents_mating,
                       initial_population=initial_population,
                       fitness_func=fitness_func,
                       on_generation=callback_generation)
```

Note that the number of solutions within the population was previously set to 10 in the constructor of the TorchGA class. Thus, the number of parents to mate must be less than 10.

In the next section, we call the run() method to run the genetic algorithm and train the PyTorch model.

### Run the genetic algorithm

The ga_instance of pygad.GA can now call the run() method to start the genetic algorithm.

```python
ga_instance.run()
```

After this method completes, we can make predictions using the best solution found by the genetic algorithm in the last generation.

There’s a useful method called plot_result() in the pygad.GA class, which shows a figure relating the fitness value to the generation number. It’s useful after the run() method completes. In later PyGAD releases, this method is named plot_fitness().

`ga_instance.plot_result(title="PyGAD & PyTorch - Iteration vs. Fitness")`

## Statistics about the trained model

The pygad.GA class has a method called best_solution() which returns 3 outputs:

1. Best solution found,
2. Fitness value of the best solution,
3. The index of the best solution within the population.

The next code calls the best_solution() method and prints information about the best solution.

```python
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))
print("Index of the best solution : {solution_idx}".format(solution_idx=solution_idx))
```

The best solution’s parameters can be converted into a dictionary that’s fed into the PyTorch model for making predictions.

```python
# Fetch the parameters of the best solution.
best_solution_weights = pygad.torchga.model_weights_as_dict(model=model,
                                                            weights_vector=solution)
model.load_state_dict(best_solution_weights)
predictions = model(data_inputs)
print("Predictions : \n", predictions.detach().numpy())
```

The next code calculates the loss after the model is trained.

```python
abs_error = loss_function(predictions, data_outputs)
print("Absolute Error : ", abs_error.detach().numpy())
```

After covering all the steps to build and train PyTorch models using PyGAD, next we’ll check out 2 examples with complete code.

## Examples

### Regression

For a regression problem that uses the mean absolute error as a loss function, here is the complete code.

```python
import torch
import pygad
import pygad.torchga

# Written for the 2-argument fitness function of PyGAD 2.10.0.
# PyGAD 2.20.0 and later expect fitness_func(ga_instance, solution, sol_idx).
def fitness_func(solution, sol_idx):
    global data_inputs, data_outputs, torch_ga, model, loss_function

    model_weights_dict = pygad.torchga.model_weights_as_dict(model=model,
                                                             weights_vector=solution)

    # Use the current solution as the model parameters.
    model.load_state_dict(model_weights_dict)

    predictions = model(data_inputs)
    abs_error = loss_function(predictions, data_outputs).detach().numpy() + 0.00000001

    solution_fitness = 1.0 / abs_error

    return solution_fitness

def callback_generation(ga_instance):
    print("Generation = {generation}".format(generation=ga_instance.generations_completed))
    # best_solution() returns (solution, fitness, index); index 1 is the fitness.
    print("Fitness    = {fitness}".format(fitness=ga_instance.best_solution()[1]))

# Create the PyTorch model.
input_layer = torch.nn.Linear(3, 2)
relu_layer = torch.nn.ReLU()
output_layer = torch.nn.Linear(2, 1)

model = torch.nn.Sequential(input_layer,
                            relu_layer,
                            output_layer)
# print(model)

# Create an instance of the pygad.torchga.TorchGA class to build the initial population.
torch_ga = pygad.torchga.TorchGA(model=model,
                                 num_solutions=10)

loss_function = torch.nn.L1Loss()

# Data inputs
data_inputs = torch.tensor([[0.02, 0.1, 0.15],
                            [0.7, 0.6, 0.8],
                            [1.5, 1.2, 1.7],
                            [3.2, 2.9, 3.1]])

# Data outputs
data_outputs = torch.tensor([[0.1],
                             [0.6],
                             [1.3],
                             [2.5]])

# Prepare the PyGAD parameters. Check the documentation for more information: https://pygad.readthedocs.io/en/latest/README_pygad_ReadTheDocs.html#pygad-ga-class
num_generations = 250 # Number of generations.
num_parents_mating = 5 # Number of solutions to be selected as parents in the mating pool.
initial_population = torch_ga.population_weights # Initial population of network weights.
parent_selection_type = "sss" # Type of parent selection.
crossover_type = "single_point" # Type of the crossover operator.
mutation_type = "random" # Type of the mutation operator.
mutation_percent_genes = 10 # Percentage of genes to mutate. This parameter has no action if the parameter mutation_num_genes exists.
keep_parents = -1 # Number of parents to keep in the next population. -1 means keep all parents and 0 means keep nothing.

ga_instance = pygad.GA(num_generations=num_generations,
                       num_parents_mating=num_parents_mating,
                       initial_population=initial_population,
                       fitness_func=fitness_func,
                       parent_selection_type=parent_selection_type,
                       crossover_type=crossover_type,
                       mutation_type=mutation_type,
                       mutation_percent_genes=mutation_percent_genes,
                       keep_parents=keep_parents,
                       on_generation=callback_generation)

ga_instance.run()

# After the generations complete, a plot shows how the fitness value evolves over generations.
# Later PyGAD releases name this method plot_fitness().
ga_instance.plot_result(title="PyGAD & PyTorch - Iteration vs. Fitness", linewidth=4)

# Return the details of the best solution.
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))
print("Index of the best solution : {solution_idx}".format(solution_idx=solution_idx))

# Fetch the parameters of the best solution.
best_solution_weights = pygad.torchga.model_weights_as_dict(model=model,
                                                            weights_vector=solution)
model.load_state_dict(best_solution_weights)
predictions = model(data_inputs)
print("Predictions : \n", predictions.detach().numpy())

abs_error = loss_function(predictions, data_outputs)
print("Absolute Error : ", abs_error.detach().numpy())
```

The next figure is the result of calling the plot_result() method. It shows fitness value change by generation.

Here are the outputs of the print statements in the code. The MAE is 0.0069.

```
Fitness value of the best solution = 145.42425295191546
Index of the best solution : 0
Predictions :
 [[0.08401088]
 [0.60939324]
 [1.3010881 ]
 [2.5010352 ]]
Absolute Error :  0.006876422
```

### Classification using CNN

The next code builds a convolutional neural network (CNN) using PyTorch for classifying a dataset of 80 images, where the size of each image is 100x100x3. Cross-entropy loss is used in this example because there are more than 2 classes.

Training data can be downloaded from these links:

```python
import torch
import pygad
import pygad.torchga
import numpy

# Written for the 2-argument fitness function of PyGAD 2.10.0.
# PyGAD 2.20.0 and later expect fitness_func(ga_instance, solution, sol_idx).
def fitness_func(solution, sol_idx):
    global data_inputs, data_outputs, torch_ga, model, loss_function

    model_weights_dict = pygad.torchga.model_weights_as_dict(model=model,
                                                             weights_vector=solution)

    # Use the current solution as the model parameters.
    model.load_state_dict(model_weights_dict)

    predictions = model(data_inputs)

    solution_fitness = 1.0 / (loss_function(predictions, data_outputs).detach().numpy() + 0.00000001)

    return solution_fitness

def callback_generation(ga_instance):
    print("Generation = {generation}".format(generation=ga_instance.generations_completed))
    # best_solution() returns (solution, fitness, index); index 1 is the fitness.
    print("Fitness    = {fitness}".format(fitness=ga_instance.best_solution()[1]))

# Build the PyTorch model.
input_layer = torch.nn.Conv2d(in_channels=3, out_channels=5, kernel_size=7)
relu_layer1 = torch.nn.ReLU()
max_pool1 = torch.nn.MaxPool2d(kernel_size=5, stride=5)

conv_layer2 = torch.nn.Conv2d(in_channels=5, out_channels=3, kernel_size=3)
relu_layer2 = torch.nn.ReLU()

flatten_layer1 = torch.nn.Flatten()
# The value 768 is pre-computed by tracing the sizes of the layers' outputs.
dense_layer1 = torch.nn.Linear(in_features=768, out_features=15)
relu_layer3 = torch.nn.ReLU()

dense_layer2 = torch.nn.Linear(in_features=15, out_features=4)
output_layer = torch.nn.Softmax(1)

model = torch.nn.Sequential(input_layer,
                            relu_layer1,
                            max_pool1,
                            conv_layer2,
                            relu_layer2,
                            flatten_layer1,
                            dense_layer1,
                            relu_layer3,
                            dense_layer2,
                            output_layer)

# Create an instance of the pygad.torchga.TorchGA class to build the initial population.
torch_ga = pygad.torchga.TorchGA(model=model,
                                 num_solutions=10)

loss_function = torch.nn.CrossEntropyLoss()

# Data inputs
data_inputs = torch.from_numpy(numpy.load("dataset_inputs.npy")).float()
# Conv2d expects (samples, channels, height, width), so the channels-last data is reshaped.
# Note: reshape only reinterprets the memory; data_inputs.permute(0, 3, 1, 2) would keep the pixel layout.
data_inputs = data_inputs.reshape((data_inputs.shape[0], data_inputs.shape[3], data_inputs.shape[1], data_inputs.shape[2]))

# Data outputs
data_outputs = torch.from_numpy(numpy.load("dataset_outputs.npy")).long()

# Prepare the PyGAD parameters. Check the documentation for more information: https://pygad.readthedocs.io/en/latest/README_pygad_ReadTheDocs.html#pygad-ga-class
num_generations = 200 # Number of generations.
num_parents_mating = 5 # Number of solutions to be selected as parents in the mating pool.
initial_population = torch_ga.population_weights # Initial population of network weights.
parent_selection_type = "sss" # Type of parent selection.
crossover_type = "single_point" # Type of the crossover operator.
mutation_type = "random" # Type of the mutation operator.
mutation_percent_genes = 10 # Percentage of genes to mutate. This parameter has no action if the parameter mutation_num_genes exists.
keep_parents = -1 # Number of parents to keep in the next population. -1 means keep all parents and 0 means keep nothing.

# Create an instance of the pygad.GA class
ga_instance = pygad.GA(num_generations=num_generations,
                       num_parents_mating=num_parents_mating,
                       initial_population=initial_population,
                       fitness_func=fitness_func,
                       parent_selection_type=parent_selection_type,
                       crossover_type=crossover_type,
                       mutation_type=mutation_type,
                       mutation_percent_genes=mutation_percent_genes,
                       keep_parents=keep_parents,
                       on_generation=callback_generation)

# Start the genetic algorithm evolution.
ga_instance.run()

# After the generations complete, a plot shows how the fitness value evolves over generations.
# Later PyGAD releases name this method plot_fitness().
ga_instance.plot_result(title="PyGAD & PyTorch - Iteration vs. Fitness", linewidth=4)

# Return the details of the best solution.
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))
print("Index of the best solution : {solution_idx}".format(solution_idx=solution_idx))

# Fetch the parameters of the best solution.
best_solution_weights = pygad.torchga.model_weights_as_dict(model=model,
                                                            weights_vector=solution)
model.load_state_dict(best_solution_weights)
predictions = model(data_inputs)
# print("Predictions : \n", predictions)

# Calculate the crossentropy for the trained model.
print("Crossentropy : ", loss_function(predictions, data_outputs).detach().numpy())

# Calculate the classification accuracy for the trained model.
accuracy = torch.sum(torch.max(predictions, axis=1).indices == data_outputs) / len(data_outputs)
print("Accuracy : ", accuracy.detach().numpy())
```
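The value 768 passed as in_features to the first Linear layer in the code above can be reproduced by tracing the spatial size of a 100x100 image through the layers. The sketch below uses the standard output-size formula for convolution and pooling layers with no padding and no dilation, which matches the default arguments of Conv2d and MaxPool2d. The helper name `output_size` is hypothetical.

```python
def output_size(size, kernel_size, stride=1):
    """Spatial output size of a conv/pool layer with no padding or dilation."""
    return (size - kernel_size) // stride + 1


size = 100                                          # input images are 100x100
size = output_size(size, kernel_size=7)             # Conv2d(3, 5, 7)  -> 94
size = output_size(size, kernel_size=5, stride=5)   # MaxPool2d(5, 5)  -> 18
size = output_size(size, kernel_size=3)             # Conv2d(5, 3, 3)  -> 16

out_channels = 3  # channels produced by the second convolution
flattened = out_channels * size * size
print(flattened)  # 768
```

If the image size, kernel sizes, or strides change, the same trace gives the new in_features value.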

The next figure is the result of calling the plot_result() method. It shows fitness value change by generation.

Here’s some information about the trained model.

```
Fitness value of the best solution = 1.3009520689219258
Index of the best solution : 0
Crossentropy :  0.7686678
Accuracy :  0.975
```
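One detail may help when reading the cross-entropy value above. torch.nn.CrossEntropyLoss applies log-softmax to its input internally, so when the model already ends with a Softmax layer, as in this example, the outputs are normalized a second time inside the loss. A consequence is that the loss cannot reach 0 even for a perfect prediction. The sketch below, using made-up tensors for 4 classes, shows that floor and, for comparison, the near-zero loss obtained from raw scores (logits).

```python
import math
import torch

loss_function = torch.nn.CrossEntropyLoss()
target = torch.tensor([0])  # index of the correct class

# A "perfect" softmax output: probability 1 for the correct class.
perfect_probabilities = torch.tensor([[1.0, 0.0, 0.0, 0.0]])
loss_on_probabilities = loss_function(perfect_probabilities, target).item()

# Closed form of the same value: -log(e / (e + 3)).
expected_floor = -math.log(math.e / (math.e + 3))

# Raw scores (logits) that strongly favor the correct class.
confident_logits = torch.tensor([[20.0, 0.0, 0.0, 0.0]])
loss_on_logits = loss_function(confident_logits, target).item()

print(loss_on_probabilities, expected_floor, loss_on_logits)
```

With 4 classes the floor is about 0.744, which is why a cross-entropy near 0.77 can go together with a high accuracy in this setup. Dropping the Softmax layer and feeding logits to the loss is a possible variation, though it would change the reported numbers.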

## Conclusion

We explored how to train PyTorch models with the genetic algorithm using a Python 3 library called PyGAD.

PyGAD has a module torchga, which helps to formulate the problem of training PyTorch models as an optimization problem for the genetic algorithm. The torchga module creates an initial population of PyTorch model’s parameters, where each solution holds a different set of parameters for the model. Using PyGAD, the solutions in the population are evolved.

It’s a great way to play around with genetic algorithms. Try it, experiment a bit, and see what comes up!
