Neptune Blog

PyTorch Loss Functions: The Ultimate Guide

Alfrick Opidi , Abhishek Jha

7 min

6th May, 2025

ML Model Development

Loss functions quantify the performance of machine-learning (ML) models by comparing their predictions to ground truth values.

Different ML tasks require different loss functions. For example, regression problems are often evaluated using mean squared error (MSE) loss, while cross-entropy is a popular choice for classification problems.

PyTorch provides a wide array of loss functions under its nn (neural network) module. Users can also define their own loss functions.

Experiment trackers like neptune.ai help data scientists monitor how the loss changes over the course of a training run and help them compare the performance of different model configurations.

Neural networks can handle many different tasks, whether it’s classifying data (like grouping pictures of animals into cats and dogs), regression (like predicting monthly revenues), or anything else. Every task has a different output and needs a different type of loss function.

The way you configure your loss functions can make or break the performance of your algorithm. By correctly configuring the loss function, you can make sure your model will work how you want it to.

Luckily for us, there are loss functions we can use to make the most of machine learning tasks.

In this article, we’ll talk about popular loss functions in PyTorch, and about building custom loss functions. Once you’re done reading, you should know which one to choose for your project.

What are the loss functions?

Before we jump into PyTorch specifics, let’s refresh our memory of what loss functions are.

Loss functions are used to gauge the error between the prediction output and the provided target value. A loss function tells us how far the model is from producing the expected outcome. The word ‘loss’ refers to the penalty that the model incurs for failing to yield the desired results.

For example, a loss function (let’s call it J) can take the following two parameters:

Predicted output (y_pred)
Target value (y)

This function will determine your model’s performance by comparing its predicted output with the expected output. If the deviation between y_pred and y is very large, the loss value will be very high.

If the deviation is small or the values are nearly identical, it’ll output a very low loss value. Therefore, you need to use a loss function that can penalize a model properly when it is training on the provided dataset.

Loss functions change based on the problem statement that your algorithm is trying to solve.
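To make this concrete, here is a minimal sketch of a loss function J written by hand. The squared-error form and the example numbers are assumptions chosen for illustration; they are not tied to any particular model.

```python
import torch

def J(y_pred, y):
    # mean of the squared deviations between prediction and target
    return ((y_pred - y) ** 2).mean()

y = torch.tensor([1.0, 2.0, 3.0])            # target values
close_pred = torch.tensor([1.1, 1.9, 3.0])   # small deviation from y
far_pred = torch.tensor([4.0, -1.0, 8.0])    # large deviation from y

small_loss = J(close_pred, y).item()
large_loss = J(far_pred, y).item()
print(small_loss, large_loss)
```

The prediction that stays close to the target gets a loss near zero, while the distant one gets a loss that is orders of magnitude larger.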

How to add PyTorch loss functions?

PyTorch’s torch.nn module has multiple standard loss functions that you can use in your project.

To add them, you need to first import the libraries:

import torch
import torch.nn as nn

Next, define the type of loss you want to use. Here’s how to define the mean absolute error loss function:

loss = nn.L1Loss()

After adding a function, you can use it to accomplish your specific task.
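As a rough usage sketch, the loss object is called like a function on a prediction tensor and a target tensor. The tensor values below are made up for illustration; `reduction` is a standard argument of the built-in losses that controls how element-wise losses are combined.

```python
import torch
import torch.nn as nn

loss = nn.L1Loss()                           # averages over all elements by default
prediction = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])

# absolute errors are 0.5, 0.5 and 0.0, so the mean is 1/3
mean_value = loss(prediction, target).item()

# reduction="sum" adds the element-wise losses instead of averaging them
sum_value = nn.L1Loss(reduction="sum")(prediction, target).item()
print(mean_value, sum_value)
```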

Which loss functions are available in PyTorch?

Broadly speaking, loss functions in PyTorch are divided into three main categories: regression losses, classification losses, and ranking losses.

Regression loss functions are used when the model is predicting a continuous value, like the age of a person.

Classification loss functions are used when the model is predicting a discrete value, such as whether an email is spam or not.

Ranking loss functions are used when the model is predicting the relative distances between inputs, such as ranking products according to their relevance on an e-commerce search page.

Now we’ll explore the different types of loss functions in PyTorch, and how to use them:

Mean Absolute Error Loss
Mean Squared Error Loss
Negative Log-Likelihood Loss
Cross-Entropy Loss
Hinge Embedding Loss
Margin Ranking Loss
Triplet Margin Loss
Kullback-Leibler divergence

1. PyTorch Mean Absolute Error (L1 Loss Function)

torch.nn.L1Loss

The Mean Absolute Error (MAE), also called L1 Loss, computes the average of the absolute differences between actual values and predicted values.

It checks the size of errors in a set of predicted values, without caring about their positive or negative direction. If the absolute values of the errors are not used, then negative values could cancel out the positive values.

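The PyTorch L1 Loss is expressed as:

$$\ell(x, y) = \frac{1}{N} \sum_{i=1}^{N} |x_i - y_i|$$

where x represents the actual value, y the predicted value, and N the number of elements (with the default reduction='mean').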

When could it be used?

Regression problems, especially when the distribution of the target variable has outliers, such as small or big values that are a great distance from the mean value. Because the errors are not squared, MAE is considered more robust to outliers than MSE.

Example

import torch
import torch.nn as nn
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
mae_loss = nn.L1Loss()
output = mae_loss(input, target)
output.backward()
print("input: ", input)
print("target: ", target)
print("output: ", output)

input: tensor([[-0.3932, -0.3416, -0.8294,  0.4837,  0.8330],
               [-0.6068,  1.0592,  1.1254, -0.8141,  0.8664],
               [-0.0442, -1.1115, -0.1092, -0.9268, -1.3132]],
              requires_grad=True)
target: tensor([[-0.4229,  0.8182, -1.4838,  0.3547,  1.2247],
                [-0.3112,  0.3926, -0.6457, -0.2093, -1.3689],
                [-0.0660, -0.7041, -1.5288, -0.1186,  0.7289]])
output: tensor(0.8425, grad_fn=<MeanBackward0>)
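The robustness claim can be illustrated with a small sketch that compares MAE against the squared-error loss (nn.MSELoss) on hand-picked numbers. The values are assumptions chosen so the arithmetic is easy to follow.

```python
import torch
import torch.nn as nn

target = torch.tensor([1.0, 2.0, 3.0, 4.0])
clean_pred = torch.tensor([1.5, 2.5, 3.5, 4.5])     # every prediction off by 0.5
outlier_pred = torch.tensor([1.5, 2.5, 3.5, 14.0])  # last prediction off by 10

mae, mse = nn.L1Loss(), nn.MSELoss()

mae_clean = mae(clean_pred, target).item()       # 0.5
mae_outlier = mae(outlier_pred, target).item()   # (3 * 0.5 + 10) / 4
mse_clean = mse(clean_pred, target).item()       # 0.25
mse_outlier = mse(outlier_pred, target).item()   # (3 * 0.25 + 100) / 4

print(mae_outlier / mae_clean, mse_outlier / mse_clean)
```

A single outlier multiplies the MAE by less than 6 but the MSE by roughly 100, which is why MAE is the gentler choice when a few extreme targets should not dominate training.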

2. PyTorch Mean Squared Error Loss Function

torch.nn.MSELoss

The Mean Squared Error (MSE), also called L2 Loss, computes the average of the squared differences between actual values and predicted values.

PyTorch MSE Loss always outputs a non-negative result, regardless of the sign of actual and predicted values. To enhance the accuracy of the model, you should try to reduce the L2 Loss; a perfect value is 0.0.

The squaring means that larger mistakes produce disproportionately larger errors than smaller ones. If the model is off by 100, the error is 10,000. If it’s off by 0.1, the error is 0.01. This punishes the model heavily for big mistakes and only mildly for small ones.

The PyTorch L2 Loss is expressed as:

$$\ell(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2$$

where x represents the actual value, y the predicted value, and N the number of elements (with the default reduction='mean').

When could it be used?

MSE is the usual first choice of loss function for most PyTorch regression problems.

Example

import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
mse_loss = nn.MSELoss()
output = mse_loss(input, target)
output.backward()

print("input: ", input)
print("target: ", target)
print("output: ", output)

input:  tensor([[ 0.4772,  1.4035, -1.8414, -0.3521,  0.4028],
       [-0.1333, -1.1465, -0.0071,  0.7151, -0.0134],
       [-0.5469, -1.6369,  1.1091,  0.7617, -0.0533]], requires_grad=True)
target:  tensor([[-0.5415,  0.8121, -0.0089,  0.4808,  1.7725],
       [ 0.7138, -0.4299, -0.2525, -0.9443,  0.1625],
       [-0.2716,  1.1664, -0.0361,  0.7791,  0.7789]])
output:  tensor(1.4220, grad_fn=<MseLossBackward0>)
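The effect of squaring can be checked directly. The sketch below feeds single-element tensors to nn.MSELoss, using the off-by-100 and off-by-0.1 cases mentioned earlier.

```python
import torch
import torch.nn as nn

mse_loss = nn.MSELoss()
target = torch.tensor([0.0])

big_miss = mse_loss(torch.tensor([100.0]), target).item()   # 100 squared
small_miss = mse_loss(torch.tensor([0.1]), target).item()   # 0.1 squared
print(big_miss, small_miss)
```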

3. PyTorch Negative Log-Likelihood Loss Function

torch.nn.NLLLoss

The Negative Log-Likelihood Loss function (NLL) is applied to models with a softmax output layer. In PyTorch, nn.NLLLoss expects log-probabilities as input, so it is paired with nn.LogSoftmax, as in the example further down. Softmax refers to an activation function that calculates the normalized exponential function of every unit in the layer.

The Softmax function is expressed as:

$$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}$$

The function takes an input vector of size N and rescales the values so that every one of them falls between 0 and 1. Furthermore, it normalizes the output so that the N values of the vector sum to 1.
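A short sketch of these two properties, using torch.softmax on arbitrary scores and comparing it with the definition written out by hand:

```python
import torch

scores = torch.tensor([2.0, 1.0, 0.1])    # arbitrary raw scores
probs = torch.softmax(scores, dim=0)

# the same result computed directly from the definition
manual = torch.exp(scores) / torch.exp(scores).sum()
print(probs, probs.sum())
```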

NLL takes the negative of the log-likelihood because the probabilities (or likelihoods) vary between zero and one, and the logarithms of values in this range are negative. Negating them makes the loss value positive.

In NLL, minimizing the loss function helps us get a better output. The negative log-likelihood is derived from maximum likelihood estimation (MLE): we try to maximize the model’s log-likelihood and, as a result, minimize the NLL.

In NLL, the model is punished for assigning a low probability to the correct class and rewarded for assigning it a high probability. The logarithm does the punishment.

NLL does not only care about the prediction being correct but also about the model being certain about the prediction with a high score.
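To see how the logarithm does the punishing, the sketch below evaluates the negative log of three illustrative probabilities that a model might assign to the correct class. The probabilities are made-up values.

```python
import math

# probability assigned to the correct class -> resulting NLL
for p_correct in (0.9, 0.6, 0.1):
    print(p_correct, round(-math.log(p_correct), 4))

nll_confident = -math.log(0.9)   # correct and confident: small penalty
nll_unsure = -math.log(0.6)      # correct but less certain: larger penalty
nll_wrong = -math.log(0.1)       # correct class considered unlikely: large penalty
```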

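The PyTorch NLL Loss is expressed as:

$$\ell(x, y) = \frac{\sum_{n=1}^{N} l_n}{\sum_{n=1}^{N} w_{y_n}}, \quad l_n = -w_{y_n} \, x_{n, y_n}$$

where x is the input (log-probabilities), y is the target, w is the weight, and N is the batch size. The average shown here corresponds to the default reduction='mean'.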

When could it be used?

Multi-class classification problems

Example

import torch
import torch.nn as nn

# size of input (N x C) is = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# every element in target should have 0 <= value < C
target = torch.tensor([1, 0, 4])

m = nn.LogSoftmax(dim=1)
nll_loss = nn.NLLLoss()
output = nll_loss(m(input), target)
output.backward()

print('input: ', input)
print('target: ', target)
print('output: ', output)

input:  tensor([[ 0.0252,  0.5661, -0.3065, -0.1319,  1.1892],
       [-1.9397,  0.1411,  0.0492, -0.3845,  0.8581],
       [-0.9376,  0.3773,  0.5343, -0.3764, -0.8048]], requires_grad=True)
target:  tensor([1, 0, 4])
output:  tensor(2.4822, grad_fn=<NllLossBackward0>)
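The following sketch, with made-up scores, shows what nn.NLLLoss does under the hood: it picks each row's log-probability at the target index, negates it, and averages over the batch.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3]])
target = torch.tensor([0, 1])

log_probs = nn.LogSoftmax(dim=1)(logits)
loss = nn.NLLLoss()(log_probs, target)

# by hand: select the log-probability of the target class in each row
manual = -log_probs[torch.arange(2), target].mean()
print(loss, manual)
```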

4. PyTorch Cross-Entropy Loss Function

torch.nn.CrossEntropyLoss

This loss function computes the difference between two probability distributions for a provided set of occurrences or random variables.

It is used to work out a score that summarizes the average difference between the predicted values and the actual values. To enhance the accuracy of the model, you should try to minimize the score. The cross-entropy is never negative and has no upper bound; a perfect value is 0.

Other loss functions, like the squared loss, punish incorrect predictions. Cross-Entropy penalizes greatly for being very confident and wrong.

In PyTorch, nn.CrossEntropyLoss combines nn.LogSoftmax and nn.NLLLoss in a single step, so it takes raw, unnormalized scores (logits) as input. Like the Negative Log-Likelihood Loss, it punishes incorrect but confident predictions, as well as correct but less confident predictions.

The Cross-Entropy function has a wide range of variants, of which the most common type is the Binary Cross-Entropy (BCE). The BCE Loss is mainly used for binary classification models; that is, models having only 2 classes.
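For the binary case, one option in PyTorch is nn.BCEWithLogitsLoss, which applies the sigmoid internally. The sketch below is an illustration with a single made-up logit; the choice of this particular class (rather than nn.BCELoss on probabilities) is an assumption made to keep the example short.

```python
import math
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()     # sigmoid + binary cross-entropy in one step
logit = torch.tensor([0.0])      # sigmoid(0) = 0.5, i.e. maximally unsure
label = torch.tensor([1.0])      # the positive class is the true class

loss_value = bce(logit, label).item()   # equals -log(0.5)
print(loss_value, math.log(2))
```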

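The PyTorch Cross-Entropy Loss is expressed as:

$$\ell(x, y) = \frac{\sum_{n=1}^{N} l_n}{\sum_{n=1}^{N} w_{y_n}}, \quad l_n = -w_{y_n} \log \frac{\exp(x_{n, y_n})}{\sum_{c=1}^{C} \exp(x_{n, c})}$$

where x is the input, y is the target, w is the weight, C is the number of classes, and N spans the mini-batch dimension.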

When could it be used?

Classification tasks, both binary and multi-class, for which it’s the most common choice of loss function.
Creating confident models: the prediction will be accurate and with a higher probability.

Example

import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)

cross_entropy_loss = nn.CrossEntropyLoss()
output = cross_entropy_loss(input, target)
output.backward()

print("input: ", input)
print("target: ", target)
print("output: ", output)

input:  tensor([[ 0.2099, -0.2404, -0.3025,  0.0381,  2.2762],
       [ 1.0351,  0.0693,  0.2797, -0.2118,  0.1180],
       [ 1.0161,  0.9912,  1.1134, -0.2505,  1.6015]], requires_grad=True)
target:  tensor([0, 0, 2])
output:  tensor(1.6232, grad_fn=<NllLossBackward0>)
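The sketch below uses hand-picked logits to illustrate two points: confident mistakes receive the largest penalty, and nn.CrossEntropyLoss on logits matches nn.NLLLoss applied to log-softmax outputs. Setting reduction="none" keeps one loss value per sample so they can be compared.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[4.0, 0.0, 0.0],    # confident in class 0
                       [0.5, 0.4, 0.3]])   # unsure
ce = nn.CrossEntropyLoss(reduction="none")

right = ce(logits, torch.tensor([0, 0]))   # class 0 is the true class
wrong = ce(logits, torch.tensor([1, 1]))   # class 1 is the true class

# the same per-sample values via log-softmax followed by NLLLoss
via_nll = nn.NLLLoss(reduction="none")(
    torch.log_softmax(logits, dim=1), torch.tensor([0, 0])
)
print(right, wrong)
```

The confident row gets the smallest loss when it is right and the largest (well above 1) when it is wrong.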

5. PyTorch Hinge Embedding Loss Function

torch.nn.HingeEmbeddingLoss

The Hinge Embedding Loss is used for computing the loss when there is an input tensor, x, and a labels tensor, y. Target values are either 1 or -1, which makes it good for binary classification tasks.

With the Hinge Loss function, you can give more error whenever a difference exists in the sign between the actual class values and the predicted class values. This motivates examples to have the right sign.

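The Hinge Embedding Loss is expressed as:

$$l_n = \begin{cases} x_n, & \text{if } y_n = 1 \\ \max(0, \Delta - x_n), & \text{if } y_n = -1 \end{cases}$$

where Δ is the margin (1 by default) and the per-element losses l_n are averaged.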

When could it be used?

Classification problems, especially when determining if two inputs are dissimilar or similar.
Learning nonlinear embeddings or semi-supervised learning tasks.

Example

import torch
import torch.nn as nn

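input = torch.randn(3, 5, requires_grad=True)
# labels must be 1 or -1; the real-valued targets in the printed run below are not valid labels
target = torch.randn(3, 5).sign()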

hinge_loss = nn.HingeEmbeddingLoss()
output = hinge_loss(input, target)
output.backward()

print("input: ", input)
print("target: ", target)
print("output: ", output)

input:  tensor([[ 0.3218,  0.4857,  0.6402, -0.8121, -0.8217],
       [ 0.4293,  0.2042, -0.2131, -0.8387, -0.5315],
       [-1.1620, -2.1956, -0.2507, -1.9591, -2.0346]], requires_grad=True)
target:  tensor([[ 1.4445e+00, -9.7065e-02,  1.1080e-02,  5.7790e-01,  1.8098e+00],
       [ 3.5577e-01, -3.3131e-01,  2.1718e-01, -2.0629e+00, -7.7428e-05],
       [ 7.5021e-01, -2.3516e+00,  1.6007e+00, -4.6854e-02, -7.2059e-01]])
output:  tensor(1., grad_fn=<MeanBackward0>)
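Here is a small sketch with valid labels and hand-picked values, so the result can be verified by hand. The inputs stand in for distances between pairs of samples, which is an assumption about how the loss would typically be used.

```python
import torch
import torch.nn as nn

# x: a distance per pair; y: 1 for similar pairs, -1 for dissimilar pairs
x = torch.tensor([0.5, 1.5, 0.2, 2.0])
y = torch.tensor([1.0, -1.0, -1.0, 1.0])

hinge_loss = nn.HingeEmbeddingLoss(margin=1.0)
value = hinge_loss(x, y).item()

# by hand: x where y == 1, max(0, margin - x) where y == -1, then the mean
manual = (0.5 + max(0.0, 1.0 - 1.5) + max(0.0, 1.0 - 0.2) + 2.0) / 4
print(value, manual)
```

Similar pairs are penalized in proportion to their distance, while dissimilar pairs are penalized only when they sit closer together than the margin.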

6. PyTorch Margin Ranking Loss Function

torch.nn.MarginRankingLoss

The Margin Ranking Loss computes a criterion to predict the relative distances between inputs. This is different from other loss functions, like MSE or Cross-Entropy, which learn to predict directly from a given set of inputs.

With the Margin Ranking Loss, you can calculate the loss provided there are inputs x1, x2, as well as a label tensor, y (containing 1 or -1).

When y == 1, the first input is assumed to be the larger value and is ranked higher than the second input. If y == -1, the second input is ranked higher.

The PyTorch Margin Ranking Loss is expressed as:

$$\text{loss}(x_1, x_2, y) = \max(0, \; -y \cdot (x_1 - x_2) + \text{margin})$$

When could it be used?

Ranking problems

Example

import torch
import torch.nn as nn

input_one = torch.randn(3, requires_grad=True)
input_two = torch.randn(3, requires_grad=True)
target = torch.randn(3).sign()

ranking_loss = nn.MarginRankingLoss()
output = ranking_loss(input_one, input_two, target)
output.backward()

print('input one: ', input_one)
print('input two: ', input_two)
print('target: ', target)
print('output: ', output)

input one:  tensor([ 0.0011,  0.4707, -0.9995], requires_grad=True)
input two:  tensor([-0.1145,  0.2555, -0.1443], requires_grad=True)
target:  tensor([-1.,  1., -1.])
output:  tensor(0.0385, grad_fn=<MeanBackward0>)
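The sketch below works through two pairs by hand with a non-zero margin. The scores and the margin of 0.5 are illustrative assumptions.

```python
import torch
import torch.nn as nn

x1 = torch.tensor([0.8, 0.2])
x2 = torch.tensor([0.3, 0.9])
y = torch.tensor([1.0, 1.0])     # x1 should rank higher in both pairs

ranking_loss = nn.MarginRankingLoss(margin=0.5)
value = ranking_loss(x1, x2, y).item()

# pair 1: max(0, -(0.8 - 0.3) + 0.5) = 0.0  (correct order, margin satisfied)
# pair 2: max(0, -(0.2 - 0.9) + 0.5) = 1.2  (wrong order)
manual = (0.0 + 1.2) / 2
print(value, manual)
```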

7. PyTorch Triplet Margin Loss Function

torch.nn.TripletMarginLoss

The Triplet Margin Loss computes a criterion for measuring the triplet loss in models. With this loss function, you can calculate the loss provided there are input tensors, x1, x2, x3, as well as margin with a value greater than zero.

A triplet consists of a (anchor), p (positive examples), and n (negative examples).

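The PyTorch Triplet Margin Loss is expressed as:

$$L(a, p, n) = \max\{ d(a_i, p_i) - d(a_i, n_i) + \text{margin}, \; 0 \}, \quad d(x_i, y_i) = \lVert x_i - y_i \rVert_p$$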

When could it be used?

Determining the relative similarity between samples.
Content-based retrieval problems.

Example

import torch
import torch.nn as nn

anchor = torch.randn(100, 128, requires_grad=True)
positive = torch.randn(100, 128, requires_grad=True)
negative = torch.randn(100, 128, requires_grad=True)

triplet_margin_loss = nn.TripletMarginLoss(margin=1.0, p=2)
output = triplet_margin_loss(anchor, positive, negative)
output.backward()

print("anchor: ", anchor)
print("positive: ", positive)
print("negative: ", negative)
print("output: ", output)

anchor:  tensor([[ 0.0420,  0.4353, -0.5644,  ..., -1.0099,  0.2384,  0.8086],
       [ 0.6183,  0.6203,  1.4393,  ...,  0.2478, -0.9575, -0.5772],
       [ 1.5213,  1.9588, -0.2662,  ..., -0.1097,  1.1710, -0.0689],
       ...,
       [ 1.8697, -2.2339, -0.5477,  ...,  0.4452, -1.0655,  1.1905],
       [-2.3414,  0.2552,  0.1521,  ..., -1.4103, -0.5110, -0.2261],
       [ 0.5127, -0.3725,  1.1312,  ...,  0.4546,  0.1530, -0.8535]],
      requires_grad=True)
positive:  tensor([[ 1.9741,  0.5304, -0.9011,  ..., -1.6875,  0.7996, -2.3709],
       [-0.6146, -0.5467,  1.3600,  ..., -1.2451,  0.6341,  1.5004],
       [ 0.8361,  0.5437,  1.4842,  ...,  1.5463,  0.8758, -1.5748],
       ...,
       [-0.6357, -0.1118, -0.0601,  ..., -1.2194, -2.0288,  0.9916],
       [-0.2456, -0.6198, -1.4590,  ..., -0.9597, -0.6513, -0.3821],
       [-0.5308,  0.0270, -0.2375,  ..., -0.5044,  0.5663,  0.5244]],
      requires_grad=True)
negative:  tensor([[-0.6022,  0.2600, -1.8593,  ..., -1.1654,  2.5118, -0.1412],
       [ 0.3645, -1.7209,  1.1791,  ..., -0.4102, -1.8362,  1.1563],
       [ 1.0960, -0.4116,  0.3680,  ...,  0.3262, -0.7347, -1.1815],
       ...,
       [ 0.7241, -0.6623, -0.4320,  ...,  0.2540,  0.7246, -1.7698],
       [-1.2495,  1.0588, -1.4291,  ..., -1.2511,  0.7142, -1.0096],
       [ 1.3198,  0.2441, -0.0173,  ..., -1.4373, -0.1539,  1.0322]],
      requires_grad=True)
output:  tensor(0.9538, grad_fn=<MeanBackward0>)
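With 128-dimensional random vectors, the result is hard to check by eye. The sketch below uses a single two-dimensional triplet with hand-picked coordinates (an illustrative assumption), so the distances are 1 and 5.

```python
import torch
import torch.nn as nn

anchor = torch.tensor([[0.0, 0.0]])
near = torch.tensor([[0.0, 1.0]])    # Euclidean distance 1 from the anchor
far = torch.tensor([[3.0, 4.0]])     # Euclidean distance 5 from the anchor

triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)

# positive much closer than negative: max(1 - 5 + 1, 0) gives no penalty
good = triplet_loss(anchor, near, far).item()
# roles swapped, the positive is the far point: max(5 - 1 + 1, 0)
bad = triplet_loss(anchor, far, near).item()
print(good, bad)
```

PyTorch adds a tiny epsilon when computing the distances, so the second value is approximately, not exactly, 5.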

8. PyTorch Kullback-Leibler Divergence Loss Function

torch.nn.KLDivLoss

The Kullback-Leibler Divergence, shortened to KL Divergence, computes the difference between two probability distributions.

With this loss function, you can compute the amount of information lost when the predicted probability distribution is used to approximate the expected target probability distribution. PyTorch uses the natural logarithm, so the result is expressed in nats rather than bits.

Its output tells you the proximity of two probability distributions. If the predicted probability distribution is very far from the true probability distribution, it’ll lead to a big loss. If the value of KL Divergence is zero, it implies that the probability distributions are the same.

KL Divergence is closely related to Cross-Entropy Loss: it equals the cross-entropy between the two distributions minus the entropy of the true distribution. When the targets are one-hot labels, that entropy is zero and the two losses coincide. KL Divergence therefore only assesses how the predicted probability distribution differs from the ground-truth distribution.

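The KL Divergence Loss is expressed as:

$$D_{KL}(x \,\|\, y) = \sum_{i} x_i \log \frac{x_i}{y_i}$$

x represents the true label’s probability and y represents the predicted label’s probability. Note that nn.KLDivLoss expects the prediction (its first argument) as log-probabilities and the target (its second argument) as probabilities.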

When could it be used?

Approximating complex functions
Multi-class classification tasks
If you want to make sure that the distribution of predictions is similar to that of training data

Example

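import torch
import torch.nn as nn
import torch.nn.functional as F

# KLDivLoss expects log-probabilities as input and probabilities as target;
# raw randn values, as in the printed run below, are neither and can give nan
input = F.log_softmax(torch.randn(2, 3, requires_grad=True), dim=1)
target = F.softmax(torch.randn(2, 3), dim=1)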

kl_loss = nn.KLDivLoss(reduction = 'batchmean')
output = kl_loss(input, target)
output.backward()

print('input: ', input)
print('target: ', target)
print('output: ', output)

input:  tensor([[0.6695, 0.1312, 0.6486],
       [0.3960, 0.8095, 0.3257]], requires_grad=True)
target:  tensor([[ 1.0613,  0.8108,  0.4006],
       [-1.9618,  0.4705,  2.0413]])
output:  tensor(nan, grad_fn=<DivBackward0>)
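A small sketch with two hand-picked distributions shows the expected behavior when the inputs are valid: the prediction is passed as log-probabilities, the target as probabilities, and the result matches the formula computed by hand.

```python
import math
import torch
import torch.nn as nn

target = torch.tensor([[0.5, 0.5]])        # true distribution
predicted = torch.tensor([[0.25, 0.75]])   # predicted distribution

kl_loss = nn.KLDivLoss(reduction="batchmean")
# first argument: log of the predicted probabilities; second: target probabilities
value = kl_loss(predicted.log(), target).item()

# by hand: sum over classes of target * log(target / predicted)
manual = 0.5 * math.log(0.5 / 0.25) + 0.5 * math.log(0.5 / 0.75)

# identical distributions give a divergence of zero
zero = kl_loss(target.log(), target).item()
print(value, manual, zero)
```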

How to create a custom loss function in PyTorch?

PyTorch lets you create your own custom loss functions to implement in your projects.

Here’s how you can create your own simple Cross-Entropy Loss function.

Creating custom loss function as a python function

import torch
import torch.nn.functional as F

def my_custom_loss(my_outputs, my_labels):
   # specifying the batch size
   my_batch_size = my_outputs.size()[0]

   # calculating the log of softmax values
   my_outputs = F.log_softmax(my_outputs, dim=1)

   # selecting the values that correspond to labels
   my_outputs = my_outputs[range(my_batch_size), my_labels]

   # averaging the negative log-probabilities over the batch
   return -torch.sum(my_outputs) / my_batch_size
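One way to sanity-check a hand-written loss is to compare it with the built-in equivalent. The sketch below restates the function in compact form, dividing by the batch size; the name manual_cross_entropy and the random test data are hypothetical and used only for this check.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def manual_cross_entropy(outputs, labels):
    # log-softmax, pick the entry for each label, negate, average over the batch
    log_probs = F.log_softmax(outputs, dim=1)
    picked = log_probs[torch.arange(outputs.size(0)), labels]
    return -picked.sum() / outputs.size(0)

torch.manual_seed(0)                  # fixed seed so the comparison is repeatable
outputs = torch.randn(4, 3)           # 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 1])

custom = manual_cross_entropy(outputs, labels)
builtin = nn.CrossEntropyLoss()(outputs, labels)
print(custom, builtin)
```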

You can also create other advanced PyTorch custom loss functions.

Creating custom loss function with a class definition

Let’s modify the Dice coefficient, which computes the similarity between two samples, to act as a loss function for binary classification problems:

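import torch
import torch.nn as nn

class DiceLoss(nn.Module):
   # weight and size_average are accepted for compatibility but not used
   def __init__(self, weight=None, size_average=True):
       super().__init__()

   def forward(self, inputs, targets, smooth=1):
       # map raw scores (logits) to probabilities in (0, 1)
       inputs = torch.sigmoid(inputs)

       # flatten predictions and targets
       inputs = inputs.view(-1)
       targets = targets.view(-1)

       # smoothed Dice coefficient; smooth avoids division by zero
       intersection = (inputs * targets).sum()
       dice = (2.0 * intersection + smooth) / (inputs.sum() + targets.sum() + smooth)

       return 1 - dice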
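To get a feel for the values this loss produces, the sketch below restates the same computation as a plain function (dice_loss is a hypothetical name used only here) and evaluates it on a tiny made-up mask with confident correct and confident incorrect predictions.

```python
import torch

def dice_loss(logits, targets, smooth=1.0):
    # functional restatement of the Dice loss, for illustration only
    probs = torch.sigmoid(logits).reshape(-1)
    targets = targets.reshape(-1)
    intersection = (probs * targets).sum()
    dice = (2.0 * intersection + smooth) / (probs.sum() + targets.sum() + smooth)
    return 1 - dice

mask = torch.tensor([[1.0, 1.0], [0.0, 0.0]])                # ground-truth binary mask
good_logits = torch.tensor([[20.0, 20.0], [-20.0, -20.0]])   # confident and right
bad_logits = -good_logits                                    # confident and wrong

good = dice_loss(good_logits, mask).item()
bad = dice_loss(bad_logits, mask).item()
print(good, bad)
```

A perfect overlap drives the loss to roughly 0. With no overlap, the smoothing term keeps the loss at 0.8 rather than 1 for this four-element mask.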

How to monitor PyTorch loss functions?

While training a model, you need to keep an eye on the loss values to track the model’s performance. A steadily decreasing training loss generally indicates that the model is fitting the data better. There are a number of ways that we can do this. Let’s take a look at them.

For this, we will be training a simple Neural Network created in PyTorch which will perform classification on the famous Iris dataset.

Making the required imports for getting the dataset.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

Loading the dataset.

iris = load_iris()
X = iris['data']
y = iris['target']
names = iris['target_names']
feature_names = iris['feature_names']

Scaling the dataset to have mean=0 and variance=1 helps the model converge quickly.

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Splitting the dataset into train and test in an 80-20 ratio.

X_train, X_test, y_train, y_test = train_test_split(
   X_scaled, y, test_size=0.2, random_state=2
)

Making the necessary imports for our Neural Network and its training.

import torch
import torch.nn.functional as F
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('ggplot')

Defining our network.

class PyTorch_NN(nn.Module):
   def __init__(self, input_dim, output_dim):
       super(PyTorch_NN, self).__init__()
       self.input_layer = nn.Linear(input_dim, 128)
       self.hidden_layer = nn.Linear(128, 64)
       self.output_layer = nn.Linear(64, output_dim)

   def forward(self, x):
       x = F.relu(self.input_layer(x))
       x = F.relu(self.hidden_layer(x))
       # return raw logits: nn.CrossEntropyLoss applies log-softmax internally
       x = self.output_layer(x)
       return x
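nn.CrossEntropyLoss applies log-softmax to its input internally, so it is meant to receive raw logits. The sketch below, with made-up scores, compares the loss computed from logits with the loss computed from outputs that have already passed through softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()
logits = torch.tensor([[10.0, 0.0, 0.0]])   # very confident in class 0
target = torch.tensor([0])                  # and class 0 is correct

loss_from_logits = criterion(logits, target).item()
loss_from_probs = criterion(F.softmax(logits, dim=1), target).item()
print(loss_from_logits, loss_from_probs)
```

Even for a near-perfect prediction, the second value stays above 0.5, because softmaxed outputs are squeezed into the range 0 to 1 before the loss applies its own softmax. Predicted classes (the argmax) are unaffected, but the loss signal is flattened.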

Defining functions for getting accuracy and training the network.

def get_accuracy(pred_arr,original_arr):
   pred_arr = pred_arr.detach().numpy()
   original_arr = original_arr.numpy()
   final_pred= []

   for i in range(len(pred_arr)):
       final_pred.append(np.argmax(pred_arr[i]))
   final_pred = np.array(final_pred)
   count = 0

   for i in range(len(original_arr)):
       if final_pred[i] == original_arr[i]:
           count+=1
   return count/len(final_pred)*100

def train_network(model, optimizer, criterion, X_train, y_train, X_test, y_test, num_epochs):
   train_loss=[]
   train_accuracy=[]
   test_accuracy=[]

   for epoch in range(num_epochs):

       #forward feed
       output_train = model(X_train)

       train_accuracy.append(get_accuracy(output_train, y_train))

       #calculate the loss
       loss = criterion(output_train, y_train)
       train_loss.append(loss.item())

       #clear out the gradients from the last step loss.backward()
       optimizer.zero_grad()

       #backward propagation: calculate gradients
       loss.backward()

       #update the weights
       optimizer.step()

       with torch.no_grad():
           output_test = model(X_test)
           test_accuracy.append(get_accuracy(output_test, y_test))

       if (epoch + 1) % 5 == 0:
           print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {loss.item():.4f}, Train Accuracy: {sum(train_accuracy)/len(train_accuracy):.2f}, Test Accuracy: {sum(test_accuracy)/len(test_accuracy):.2f}")

   return train_loss, train_accuracy, test_accuracy
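The accuracy calculation can also be written without Python loops. The sketch below is one possible vectorized variant; the name accuracy_percent and the sample scores are hypothetical.

```python
import torch

def accuracy_percent(outputs, labels):
    # predicted class = index of the largest score in each row
    predictions = outputs.argmax(dim=1)
    return (predictions == labels).float().mean().item() * 100

outputs = torch.tensor([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1]])
labels = torch.tensor([1, 0, 2, 1])

acc = accuracy_percent(outputs, labels)   # 3 of the 4 rows match their label
print(acc)
```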

Creating model, optimizer, and loss function object.

input_dim = 4
output_dim = 3
learning_rate = 0.01

model = PyTorch_NN(input_dim, output_dim)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

Monitoring PyTorch loss in the notebook

Above, we have used print statements in the train_network function to monitor the loss as well as accuracy. Let’s see this in action:

X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)
y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

train_loss, train_accuracy, test_accuracy = train_network(
   model=model,
   optimizer=optimizer,
   criterion=criterion,
   X_train=X_train,
   y_train=y_train,
   X_test=X_test,
   y_test=y_test,
   num_epochs=100,
)

We get an output like this:

Since we’ve stored the intermediate values in lists, we can also plot the metrics using Matplotlib:

fig, (ax1, ax2, ax3) = plt.subplots(3, figsize=(12, 6), sharex=True)

ax1.plot(train_accuracy)
ax1.set_ylabel("training accuracy")

ax2.plot(train_loss)
ax2.set_ylabel("training loss")

ax3.plot(test_accuracy)
ax3.set_ylabel("test accuracy")

ax3.set_xlabel("epochs")

This will give us a graph that helps us analyze the correlation between loss and accuracy:

This method gets the job done. But if we train several model versions with different parameters, or have to analyze the model’s performance over time, we need a more capable experiment tracking solution.

Monitoring PyTorch loss with neptune.ai

A simpler way to monitor your metrics would be to log them in a service like neptune.ai and focus on more important tasks, such as building and training the model.

Getting started: Logging model parameters to neptune.ai

Disclaimer

Please note that this article references a deprecated version of Neptune.

For information on the latest version with improved features and functionality, please visit our website.

To get started with Neptune, we need to install a couple of libraries:

pip install neptune python-dotenv

neptune is a Python SDK to authorize communication between your scripts and Neptune. We will need python-dotenv for managing environment variables, such as Neptune project name and your API token.

First, you need to retrieve those variables from your Neptune account:

1. Log in to Neptune.
2. Create a new project.

Here’s how it looks in the UI.

3. Copy your API key and the project name which is in the format of workspace-name/project-name.

4. Return to your code editor and create a file named .env in your working directory:

$ touch .env

5. Paste the following contents into the file:

NEPTUNE_PROJECT_NAME="your-workspace-name/project-name"
NEPTUNE_API_TOKEN="your-api-token"

Next, we need to initialize a new Neptune run using the init_run() method. The method requires your API token and the project name, which we retrieve using the os and python-dotenv libraries:

import os
import neptune
from dotenv import load_dotenv

# Load the environment variables
load_dotenv()

# Retrieve the variables
project_name = os.getenv('NEPTUNE_PROJECT_NAME')
api_token = os.getenv('NEPTUNE_API_TOKEN')

# Initialize the run
run = neptune.init_run(project=project_name, api_token=api_token)

run is an instance of a Run object which we can use to log any metadata related to our ML experiments including:

Metrics
Losses
Scores
Model versions
Images
Model weights
Parameters

The syntax to log metadata is very intuitive as the Run object can be thought of as a special dictionary (don’t run the below snippet just yet):

# Logging model, optimizer and loss function name
run["config/model"] = type(model).__name__
run["config/criterion"] = type(criterion).__name__
run["config/optimizer"] = type(optimizer).__name__

If you need them later on, you can retrieve the logged details using a similar syntax:

run['config/model'].fetch()

'PyTorch_NN'

In the next sections, we will use this pattern of code to capture our model’s training process to Neptune.

Logging PyTorch training metrics to neptune.ai

To log the loss of our model, we need to add a couple of lines to the train_network function (notice the three lines where we use the run object):

def train_network(
   model, optimizer, criterion, X_train, y_train, X_test, y_test, num_epochs
):
   train_loss = []
   train_accuracy = []
   test_accuracy = []

   for epoch in range(num_epochs):

       # forward feed
       output_train = model(X_train)

       acc = get_accuracy(output_train, y_train)
       train_accuracy.append(acc)
      
       run["training/epoch/accuracy"].append(acc)

       # calculate the loss
       loss = criterion(output_train, y_train)
      
       run["training/epoch/loss"].append(loss.item())

       train_loss.append(loss.item())

       # clear out the gradients from the last step loss.backward()
       optimizer.zero_grad()

       # backward propagation: calculate gradients
       loss.backward()

       # update the weights
       optimizer.step()

       with torch.no_grad():
           output_test = model(X_test)
           test_acc = get_accuracy(output_test, y_test)
           test_accuracy.append(test_acc)

           run["test/epoch/accuracy"].append(test_acc)

       if (epoch + 1) % 5 == 0:
           print(
               f"Epoch {epoch+1}/{num_epochs}, Train Loss: {loss.item():.4f}, Train Accuracy: {sum(train_accuracy)/len(train_accuracy):.2f}, Test Accuracy: {sum(test_accuracy)/len(test_accuracy):.2f}"
           )

   return train_loss, train_accuracy, test_accuracy

Let’s rerun our model training and inspect the data that ends up in our Neptune project:

train_loss, train_accuracy, test_accuracy = train_network(
   model=model,
   optimizer=optimizer,
   criterion=criterion,
   X_train=X_train,
   y_train=y_train,
   X_test=X_test,
   y_test=y_test,
   num_epochs=100,
)

Using the PyTorch integration for advanced logging

For more sophisticated logging features such as automated capture of model parameters, logging frequency configuration, and model checkpointing, you can use the neptune_pytorch library:

!pip install neptune_pytorch

from neptune_pytorch import NeptuneLogger

neptune_callback = NeptuneLogger(
   run=run,
   model=model,
   log_parameters=True,
   log_freq=30
)

The NeptuneLogger class requires both the run and model objects to enable logging. Then, it can automatically capture model parameters and gradients at a frequency specified by log_freq.

Using the neptune_callback object requires us to change the lines of code where the run object is used. The PyTorch integration collects all the data under a specific key of the run object defined by neptune_callback.base_namespace, so we replace run['key'] with run[neptune_callback.base_namespace]['key']. For example:

run[neptune_callback.base_namespace]['train/epoch/accuracy'].append(acc)

With those lines changed, we can retrain the model:

train_loss, train_accuracy, test_accuracy = train_network(
   model=model,
   optimizer=optimizer,
   criterion=criterion,
   X_train=X_train,
   y_train=y_train,
   X_test=X_test,
   y_test=y_test,
   num_epochs=100,
)

You can view the result in the Neptune app.

To stop the connection to Neptune and sync all data, call the stop() method:

run.stop()

Using the neptune_pytorch integration is the recommended method for logging PyTorch models. It gives you finer control over metadata generated during training and allows you to log more challenging artifacts such as model checkpoints and predictions in a formatted syntax.

Final thoughts

This article covered the most common loss functions in machine learning and how to use them in PyTorch. Choosing a loss function depends on the problem type, such as regression, classification, or ranking. If none of the functions in today’s list meets your requirements, PyTorch allows creating custom loss functions as well.

Loss functions are critical in informing you about the performance of your model. Therefore, you will spend a lot of time monitoring the loss and changing your model training strategy accordingly. And in our view, the best way to monitor loss is by using an experiment tracking tool such as Neptune.
