
Moving From TensorFlow To PyTorch

8th August, 2023

Deep learning frameworks, libraries, and tools exist to spare us the large amount of manual computation that building neural networks would otherwise require. TensorFlow and PyTorch are currently two of the most popular frameworks for constructing neural network architectures.

While TensorFlow was released a year before PyTorch, many developers have recently been shifting towards PyTorch. In this article, we will explore how to switch from TensorFlow to PyTorch. We will first look at the reasons for using each of these deep learning frameworks.

Then, we will dive into the installation process of these libraries, along with a hands-on approach to making the transition. We will also walk through an MNIST example with PyTorch and finally weigh the pros and cons of PyTorch to see whether it is the best choice in every scenario.

Introduction to both deep learning frameworks

In this section of the article, we will have a brief overview of both these deep learning frameworks, namely TensorFlow and PyTorch. We will also try to understand why people are transitioning from TensorFlow to PyTorch with some practical explanations before we proceed to approach the question with a more hands-on approach.


TensorFlow is one of the older deep learning frameworks for developing neural networks, introduced by Google in 2015. As a product of the Google Brain team, it is a well-established starting point for many kinds of complex tasks.

TensorFlow is an open-source library with which you can develop and construct most machine learning and artificial intelligence models. TensorFlow 2.0, which integrates Keras, is a great option for developing, training, manipulating, and running your machine learning models.



PyTorch was developed by Facebook’s AI Research (FAIR) team in September 2016. It has garnered a lot of attention, especially recently, with many data scientists and researchers trying to make a successful transition from TensorFlow to PyTorch. It is so seamlessly integrated with the Python programming language that most developers find working with it very natural.

PyTorch can be considered as a platform where you can work with tensors (similar to a library like NumPy, where we use arrays) to compute deep learning models with GPU acceleration. With the help of PyTorch, you are also able to obtain dynamic graphs with which you can analyze the working methodology of your models on the fly.


Why are people moving from TensorFlow to PyTorch?

While TensorFlow seems like a great tool to have in your arsenal for most deep learning tasks, most people are now preferring to switch from TensorFlow to PyTorch. Let us discuss some of the primary reasons for this transition from one deep learning framework to the other.

  • TensorFlow has traditionally created static graphs, whereas PyTorch creates dynamic graphs. In a static-graph workflow, the computational graph of a machine learning model must be completely defined before it can be run.
  • In PyTorch, you can define, manipulate, and adapt the graph as you work, which is especially useful in a scenario like variable-length inputs in RNNs.
  • Working with models in PyTorch is more intuitive, as it was primarily developed with Python in mind and fits seamlessly with the natural working of the Python programming language.
  • TensorFlow, on the other hand, has a steep learning curve, which most users will find difficult to grasp at first glance. Even after you learn TensorFlow, several concepts will need revisiting.
  • PyTorch is well suited to rapid prototyping and research projects, which is why many people are choosing to transition from TensorFlow to PyTorch.
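The point about variable-length inputs can be illustrated with a minimal sketch. The TinyRNN class, its layer sizes, and the random inputs below are hypothetical and chosen only for illustration; the idea is that an ordinary Python loop decides how many steps are recorded on each call.

```python
import torch
import torch.nn as nn


class TinyRNN(nn.Module):
    """Hypothetical toy recurrent model; the loop length follows the input."""

    def __init__(self, input_size=4, hidden_size=8):
        super().__init__()
        self.hidden_size = hidden_size
        self.cell = nn.RNNCell(input_size, hidden_size)

    def forward(self, sequence):
        # sequence has shape (seq_len, input_size); seq_len may differ per call
        hidden = torch.zeros(1, self.hidden_size)
        for step in sequence:  # ordinary Python loop over time steps
            hidden = self.cell(step.unsqueeze(0), hidden)
        return hidden


model = TinyRNN()
short_out = model(torch.randn(3, 4))  # 3 time steps
long_out = model(torch.randn(7, 4))   # 7 time steps, same model object
print(short_out.shape, long_out.shape)
```

The same model handles both sequence lengths without any graph being declared in advance.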


Is it easy to switch from TensorFlow to PyTorch?

In this section of the article, we will focus on whether it is easy to switch from TensorFlow to PyTorch. The first step in switching to any deep learning framework is the installation process. The library must be easy to install so that developers can start constructing models without having to worry too much about the intricate details of the installation procedure.

While installing the CPU version of both TensorFlow and PyTorch is quite simple, training complex models on a CPU is slow and impractical. Let us look at a quick overview and comparison of their GPU installation procedures.

Exploring the complex procedure for TensorFlow installation

If you are trying to install the GPU version of TensorFlow on your Windows operating system or any Linux system, the entire procedure is quite complex. Even if you are making use of the Anaconda software for developing your deep learning projects, the process for obtaining the latest version of TensorFlow is a bit more lengthy than you would expect. But if you are interested in utilizing any version of TensorFlow, the following command can be used for the installation of TensorFlow 2.0 in your virtual environment. 

conda install -c anaconda tensorflow-gpu

However, TensorFlow is updated constantly, and the above command does not always install the latest release. This can cause a lot of issues when sharing code or working on research projects.

To install the latest version of TensorFlow on your system, you will need to ensure that you download the particular drivers for the installation process. You need to check the official TensorFlow documentation for the specific information on which CUDA and cuDNN versions you will need to download and install on your system.

The CUDA and cuDNN versions are constantly updated by Nvidia, similar to their drivers. Whenever a new version of TensorFlow is released, the CUDA and cuDNN requirements are usually updated as well.

Some functions are deprecated in the newer versions, and hence, it becomes essential to download the newer versions to stay updated. But each time we need to install a new TensorFlow version, both the CUDA and cuDNN updates must be downloaded and installed individually. This entire process is quite tedious and needs to be repeated with each update.

Understanding the simpler PyTorch installation

PyTorch installation | Source

In comparison to the TensorFlow installation, the PyTorch installation is much simpler. All you need to do is go to the installation guide on the official PyTorch website and select your operating system, your package manager, and your compute platform. The website will automatically provide you with the exact command that you can copy and paste to start the installation in your default or virtual environment. An example installation statement is as follows:

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
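Once the installation has finished, a quick check along the following lines can confirm that PyTorch imports correctly and whether it can see a GPU. This is a small sketch rather than an official verification procedure; the variable names are our own.

```python
import torch

# Report the installed version and whether a CUDA-capable GPU is visible.
torch_version = torch.__version__
gpu_available = torch.cuda.is_available()

print(f"PyTorch version: {torch_version}")
print(f"CUDA available: {gpu_available}")
if gpu_available:
    # Only query the device name when a GPU is actually present
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
```

If the second line prints False on a machine with an Nvidia GPU, the installed build is most likely CPU-only or the driver is not set up.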

How to make the switch from TensorFlow to PyTorch

Someone who is an expert in TensorFlow might wonder how much changes when switching from one library to another. The transition from TensorFlow to PyTorch isn’t too complex because PyTorch offers a Pythonic approach to solving most problems. In this section, we will cover some of the major basics of making a successful switch from TensorFlow to PyTorch.

We will understand how to implement and work with tensors in both these deep learning frameworks. We will then look at how their graphs work, namely the static graph approach in TensorFlow and the dynamic graphs in PyTorch. Finally, in this section, we will also compare the training loops of TensorFlow and PyTorch.

Understanding how to implement tensors

Tensors are n-dimensional arrays through which you can develop and construct most of the machine learning projects. The core aspect of most deep learning frameworks is the concept of tensors, which is a generalization of vectors and matrices. Once you have installed PyTorch on your system, you can proceed to compare some of the basic differences between the coding procedure of TensorFlow and PyTorch. 

Let us compare and implement some of the basic implementations to get started with the transition from one library to the other. We will firstly look at how to import the two libraries, initialize tensors, and perform some basic operations with these tensors. The TensorFlow code for initializing the tensors is as follows:

import tensorflow as tf

rank_2_tensor = tf.constant([[1, 2],
                             [3, 4],
                             [5, 6]], dtype=tf.int32)

In PyTorch, the same implementation can be completed as follows:

import torch

rank_2_tensor = torch.tensor([[1, 2],
                             [3, 4],
                             [5, 6]], dtype=torch.int32)

The importing of libraries and defining of tensors is quite simple in both these frameworks. Let us analyze how we can perform some of the basic tensor computations in both these libraries. Firstly, let us look at the TensorFlow implementation example of the following (note that you can also directly define the values shown in the example as variables if required).

a = tf.constant([[1, 2],
                 [3, 4]])
b = tf.constant([[1, 1],
                 [1, 1]])

a = tf.Variable(a)
b = tf.Variable(b)

print(tf.add(a, b), "\n")
print(tf.multiply(a, b), "\n")
print(tf.matmul(a, b), "\n")

In PyTorch, similar basic operations can be implemented as follows:

a = torch.tensor([1, 2, 3], dtype=torch.float)
b = torch.tensor([7, 8, 9], dtype=torch.float)

print(torch.add(a, b))
print(torch.subtract(b, a))

# Calculating the dot product
print(torch.dot(a, b))
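Since PyTorch tensors are often compared to NumPy arrays, it may help to see how the two interoperate. The following is a minimal sketch; the array values and variable names are made up for illustration.

```python
import numpy as np
import torch

array = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

tensor = torch.from_numpy(array)  # shares memory with `array`
doubled = tensor * 2              # a new tensor; `array` is unchanged
back_to_numpy = doubled.numpy()   # NumPy view of the CPU tensor `doubled`

print(tensor.dtype)
print(back_to_numpy)
```

Note that calling .numpy() works directly only for tensors that live on the CPU and do not require gradients.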

Now that we have a brief understanding of how to work with tensors in both these deep learning frameworks, let us understand the mechanism of their working with a hands-on approach.

Mechanism of their working (understand the respective graphs)

Most deep learning frameworks make use of computational graphs. These computational graphs define the order in which the computations must be performed so that we can acquire the best results accordingly. 

There are usually two interpreters that are utilized for the computation of deep learning problems, where each one of them serves a different purpose. One of the interpreters is used for the programming language (Python, in most cases), and the other interpreter manages the computational graphs as desired. 

Hence, most of the deep learning frameworks utilize a programming language like Python that sets up the computational graph, and an execution mechanism is also set up, which is quite different from the host language. 

This kind of strange setup is primarily motivated by efficiency and optimization. A computational graph can be optimized and run in parallel on the target GPU. Hence, the entire computation process is sped up and made more efficient through parallelism and dependency-driven scheduling.

In TensorFlow, we make use of static computational graphs. These work on the typical convention of “Define and Run.” In such builds, we create and connect all the variables at the beginning and initialize them in a static (unchanging) session.

However, some parameters must be fixed in advance in a static graph, which can be an inconvenience, especially for tasks that use RNN-type networks. Let us look at the implementation of a static TensorFlow graph.

import tensorflow as tf

a = tf.Variable(15)
b = tf.Variable(15)

prod = tf.multiply(a, b)
total = tf.add(a, b)  # named `total` to avoid shadowing Python's built-in sum()

result = prod / total
print(result)

Output:

tf.Tensor(7.5, shape=(), dtype=float64)


The computational graph | Source

In PyTorch, we utilize dynamic graphs, where the computational graph is built as the code executes. Each operation on a tensor adds to the graph at the moment it runs, and the graph is rebuilt on each iteration of training. Both dynamic graphs and static graphs have their specific use cases where one can be better than the other in a specific scenario.

Dynamic graphs are usually preferable over static graphs because we can modify the elements of interest without worrying about other factors, which allows for higher flexibility during training and model building. One of the only drawbacks of this method is that rebuilding the graph can sometimes take longer. The GIF representation shown below is one of the best examples of how a project in PyTorch works when you construct it.

The dynamic graph | Source
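To make the idea of a graph built at run time more concrete, here is a small sketch using PyTorch's autograd. The input value and the branch condition are arbitrary choices for illustration; the point is that the recorded graph depends on which Python branch actually executes.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# The branch taken is decided at run time, so the recorded graph
# can differ from one input to the next.
if x.item() > 0:
    y = x ** 2
else:
    y = -x

y.backward()      # walk the recorded graph backwards
print(y.grad_fn)  # the node that produced y (a power operation here)
print(x.grad)     # dy/dx = 2 * x = 6
```

Here x is positive, so the graph contains only the squaring operation, and the gradient is 2 * 3 = 6.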

Comparison of training loops

In TensorFlow, creating a custom training loop is slightly more involved and not very intuitive. We usually apply tf.function as a decorator to the training step, which compiles it into a static graph.

By default, TensorFlow 2 runs in eager execution, which is good for debugging but slower than running a compiled graph. Hence, we use tf.function so that the framework can apply global performance optimizations.

We then define the training step, where we usually make use of tf.GradientTape, which records operations for automatic differentiation. Inside the tape, we run the forward pass of the model and calculate the loss.

The gradients are then computed and applied by the optimizer, and finally, the training metrics are updated. The training step returns the loss value, and the other metrics can be read once the model has finished training.

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    train_acc_metric.update_state(y, logits)
    return loss_value

The PyTorch implementation of such training loops is quite simple and intuitive. We can create our variables, let the dynamic graph be built on the fly, and proceed to train our models. We move our data and targets to the selected device (CPU or GPU) and then compute the forward propagation, which is the feed-forward computation of the neural network.

Once the forward pass is complete, we can run backpropagation with the help of some of the pre-defined entities in PyTorch. We reset the stored gradients, backpropagate the loss to calculate new gradients, and perform a parameter update. The code for this computation is shown below.

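for epoch in range(epochs):
    for batch, (data, target) in enumerate(train_loader):
        # Move the batch to the selected device (CPU or GPU)
        data = data.to(device=device)
        target = target.to(device=device)
        # Forward propagation
        score = model(data)
        loss = criterion(score, target)
        # Backward propagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()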
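The loop above relies on objects (model, criterion, optimizer, train_loader) that are defined elsewhere. As a self-contained sketch of the same pattern, the example below fits a single linear layer to synthetic data; the data, the learning rate, and the number of steps are assumptions chosen only so that the snippet runs on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (an assumption for illustration): y = 2x + 1
inputs = torch.linspace(-1, 1, 64).unsqueeze(1)
targets = 2 * inputs + 1

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

initial_loss = criterion(model(inputs), targets).item()

for step in range(200):
    prediction = model(inputs)             # forward propagation
    loss = criterion(prediction, targets)
    optimizer.zero_grad()                  # clear gradients from the last step
    loss.backward()                        # backpropagation
    optimizer.step()                       # parameter update

final_loss = loss.item()
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

The three calls zero_grad(), backward(), and step() are the whole backward pass; everything else is ordinary Python.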

Now that we have completed understanding some of the basic requirements for switching from TensorFlow to PyTorch, let us proceed to understand the code walkthrough for a deep learning project with the help of the PyTorch deep learning framework.

Code walkthrough of MNIST in PyTorch

In this section of the article, we will understand some of the major differences between TensorFlow and PyTorch. One of the best ways to gain further understanding of their working procedures is to do a hands-on implementation on a project. We will work on the classification of the numbers in the MNIST dataset for this code walkthrough comparison, where we will train a model with the help of PyTorch to classify the digits from 0-9. 

The first step is to import all the essential libraries required for computing the MNIST project. Since we are transitioning from TensorFlow to PyTorch, we will import all the required PyTorch libraries for this project. Using this deep learning framework, we can construct all the required layers in the neural network architecture that we will build. The necessary imports for PyTorch are described as follows in the below code block. 

# Importing all the essential libraries

import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
import numpy as np
import matplotlib.pyplot as plt

Our next step is to set the device parameters accordingly. We have the option to choose whether we want to set the default device for training in PyTorch as CPU or GPU. It is always preferable to use a GPU if you have one at your disposal. However, for this project, even a CPU device will serve the purpose and the training should not take too long. In TensorFlow, the default device is usually set as the GPU version depending on your installation. 

# Set the device

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
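As a brief sketch of how the device object is then used, tensors can either be moved to the device with .to() or created on it directly. The tensor names and shapes below are arbitrary.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

weights = torch.ones(2, 3)                            # created on the CPU by default
weights_on_device = weights.to(device)                # moved (or left in place) on the chosen device
created_on_device = torch.zeros(2, 3, device=device)  # allocated directly on the device

# Both operands must live on the same device for this to work
result = weights_on_device + created_on_device
print(result.device)
```

The same .to(device) call works for models, which is why the data, the targets, and the model are all moved to the same device before training.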

Our next step is to define some of the hyperparameters that we will use for the construction and training of the model. We set the number of classes to 10 (the digits 0-9), the input size to 784 (28×28 is the image size of the MNIST data), the learning rate to 0.0001, and the batch size to 64, and we will train the constructed model for a total of 3 epochs.

# Initializing the required hyperparameters

num_classes = 10
input_size = 784
batch_size = 64
lr = 0.0001
epochs = 3

In the next step, we will load our data. The PyTorch framework, similar to the TensorFlow library, has access to some default datasets, and MNIST is one of them. We will load the images as separate training and testing datasets (60,000 training images and 10,000 test images). The DataLoader class offers fantastic utility in loading our data. The code snippet below shows how you can load your data in PyTorch.

# Collection of data

T = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])

X_train = torchvision.datasets.MNIST(root='/datasets', train=True, download=True, transform=T)
train_loader = DataLoader(dataset=X_train, batch_size=batch_size, shuffle=True)

X_test = torchvision.datasets.MNIST(root='/datasets', train=False, download=True, transform=T)
test_loader = DataLoader(dataset=X_test, batch_size=batch_size, shuffle=True)
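To see what a DataLoader yields without downloading anything, the sketch below wraps random stand-in tensors with MNIST-like shapes in a TensorDataset. The fake data and the names fake_images, fake_labels, and fake_loader are purely illustrative and are not part of the walkthrough.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data with MNIST-like shapes (not the real dataset)
fake_images = torch.rand(256, 1, 28, 28)
fake_labels = torch.randint(0, 10, (256,))

fake_loader = DataLoader(TensorDataset(fake_images, fake_labels),
                         batch_size=64, shuffle=True)

images, labels = next(iter(fake_loader))
print(images.shape)  # one batch of images: (64, 1, 28, 28)
print(labels.shape)  # the matching labels: (64,)

# Flatten each image to a 784-element vector, as the model expects
flattened = images.reshape(images.shape[0], -1)
print(flattened.shape)
```

Each iteration over the loader returns one batch of inputs together with its labels, which is exactly what the training loop unpacks as (data, target).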

Now that we have collected the required data, we can finally proceed to construct the neural network architecture with the PyTorch deep learning framework. We will use a fully connected layer type build to approach our problem. 

The procedure to construct deep learning models with PyTorch is extremely simple and follows a Pythonic approach. We will define a class for the neural network and declare the fully connected layers of our model, which are created with the nn.Linear class. Note that in the case of TensorFlow, we would make use of the Keras Dense layer for fully connected layers.

# Constructing the model

class neural_network(nn.Module):
    def __init__(self, input_size, num_classes):
        super(neural_network, self).__init__()
        self.fc1 = nn.Linear(in_features=input_size, out_features=50)
        self.fc2 = nn.Linear(in_features=50, out_features=num_classes)

    def forward(self, x):
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x
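For comparison, the same two-layer architecture can also be written with nn.Sequential, which may feel closer to the Keras Sequential style. This is an alternative sketch, not the form used in the rest of the walkthrough; the name sequential_model and the random batch are our own.

```python
import torch
import torch.nn as nn

# Same layer sizes as the class above: 784 -> 50 -> 10
sequential_model = nn.Sequential(
    nn.Linear(in_features=784, out_features=50),
    nn.ReLU(),
    nn.Linear(in_features=50, out_features=10),
)

# A random batch of 64 flattened "images" to check the output shape
dummy_batch = torch.rand(64, 784)
logits = sequential_model(dummy_batch)
print(logits.shape)

# Count the trainable parameters: (784*50 + 50) + (50*10 + 10)
num_params = sum(p.numel() for p in sequential_model.parameters())
print(num_params)
```

Running a dummy batch through a freshly built model is a cheap way to catch shape mismatches before training starts.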

Now that we have completed both the collection of data and the construction of our deep learning model with PyTorch, we can proceed to define the type of loss that we will utilize and the type of optimizer that is best suited for the task. The cross-entropy loss is a fantastic option for multi-class classification problems like the MNIST project. 

Adam is one of the best default optimizers and works well in many scenarios. We will train our model for the specified number of epochs. For the training process, we run the forward pass of the fully connected network and then apply backpropagation to learn the best weights.

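# Instantiate the model and move it to the chosen device

model = neural_network(input_size=input_size, num_classes=num_classes).to(device)

# Loss and optimizer

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

# Train the network

for epoch in range(epochs):
    for batch, (data, target) in enumerate(train_loader):
        # Move the batch to the selected device (CPU or GPU)
        data = data.to(device=device)
        target = target.to(device=device)

        # Reshaping to suit our model
        data = data.reshape(data.shape[0], -1)

        # Forward propagation
        score = model(data)
        loss = criterion(score, target)

        # Backward propagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()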
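The training loop passes the raw outputs of the model straight to nn.CrossEntropyLoss, without a softmax layer. The sketch below, using made-up logits and targets, shows why that works: the loss applies log-softmax internally and expects integer class indices as targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # raw scores, no softmax applied
targets = torch.tensor([0, 2])            # integer class indices

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Equivalent manual computation: mean negative log-softmax of the true class
log_probs = F.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(len(targets)), targets].mean()

print(loss.item(), manual_loss.item())
```

This is a common point of confusion when coming from Keras, where a softmax activation is often placed on the final Dense layer.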

Finally, now that training is complete, we can proceed to evaluate the constructed model and check the training and testing accuracy. The steps for this are also quite straightforward, as we can count the correctly classified images against the total number of images and compute the accuracy from that.

# Check the performance

def check_accuracy(loader, model):
    num_correct = 0
    num_samples = 0

    with torch.no_grad():
        for x, y in loader:
            # Move the batch to the selected device (CPU or GPU)
            x = x.to(device=device)
            y = y.to(device=device)
            x = x.reshape(x.shape[0], -1)

            scores = model(x)
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)

        # The MNIST training split has 60000 images; anything else is the test split
        if num_samples == 60000:
            print(f"Train accuracy = "
                  f"{float(num_correct) / float(num_samples) * 100:.2f}")
        else:
            print(f"Test accuracy = "
                  f"{float(num_correct) / float(num_samples) * 100:.2f}")

check_accuracy(train_loader, model)
check_accuracy(test_loader, model)
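The core of the accuracy check is the call to scores.max(1), which returns both the largest score in each row and its index. The following sketch uses a handful of made-up scores and labels to show the computation in isolation.

```python
import torch

# Hypothetical scores for 4 samples and 3 classes
scores = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.1, 0.9],
                       [0.3, 0.2, 0.1]])
labels = torch.tensor([1, 0, 2, 2])

# max(1) returns (values, indices); the indices are the predicted classes
_, predictions = scores.max(1)
num_correct = (predictions == labels).sum().item()
accuracy = num_correct / labels.size(0) * 100

print(predictions)        # tensor([1, 0, 2, 0])
print(f"{accuracy:.2f}")  # 75.00
```

Three of the four predictions match their labels, so the accuracy is 75%.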

With this simple procedure of constructing a model to solve a deep learning task, you can achieve an accuracy of about 91% on both the testing and training data. Even though we utilized only a simple fully connected neural network architecture, we are able to obtain decent results. More significantly, we understood that it is quite simple to construct almost any type of research project with the help of PyTorch.


Should you switch from TensorFlow to PyTorch?

In this section, we will weigh the pros and cons of PyTorch to decide whether it is worth switching from TensorFlow to PyTorch. Is it worth all the time that researchers are spending on making the transition for the research and development of deep neural networks? Let us start analyzing the pros and cons of PyTorch.

Pros of PyTorch

1. Pythonic in nature

PyTorch is built in a way that is intuitive to understand and makes machine learning projects easy to develop. Most of the code written with PyTorch is Pythonic, which means the coding style is similar to that of regular Python. When working with TensorFlow, the code is a bit more low-level and difficult to understand, even if you have a decent understanding of the framework.

Hence, there is an additional high-level Keras API that is now integrated into TensorFlow 2.0, where you can develop models more easily. PyTorch also works well alongside other libraries such as NumPy, SciPy, and Cython. It is also much easier to learn because most of the syntax and usage of PyTorch are very similar to conventional Python programming.

2. Good documentation and community support

PyTorch has some of the best documentation available for getting to grips with a majority of the essential concepts. It has detailed descriptions where you can understand most of the core topics: torch.Tensor, Tensor Attributes, Tensor Views, torch.autograd, and so much more. You also have blogs and tutorial support for some deep learning projects. Apart from the official documentation, the community provides strong support for PyTorch and projects related to it.

3. Dynamic graphs

As discussed in detail in one of the previous sections of the article, PyTorch supports Dynamic Graphs as opposed to TensorFlow’s static graphs. This feature is especially useful for creating graphs on the fly. Dynamically created graphs are most useful when you cannot pre-determine the allocation of memory or other details for the particular computation. They offer higher flexibility for the users to develop their projects.

4. Lots of developers choosing PyTorch for projects

In recent times, developers and researchers are tending to shift more towards PyTorch for the construction of deep learning projects. Most of the researchers prefer sharing their codes on websites like GitHub with their PyTorch implementations of projects. 

The community is filled with wonderful resources and people who are willing to lend a helping hand when there is any confusion about a particular topic. It is easier to work, share, and develop PyTorch projects while working on a research project.

Cons of PyTorch

1. Lack of visualization techniques

TensorFlow has one of the best options for visualizing how its models work, namely TensorBoard. TensorBoard is a data visualization toolkit through which you can monitor several features like the training and validation accuracy and loss, the model graph, histograms, images, and so much more. PyTorch does not have as strong a built-in option for visualization, and it is usually preferable to utilize TensorBoard with PyTorch as well.

2. API server needed for production

Another advantage that TensorFlow enjoys over its PyTorch counterpart is that it has a lot of production tools that make it ready for the deployment of developed models. The scalability offered by TensorFlow is high because it was built to be production-ready. 

TensorFlow Serving offers a flexible, high-performance serving system for machine learning models, designed for production environments. It handles most aspects of inference and manages the lifetimes of trained models.

On the other hand, we have TorchServe, which is flexible and easy to use, but it does not have the same compactness as its TensorFlow counterpart and has a long way to go before it can compete with the superior deployment tool.


In this article, we have covered most of the essentials required for making a successful transition from the TensorFlow deep learning framework to PyTorch. PyTorch is fantastic for the development of rapid prototypes. Developers in modern times utilize PyTorch for most research projects to produce fast and effective results. 



Aspect | TensorFlow | PyTorch
Installation | Complex GPU installation | Simple GPU installation
Training loop | Uses Gradient Tape | Straightforward Pythonic approach
Visualization | TensorBoard for high-quality visualization | Slightly lacks in this aspect
Ease of coding | Complex low-level TensorFlow code | Relatively much simpler
APIs for deployment | TensorFlow Serving | TorchServe
Projects to utilize them | Large-scale deployment | Research-oriented and rapid prototype development

The versatility, Pythonic nature, great flexibility, and speed with which deep neural networks can be built make this framework one of the best options for research and development.
