Neptune Blog

Image Processing in Python: Algorithms, Tools, and Methods You Should Know

Neetika Khandelwal

9 min

21st January, 2025

Computer Vision ML Tools

Images define the world. Each image has its own story and contains a lot of crucial information that can be useful in many ways. This information can be obtained with the help of a technique known as image processing.

It is the core part of computer vision which plays a crucial role in many real-world examples like robotics, self-driving cars, and object detection. Image processing allows us to transform and manipulate thousands of images at a time and extract useful insights from them. It has a wide range of applications in almost every field.

Python is one of the most widely used programming languages for this purpose. Its libraries and tools make image processing tasks efficient to implement.

This article will teach you about classical algorithms, techniques, and tools to process the image and get the desired output.

Let’s get into it!

What is image processing?

As the name says, image processing means processing the image and this may include many different techniques until we reach our goal.

The final output can be either in the form of an image or a corresponding feature of that image. This can be used for further analysis and decision making.

But what is an image?

An image can be represented as a 2D function F(x,y), where x and y are spatial coordinates. The amplitude of F at a particular pair (x,y) is known as the intensity of the image at that point. If x, y, and the amplitude values are all finite, discrete quantities, we call it a digital image. A digital image is an array of pixels arranged in rows and columns. Pixels are the elements of an image that contain information about intensity and color. An image can also be represented as a 3D array, where the third dimension indexes the color channels and each channel is a 2D matrix of pixels. An image with red, green, and blue channels is known as an RGB image.

There are various types of images:

RGB image: It contains three 2D layers, which are the Red, Green, and Blue channels.
Grayscale image: It contains only a single channel, whose values are shades between black and white.
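To make the two image types concrete, here is a minimal sketch of how they look as NumPy arrays. The tiny synthetic image and the luma weights used for the grayscale conversion (the common ITU-R BT.601 values) are assumptions chosen for illustration, not something prescribed by this article.

```python
import numpy as np

# A tiny synthetic RGB image: 4 rows x 6 columns x 3 channels, 8-bit values.
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[..., 0] = 200  # red channel
rgb[..., 1] = 100  # green channel
rgb[..., 2] = 50   # blue channel

# Grayscale as a weighted sum of the three channels.
# The weights are the BT.601 luma coefficients (an assumption for this sketch).
weights = np.array([0.299, 0.587, 0.114])
gray = (rgb @ weights).astype(np.uint8)

print(rgb.shape)   # three dimensions: rows, columns, channels
print(gray.shape)  # two dimensions: rows, columns
```

The RGB image carries one 2D matrix per channel, while the grayscale image collapses them into a single 2D matrix.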

Classic image processing algorithms

1. Morphological Image Processing

Morphological image processing tries to remove the imperfections from the binary images because binary regions produced by simple thresholding can be distorted by noise. It also helps in smoothing the image using opening and closing operations.

Morphological operations can be extended to grayscale images. They are non-linear operations related to the shape, or structure, of the features in an image. They depend only on the relative ordering of pixel values, not on their numerical values. This technique analyzes an image using a small template known as a structuring element, which is placed at different possible locations in the image and compared with the corresponding neighbourhood of pixels. A structuring element is a small matrix with 0 and 1 values.

Let’s see the two fundamental operations of morphological image processing, Dilation and Erosion:

Dilation adds pixels to the boundaries of the objects in an image.
Erosion removes pixels from the object boundaries.

The number of pixels removed or added to the original image depends on the size of the structuring element.

At this point you may be thinking “what is a structuring element?” Let me explain:

Structuring element is a matrix consisting of only 0’s and 1’s that can have any arbitrary shape and size. It is positioned at all possible locations in the image and it is compared with the corresponding neighbourhood of pixels.

The square structuring element ‘A’ fits in the object we want to select, the ‘B’ intersects the object and ‘C’ is out of the object.

The zero-one pattern defines the configuration of the structuring element. It’s according to the shape of the object we want to select. The center of the structuring element identifies the pixel being processed.
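The two fundamental operations can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a library implementation: the helper names `dilate` and `erode` are hypothetical, the structuring element is assumed to be symmetric with odd side lengths, and pixels outside the image are treated as background.

```python
import numpy as np

def dilate(image, selem):
    """Binary dilation: a pixel is set if the element touches any object pixel."""
    ph, pw = selem.shape[0] // 2, selem.shape[1] // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), constant_values=False)
    out = np.zeros_like(image)
    for i in range(selem.shape[0]):
        for j in range(selem.shape[1]):
            if selem[i, j]:
                out |= padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def erode(image, selem):
    """Binary erosion: a pixel survives only if the element fits inside the object."""
    ph, pw = selem.shape[0] // 2, selem.shape[1] // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), constant_values=False)
    out = np.ones_like(image)
    for i in range(selem.shape[0]):
        for j in range(selem.shape[1]):
            if selem[i, j]:
                out &= padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

# A 3x3 square object in a 7x7 binary image, and a 3x3 structuring element.
image = np.zeros((7, 7), dtype=bool)
image[2:5, 2:5] = True
selem = np.ones((3, 3), dtype=bool)

dilated = dilate(image, selem)        # the object grows by one pixel on each side
eroded = erode(image, selem)          # the object shrinks by one pixel on each side
opened = dilate(eroded, selem)        # opening = erosion followed by dilation
```

With a larger structuring element, more pixels are added or removed, which matches the dependence on element size described above.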

2. Gaussian Image Processing

Gaussian blur, also known as Gaussian smoothing, is the result of blurring an image with a Gaussian function.

It is used to reduce image noise and reduce detail. The visual effect of this blurring technique is similar to looking at an image through a translucent screen. It is sometimes used in computer vision for image enhancement at different scales or as a data augmentation technique in deep learning.

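The basic Gaussian function in two dimensions looks like:

$G(x, y) = \frac{1}{2\pi\sigma^{2}} e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}$

where x and y are the distances from the center of the kernel and σ is the standard deviation of the distribution.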

In practice, it is best to take advantage of the Gaussian blur’s separable property by dividing the process into two passes. In the first pass, a one-dimensional kernel is used to blur the image in only the horizontal or vertical direction. In the second pass, the same one-dimensional kernel is used to blur in the remaining direction. The resulting effect is the same as convolving with a two-dimensional kernel in a single pass. Let’s see an example to understand what gaussian filters do to an image.

If we have a filter whose values are normally distributed and apply it to an image, the results look like this:

*Original image, Gaussian filter, and filtered result | Source*

You can see that some of the edges have a little less detail. The filter gives more weight to the pixels at the center than to the pixels away from the center. Gaussian filters are low-pass filters, i.e. they weaken the high frequencies. They are commonly used as a preprocessing step in edge detection.
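The separable property can be checked numerically. The sketch below blurs a random image once with two 1D passes and once with the equivalent 2D kernel, and compares the results. The helper names, the kernel radius, and the zero padding at the borders are assumptions made for this illustration.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius):
    """Sampled 1D Gaussian, normalized so that the weights sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur_separable(image, kernel):
    """Two passes: blur along the rows first, then along the columns."""
    rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")

def blur_2d(image, kernel):
    """Single pass with the 2D kernel built as the outer product of the 1D kernel."""
    k2d = np.outer(kernel, kernel)
    r = len(kernel) // 2
    padded = np.pad(image, r)  # zero padding, matching np.convolve(mode="same")
    out = np.zeros(image.shape, dtype=float)
    for i in range(k2d.shape[0]):
        for j in range(k2d.shape[1]):
            out += k2d[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32))
kernel = gaussian_kernel_1d(sigma=1.5, radius=3)

two_pass = blur_separable(image, kernel)
one_pass = blur_2d(image, kernel)
print(np.allclose(two_pass, one_pass))
```

For a kernel of width k, the two-pass version needs about 2k multiplications per pixel instead of k², which is why the separable form is preferred in practice.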

3. Fourier Transform in image processing

Fourier transform breaks down an image into sine and cosine components.

It has multiple applications like image reconstruction, image compression, or image filtering.

Since we are talking about images, we will take discrete fourier transform into consideration.

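Let’s consider a sinusoid. It is described by three things:

Magnitude – related to contrast, i.e. the difference between the brightest and darkest values
Spatial frequency – related to how quickly the brightness varies across space
Phase – related to how the sinusoid is shifted relative to the origin, i.e. where the pattern is located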

The image in the frequency domain looks like this:

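The formula for the 2D discrete Fourier transform of an image of size M×N is:

$F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) \, e^{-j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}$

In the above formula, f(x,y) denotes the image and F(u,v) its representation in the frequency domain.

The inverse Fourier transform converts the transform back to the image. The formula for the 2D inverse discrete Fourier transform is:

$f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v) \, e^{j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}$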
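In NumPy, the forward and inverse 2D discrete Fourier transforms are available as `np.fft.fft2` and `np.fft.ifft2`. The following sketch transforms a random image, recovers it with the inverse transform, and applies a simple low-pass filter in the frequency domain. The random test image and the cutoff radius of 8 are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((32, 32))

# Forward transform and inverse transform (round trip).
F = np.fft.fft2(image)
restored = np.fft.ifft2(F).real

# Image filtering: keep only the low frequencies inside a circle.
# fftshift moves the zero-frequency term to the center of the spectrum.
F_shifted = np.fft.fftshift(F)
rows, cols = image.shape
yy, xx = np.ogrid[:rows, :cols]
mask = (yy - rows // 2) ** 2 + (xx - cols // 2) ** 2 <= 8 ** 2
smoothed = np.fft.ifft2(np.fft.ifftshift(F_shifted * mask)).real

print(np.allclose(restored, image))
```

Because the mask keeps the zero-frequency term, the filtered image has the same mean brightness as the original while the fine detail is removed.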

4. Edge Detection in image processing

Edge detection is an image processing technique for finding the boundaries of objects within images. It works by detecting discontinuities in brightness.

This could be very beneficial in extracting useful information from the image because most of the shape information is enclosed in the edges.

Because these methods respond to any variation in grey levels, they also react strongly to noise, so images are usually smoothed before edge detection. Edges are defined as the local maxima of the gradient magnitude.

The most common edge detection algorithm is the Sobel edge detection algorithm. The Sobel operator is made up of a pair of 3×3 convolution kernels: a kernel Gx and the same kernel rotated by 90 degrees, Gy. The two kernels are applied separately to the image to produce separate measurements of the gradient in each direction.

$G_x = \begin{bmatrix} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \end{bmatrix} * A$

And,

$G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A$

where A is the source image and * denotes the 2D signal processing convolution operation.

Resulting gradient can be calculated as:

$G = \sqrt{G_x^{2} + G_y^{2}}$
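As a rough sketch of the computation, the code below applies the two Sobel kernels to a small synthetic image and combines the results into a gradient magnitude. The kernel values follow the commonly used Sobel convention, which is an assumption here, and `convolve2d` is a hypothetical helper written for this example that returns only the region where the kernel fits entirely inside the image.

```python
import numpy as np

# Sobel kernels (common convention, assumed here); KY is KX rotated by 90 degrees.
KX = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
KY = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)

def convolve2d(image, kernel):
    """True 2D convolution (the kernel is flipped), 'valid' region only."""
    k = kernel[::-1, ::-1]
    kh, kw = k.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * image[i:i + h, j:j + w]
    return out

def sobel_magnitude(image):
    gx = convolve2d(image, KX)  # responds to horizontal changes in brightness
    gy = convolve2d(image, KY)  # responds to vertical changes in brightness
    return np.sqrt(gx ** 2 + gy ** 2)

# Synthetic image: dark left half, bright right half (one vertical edge).
image = np.zeros((8, 8))
image[:, 4:] = 1.0

magnitude = sobel_magnitude(image)
print(magnitude)
```

The magnitude is non-zero only in the columns next to the brightness jump, which is where the edge lies.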

5. Wavelet Image Processing

We saw the Fourier transform, but it is limited to frequency information: it tells us which frequencies are present, not where they occur. Wavelets take both location (time, for 1D signals) and frequency into consideration. This makes the wavelet transform apt for non-stationary signals.

We know that edges are one of the important parts of an image. When traditional filters are applied, noise gets removed but the image gets blurry. The wavelet transform is designed so that we get good frequency resolution for low-frequency components and good spatial resolution for high-frequency components, which helps preserve edges. Below is the 2D wavelet transform example:
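A single level of a 2D wavelet transform can be sketched with NumPy alone. The example uses the Haar wavelet, which is the simplest choice and an assumption made here (the article does not name a specific wavelet). The function and band names are hypothetical, and the image sides are assumed to be even.

```python
import numpy as np

def haar_dwt2(image):
    """One level of an orthonormal 2D Haar transform on 2x2 pixel blocks."""
    a = image[0::2, 0::2]  # top-left pixel of each block
    b = image[0::2, 1::2]  # top-right
    c = image[1::2, 0::2]  # bottom-left
    d = image[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0       # low-frequency approximation
    detail_rows = (a + b - c - d) / 2.0  # change between the two rows of a block
    detail_cols = (a - b + c - d) / 2.0  # change between the two columns of a block
    detail_diag = (a - b - c + d) / 2.0  # diagonal change
    return approx, detail_rows, detail_cols, detail_diag

def haar_idwt2(approx, detail_rows, detail_cols, detail_diag):
    """Inverse of haar_dwt2: rebuild the image from the four sub-bands."""
    h, w = approx.shape
    out = np.empty((2 * h, 2 * w))
    out[0::2, 0::2] = (approx + detail_rows + detail_cols + detail_diag) / 2.0
    out[0::2, 1::2] = (approx + detail_rows - detail_cols - detail_diag) / 2.0
    out[1::2, 0::2] = (approx - detail_rows + detail_cols - detail_diag) / 2.0
    out[1::2, 1::2] = (approx - detail_rows - detail_cols + detail_diag) / 2.0
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))

bands = haar_dwt2(image)
reconstructed = haar_idwt2(*bands)
print(np.allclose(reconstructed, image))
```

Each sub-band is half the size of the image in both directions. The approximation band is a coarse version of the image, while the three detail bands record where, and in which direction, the brightness changes.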

Image processing using Neural Networks

Neural Networks are multi-layered networks consisting of neurons or nodes. These neurons are the core processing units of the neural network. They are designed to act like human brains. They take in data, train themselves to recognize the patterns in the data and then predict the output.

A basic neural network has three layers:

Input layer
Hidden layer
Output layer

The input layer receives the input, the output layer predicts the output, and the hidden layers do most of the calculations. The number of hidden layers can be modified according to the requirements. There should be at least one hidden layer in a neural network.

The basic working of the neural network is as follows:

Let’s consider an image. Each pixel is fed as input to the neurons of the first layer, and the neurons of one layer are connected to the neurons of the next layer through channels.
Each of these channels is assigned a numerical value known as weight.
The inputs are multiplied by the corresponding weights and this weighted sum is then fed as input to the hidden layers.
The output from the hidden layers is passed through an activation function which will determine whether the particular neuron will be activated or not.
The activated neurons transmit data to the next hidden layer. In this manner, data is propagated through the network; this is known as forward propagation.
In the output layer, the neuron with the highest value predicts the output. These outputs are the probability values.
The predicted output is compared with the actual output to obtain the error. This information is then transferred back through the network, the process is known as Backpropagation.
Based on this information, the weights are adjusted. This cycle of forward and backward propagation is done several times on multiple inputs until the network predicts the output correctly in most of the cases.
This ends the training process of the neural network. The time taken to train the neural network may get high in some cases.

In the image below, the ai’s are the inputs, the wi’s are the weights, z is the output, and g is any activation function.

*Operations in a single neuron | Source*
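The operations in a single neuron, and a forward pass through a small network, can be sketched as follows. This is only an illustration under assumptions: the sigmoid and softmax activations, the bias term `b`, the layer sizes, and the random weights are all chosen for the example and are not taken from the article. Training with backpropagation is not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def neuron(a, w, b=0.0, g=sigmoid):
    """z = g(sum_i w_i * a_i + b): weighted sum of the inputs, then activation."""
    return g(np.dot(w, a) + b)

def forward(pixels, hidden, output):
    """Forward propagation through one hidden layer and one output layer."""
    W1, b1 = hidden
    W2, b2 = output
    h = sigmoid(W1 @ pixels + b1)  # activations of the hidden layer
    return softmax(W2 @ h + b2)    # output values that sum to 1

# A single neuron with two inputs.
z = neuron(a=np.array([1.0, 2.0]), w=np.array([0.5, -0.25]))

# A tiny network: 16 input pixels, 8 hidden neurons, 3 output classes.
rng = np.random.default_rng(0)
pixels = rng.random(16)
hidden = (rng.normal(size=(8, 16)), np.zeros(8))
output = (rng.normal(size=(3, 8)), np.zeros(3))

probs = forward(pixels, hidden, output)
predicted_class = int(np.argmax(probs))  # the neuron with the highest value
```

With untrained random weights the prediction is meaningless; training adjusts the weights so that the highest output matches the correct class in most cases.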

Here are some guidelines to prepare data for image processing.

The more data is fed to the model, the better the results tend to be.
The image dataset should be of high quality to provide clearer information, but processing such images may require deeper neural networks.
In many cases RGB images are converted to grayscale before feeding them into a neural network.

Types of Neural Network

Convolutional Neural Network

A convolutional neural network (ConvNet or CNN for short) has three main types of layers:

Convolutional Layer (CONV): This is the core building block of a CNN and is responsible for performing the convolution operation. The element involved in carrying out the convolution operation in this layer is called the kernel or filter (a matrix). The kernel makes horizontal and vertical shifts based on the stride until the full image is traversed.

*Movement of the kernel | Source*

Pooling Layer (POOL): This layer is responsible for dimensionality reduction. It helps to decrease the computational power required to process the data. There are two types of Pooling: Max Pooling and Average Pooling. Max pooling returns the maximum value from the area covered by the kernel on the image. Average pooling returns the average of all the values in the part of the image covered by the kernel.

Fully Connected Layer (FC): The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures.

CNN is mainly used in extracting features from the image with help of its layers. CNNs are widely used in image classification where each input image is passed through the series of layers to get a probabilistic value between 0 and 1.
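The convolution and pooling steps can be sketched in plain NumPy. The helper names `conv2d` and `pool2d` are hypothetical and written for this example. As in most deep learning frameworks, the "convolution" here slides the kernel without flipping it, and no padding is applied.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image, moving `stride` pixels per step."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def pool2d(feature_map, size=2, mode="max"):
    """Max or average pooling over non-overlapping size x size blocks."""
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    blocks = feature_map[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))

features_stride1 = conv2d(image, kernel, stride=1)  # 3x3 output
features_stride2 = conv2d(image, kernel, stride=2)  # 2x2 output
max_pooled = pool2d(image, size=2, mode="max")
avg_pooled = pool2d(image, size=2, mode="average")
```

A larger stride or a pooling layer shrinks the output, which is how these layers reduce the amount of computation in later layers.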

Generative Adversarial Networks

Generative models use an unsupervised learning approach (there are images but there are no labels provided).

GANs are composed of two models: a Generator and a Discriminator. The Generator learns to make fake images that look realistic so as to fool the Discriminator, and the Discriminator learns to distinguish fake images from real ones (it tries not to get fooled).

The Generator is not allowed to see the real images, so it may produce poor results in the starting phase. The Discriminator is allowed to look at real images, but they are mixed with the fake ones produced by the Generator, and it has to classify each image as real or fake.

Some noise is fed as input to the Generator so that it is able to produce a different example every time rather than the same type of image. Based on the scores predicted by the Discriminator, the Generator tries to improve its results. After a certain point, the Generator is able to produce images that are hard to distinguish from real ones, and the user can be satisfied with its results. The Discriminator also improves, as it gets more and more realistic images from the Generator at each round.

Popular types of GANs are Deep Convolutional GANs (DCGANs), Conditional GANs (cGANs), StyleGANs, CycleGAN, DiscoGAN, GauGAN, and so on.

GANs are great for image generation and manipulation. Some applications of GANs include face aging, photo blending, super resolution, photo inpainting, and clothing translation.

Image processing tools

1. OpenCV

OpenCV stands for Open Source Computer Vision Library. The library contains more than 2,500 optimized algorithms that are useful for computer vision and machine learning. There are several ways you can use OpenCV in image processing. A few are listed below:

Converting images from one color space to another, for example between BGR and HSV or between BGR and gray.
Performing thresholding on images, such as simple thresholding and adaptive thresholding.
Smoothing of images, such as applying custom filters to images and blurring images.
Performing morphological operations on images.
Building image pyramids.
Extracting foreground from images using GrabCut algorithm.
Image segmentation using watershed algorithm.

Refer to the official OpenCV documentation for more details.

2. Scikit-image

It is an open-source library for image processing. It exposes its algorithms as built-in functions, so complex operations on images can be performed with just a few function calls.

It works with NumPy arrays and is a fairly simple library even for those who are new to Python. Some operations that can be done using scikit-image are:

To implement thresholding operations use try_all_threshold() method on the image. It will use seven global thresholding algorithms. This is in the filters module.
To implement edge detection use sobel() method in the filters module. This method requires a 2D grayscale image as an input, so we need to convert the image to grayscale.
To implement gaussian smoothing use gaussian() method in the filters module.
To apply histogram equalization, use the exposure module: equalize_hist() applies normal histogram equalization to the original image, and equalize_adapthist() applies adaptive equalization.
To rotate the image use rotate() function under the transform module.
To rescale the image use rescale() function from the transform module.
To apply morphological operations use binary_erosion() and binary_dilation() function under the morphology module.
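The operations listed above can be combined as in the following sketch, which assumes scikit-image is installed. The random test image and the parameter values (sigma, angle, scale) are arbitrary, and Otsu's threshold is used here only to obtain a binary image for the morphological functions.

```python
import numpy as np
from skimage import color, exposure, filters, morphology, transform

# Synthetic RGB image with float values in [0, 1].
rng = np.random.default_rng(0)
rgb = rng.random((64, 64, 3))

# Convert to a 2D grayscale image before edge detection.
gray = color.rgb2gray(rgb)

edges = filters.sobel(gray)                # edge detection
smooth = filters.gaussian(gray, sigma=2)   # Gaussian smoothing
equalized = exposure.equalize_hist(gray)   # histogram equalization
rotated = transform.rotate(gray, 45)       # rotate by 45 degrees
half = transform.rescale(gray, 0.5)        # scale both sides by 0.5

# Morphological operations work on a binary image.
binary = gray > filters.threshold_otsu(gray)
eroded = morphology.binary_erosion(binary)
dilated = morphology.binary_dilation(binary)
```

Every result is a NumPy array, so it can be passed on to other libraries such as NumPy or OpenCV without conversion.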

3. PIL/pillow

PIL stands for Python Imaging Library, and Pillow is the friendly PIL fork by Alex Clark and Contributors. It is one of the most powerful image libraries. It supports a wide range of image formats like PPM, JPEG, TIFF, GIF, PNG, and BMP.

It can help you perform several operations on images like rotating, resizing, cropping, grayscaling etc. Let’s go through some of those operations

To carry out manipulation operations there is a module in this library called Image.

To load an image use the open() method.
To display an image use show() method.
To know the file format use format attribute
To know the size of the image use size attribute
To know about the pixel format use mode attribute.
To save the image file after desired processing, use save() method. Pillow infers the file format from the filename extension unless you pass the format argument explicitly.
To resize the image use resize() method, which takes the new size as a (width, height) tuple.
To crop the image, use crop() method that takes one argument, a box tuple (left, upper, right, lower) that defines the cropped region.
To rotate the image use rotate() method that takes one argument as an integer or float number representing the degree of rotation (counter-clockwise).
To flip the image use transpose() method that takes one argument among the following: Image.FLIP_LEFT_RIGHT, Image.FLIP_TOP_BOTTOM, Image.ROTATE_90, Image.ROTATE_180, Image.ROTATE_270.
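Here is a short sketch of these operations, assuming Pillow is installed. It builds an image in memory with `Image.new` instead of opening a file, and saves to an in-memory buffer, so the sizes and colors are made up for the example. Note that flipping is done with `transpose()`, the Pillow method that accepts constants such as `Image.FLIP_LEFT_RIGHT`.

```python
import io
from PIL import Image

# Create a 64x48 RGB image in memory (stand-in for Image.open("some_file")).
img = Image.new("RGB", (64, 48), color=(200, 100, 50))
print(img.size, img.mode)

resized = img.resize((32, 24))                  # new size as (width, height)
cropped = img.crop((10, 10, 40, 30))            # box: (left, upper, right, lower)
rotated = img.rotate(90, expand=True)           # expand so nothing is cut off
flipped = img.transpose(Image.FLIP_LEFT_RIGHT)  # mirror horizontally
gray = img.convert("L")                         # grayscale

# Save as PNG into a buffer; with a filename, the extension selects the format.
buffer = io.BytesIO()
gray.save(buffer, format="PNG")
buffer.seek(0)
reloaded = Image.open(buffer)
```

For a file on disk, `img.save("output.jpg")` would write a JPEG because of the extension, with no extra argument needed.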

4. NumPy

With this library you can also perform simple image techniques, such as flipping images, extracting features, and analyzing them.

Images can be represented as NumPy multi-dimensional arrays, so their type is ndarray. A color image is a NumPy array with 3 dimensions. By slicing the multi-dimensional array, the RGB channels can be separated.

Below are some of the operations that can be performed using NumPy on the image (image is loaded in a variable named test_img using imread).

To flip the image in a vertical direction, use np.flipud(test_img).
To flip the image in a horizontal direction, use np.fliplr(test_img).
To reverse the order of the rows, use test_img[::-1]; this gives the same result as np.flipud(test_img).
To add filter to the image you can do this:

Example: np.where(test_img > 150, 255, 0). This says that wherever a pixel value in the picture is greater than 150, replace it with 255, else with 0.

You can also display the RGB channels separately. It can be done using this code snippet:

To obtain a red channel, do test_img[:,:,0], to obtain a green channel, do test_img[:,:,1] and to obtain a blue channel, do test_img[:,:,2].
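The NumPy operations above can be tried on a small synthetic array. The array below stands in for `test_img` and is an assumption for illustration; in practice it would come from an image file loaded with imread.

```python
import numpy as np

# Stand-in for an image loaded with imread: 2 rows x 3 columns x 3 channels.
test_img = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3) * 10

flipped_vertical = np.flipud(test_img)    # top and bottom rows swapped
flipped_horizontal = np.fliplr(test_img)  # left and right columns swapped
reversed_rows = test_img[::-1]            # same result as np.flipud

# Threshold filter: values above 150 become 255, everything else 0.
binary = np.where(test_img > 150, 255, 0)

# Separate the channels by slicing the last axis.
red = test_img[:, :, 0]
green = test_img[:, :, 1]
blue = test_img[:, :, 2]

print(red.shape, green.shape, blue.shape)
```

Each channel is a 2D array with the same height and width as the original image.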

5. Mahotas

It is a computer vision and image processing library and has more than 100 functions. Many of its algorithms are implemented in C++. Mahotas is an independent module in itself i.e. it has minimal dependencies.

Currently, it depends only on a C++ compiler, which builds its numerical routines, and on NumPy, whose arrays it operates on.

Here are names of some of the remarkable algorithms available in Mahotas:

Watershed (https://mahotas.readthedocs.io/en/latest/distance.html)
Morphological Operations (https://mahotas.readthedocs.io/en/latest/morphology.html)
Hit & miss, thinning. (https://mahotas.readthedocs.io/en/latest/api.html#mahotas.hitmiss)
Colorspace Conversions (https://mahotas.readthedocs.io/en/latest/color.html)
Speeded-Up Robust Features (SURF), a form of local features. (https://mahotas.readthedocs.io/en/latest/surf.html)
Thresholding. (https://mahotas.readthedocs.io/en/latest/thresholding.html)
Convolution. (https://mahotas.readthedocs.io/en/latest/api.html)
Spline interpolation (https://mahotas.readthedocs.io/en/latest/api.html)
SLIC superpixels. (https://www.pyimagesearch.com/2014/07/28/a-slic-superpixel-tutorial-using-python/)

Let’s look at some of the operations that could be done using Mahotas:

To read an image use imread() method.
To calculate the mean of the image use the mean() method.
Eccentricity measures how elongated the shape in a binary image is: values close to 0 indicate a nearly circular object, and values close to 1 indicate a strongly elongated one. To find the eccentricity of an image, use the eccentricity() method under the features module.
For dilation and erosion on the image, use the dilate() and erode() methods under the morph module.
To find the local maxima of the image use locmax() method.

Summary

In this article, I briefly explained classical image processing techniques: morphological filtering, Gaussian filtering, the Fourier transform, edge detection, and the wavelet transform.

All these can be performed using various image processing libraries like OpenCV, Mahotas, PIL, and scikit-image.

I also discussed popular neural networks like CNN and GANs that are used for computer vision.

Deep learning is changing the field of image processing with its wide range of techniques and rapid advances. Researchers are coming up with better techniques to fine-tune the whole image processing field, so the learning does not stop here. Keep advancing.
