According to IDC, the volume of digital data will grow to 175 zettabytes by 2025, and a large share of that data is images. Data scientists need to preprocess these images before feeding them into machine learning models. They have to do the important (and sometimes dirty) work before the fun part begins.
To process large amounts of image data quickly and efficiently without compromising the results, data scientists rely on image processing tools for machine learning and deep learning tasks.
In this article, I list the most useful image processing libraries in Python that are heavily used in machine learning tasks.
1. OpenCV
OpenCV is an open-source library that was first released by Intel in 2000. It is mostly used for computer vision tasks such as object detection, face detection, face recognition, and image segmentation, but it also contains many functions that are useful in ML more generally.
```python
import cv2 as cv
import matplotlib.pyplot as plt

# OpenCV loads images in BGR channel order
img = cv.imread('goku.jpeg')
gray_image = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

fig, ax = plt.subplots(1, 2, figsize=(16, 8))
fig.tight_layout()

# Convert BGR to RGB so matplotlib shows the right colors
ax[0].imshow(cv.cvtColor(img, cv.COLOR_BGR2RGB))
ax[0].set_title("Original")

# A single-channel image needs a gray colormap, not a color conversion
ax[1].imshow(gray_image, cmap="gray")
ax[1].set_title("Grayscale")
plt.show()
```
A color image consists of 3 color channels, whereas a grayscale image has only 1 channel, which carries the intensity of each pixel and shows the image in shades of gray.
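To make the channel reduction concrete, here is a minimal NumPy sketch of how three channels can be collapsed into one intensity value per pixel. The weights are the standard luma coefficients that OpenCV documents for its RGB-to-gray conversion; the helper name `to_grayscale` and the tiny test image are made up for illustration.

```python
import numpy as np

def to_grayscale(bgr):
    """Collapse a BGR image of shape (H, W, 3) into one channel of shape (H, W)."""
    # Weights for blue, green, red (OpenCV stores channels in BGR order)
    weights = np.array([0.114, 0.587, 0.299])
    gray = bgr.astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# A 1x3 image: one pure blue, one pure green, and one pure red pixel
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
gray = to_grayscale(pixels)
print(gray.shape)  # (1, 3)
print(gray)        # [[ 29 150  76]]
```

Green contributes most to the perceived brightness, which is why the pure green pixel ends up with the highest gray value.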
The following code separates each color channel:
```python
import cv2 as cv
import matplotlib.pyplot as plt

img = cv.imread('goku.jpeg')
# cv.split returns the channels in BGR order
b, g, r = cv.split(img)

fig, ax = plt.subplots(1, 3, figsize=(16, 8))
fig.tight_layout()

# Each channel is a single-channel intensity map, shown here in gray
ax[0].imshow(r, cmap="gray")
ax[0].set_title("Red")
ax[1].imshow(g, cmap="gray")
ax[1].set_title("Green")
ax[2].imshow(b, cmap="gray")
ax[2].set_title("Blue")
plt.show()
```
```python
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt

image = cv.imread("pics/goku.jpeg")
h, w = image.shape[:2]

# Shift by a quarter of the height and an eighth of the width
shift_y, shift_x = h // 4, w // 8
translation_matrix = np.float32([[1, 0, shift_x], [0, 1, shift_y]])
img_translated = cv.warpAffine(image, translation_matrix, (w, h))

plt.imshow(cv.cvtColor(img_translated, cv.COLOR_BGR2RGB))
plt.title("Translation")
plt.show()
```
The code above translates the image: every pixel is shifted right by an eighth of the image width and down by a quarter of its height.
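To see what the 2x3 matrix passed to cv.warpAffine does to a single pixel coordinate, the sketch below applies such a matrix by hand with NumPy. The helper `apply_affine` and the shift values are hypothetical and chosen only for illustration.

```python
import numpy as np

def apply_affine(matrix, x, y):
    """Map the point (x, y) through a 2x3 affine matrix."""
    # Append 1 to the point so the last matrix column acts as an offset
    return matrix @ np.array([x, y, 1.0])

tx, ty = 40, 25  # hypothetical shifts in pixels
matrix = np.float32([[1, 0, tx], [0, 1, ty]])

new_x, new_y = apply_affine(matrix, 10, 10)
print(new_x, new_y)  # 50.0 35.0
```

The identity part of the matrix leaves the coordinates unchanged, and the last column adds the shift.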
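```python
import cv2 as cv
import matplotlib.pyplot as plt

image = cv.imread("pics/goku.jpeg")
h, w = image.shape[:2]

# Arguments: center of rotation, angle in degrees, scale factor
rotation_matrix = cv.getRotationMatrix2D((w / 2, h / 2), -180, 0.5)
rotated_image = cv.warpAffine(image, rotation_matrix, (w, h))

plt.imshow(cv.cvtColor(rotated_image, cv.COLOR_BGR2RGB))
plt.title("Rotation")
plt.show()
```
The code above rotates the image by 180 degrees about its center and scales it to half its size. In OpenCV a negative angle rotates clockwise and a positive angle rotates counter-clockwise.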
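As a rough illustration of what cv.getRotationMatrix2D computes, the sketch below rebuilds the matrix in NumPy from the formula given in the OpenCV documentation. The function name `rotation_matrix_2d` and the example center are assumptions for this sketch, not part of OpenCV.

```python
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale):
    """Build a 2x3 rotation-plus-scale matrix around the given center."""
    cx, cy = center
    theta = np.deg2rad(angle_deg)
    a = scale * np.cos(theta)
    b = scale * np.sin(theta)
    # The last column keeps the center of rotation fixed
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

m = rotation_matrix_2d((100, 50), -180, 0.5)
center = m @ np.array([100, 50, 1.0])  # the center maps to itself
corner = m @ np.array([0, 0, 1.0])     # the top-left corner moves
print(np.round(center, 6))
print(np.round(corner, 6))
```

The center stays where it is, while every other point is rotated around it and pulled halfway toward it by the 0.5 scale factor.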
Scaling and resizing
```python
import cv2 as cv
import matplotlib.pyplot as plt

image = cv.imread("pics/goku.jpeg")
fig, ax = plt.subplots(1, 3, figsize=(16, 8))

# Shrink to 0.15 times the original size (default linear interpolation)
image_scaled = cv.resize(image, None, fx=0.15, fy=0.15)
ax[0].imshow(cv.cvtColor(image_scaled, cv.COLOR_BGR2RGB))
ax[0].set_title("Linear Interpolation Scale")

# Enlarge to 2 times the original size with cubic interpolation
image_scaled_2 = cv.resize(image, None, fx=2, fy=2, interpolation=cv.INTER_CUBIC)
ax[1].imshow(cv.cvtColor(image_scaled_2, cv.COLOR_BGR2RGB))
ax[1].set_title("Cubic Interpolation Scale")

# Resize to a fixed 200x400 (width x height), ignoring the aspect ratio
image_scaled_3 = cv.resize(image, (200, 400), interpolation=cv.INTER_AREA)
ax[2].imshow(cv.cvtColor(image_scaled_3, cv.COLOR_BGR2RGB))
ax[2].set_title("Skewed Interpolation Scale")
plt.show()
```
Scaling an image means resampling its pixel array to smaller or larger dimensions.
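The simplest way to resample a pixel array is nearest-neighbor interpolation, sketched below in plain NumPy. This is not what cv.resize does by default (it uses linear interpolation); it is only meant to show the idea of mapping each output pixel back to a source pixel. The helper `resize_nearest` is hypothetical.

```python
import numpy as np

def resize_nearest(image, new_h, new_w):
    """Resize by copying, for each output pixel, the nearest source pixel."""
    h, w = image.shape[:2]
    # Source row and column index for every output row and column
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return image[rows[:, None], cols]

small = np.array([[1, 2],
                  [3, 4]], dtype=np.uint8)
big = resize_nearest(small, 4, 4)
print(big)
```

Each source pixel is repeated in a 2x2 block. Linear, cubic, and area interpolation replace this copy step with a weighted average of neighboring pixels, which gives smoother results.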
These are some of the most basic operations that OpenCV can perform on an image. Beyond these, OpenCV also supports image segmentation, face detection, object detection, 3-D reconstruction, and feature extraction.
If you want to have a look at how these pictures were generated using OpenCV then you can check out this GitHub repository.
2. scikit-image
scikit-image is a Python-based image processing library with some parts written in Cython, a superset of Python designed to give C-like performance. It includes algorithms for:
- Geometric transformations,
- Color space manipulation,
- Feature detection, and more
You will find it useful for pretty much any computer vision task.
Operation using scikit-image
In computer vision, contour models describe the boundaries of shapes in an image.
“Active contour models are defined for image segmentation based on the curve flow, curvature, and contour to obtain the exact target region or segment in the image.”
The following code fits an active contour to a sample image:
```python
import numpy as np
import matplotlib.pyplot as plt
from skimage.color import rgb2gray
from skimage import data
from skimage.filters import gaussian
from skimage.segmentation import active_contour

# The contour is fitted on a grayscale version of the image
img = rgb2gray(data.astronaut())

# Data for the circular boundary, in (row, column) order as
# expected by recent scikit-image versions
s = np.linspace(0, 2 * np.pi, 400)
r = 100 + 100 * np.sin(s)
c = 220 + 100 * np.cos(s)
init = np.array([r, c]).T

# Formation of the active contour on a smoothed image
cntr = active_contour(gaussian(img, sigma=3), init, alpha=0.015, beta=10, gamma=0.001)

fig, ax = plt.subplots(1, 2, figsize=(14, 7))
ax[0].imshow(img, cmap=plt.cm.gray)
ax[0].set_title("Original Image")

ax[1].imshow(img, cmap=plt.cm.gray)
# Initial circular boundary (dashed red) and fitted contour (blue)
ax[1].plot(init[:, 1], init[:, 0], '--r', lw=3)
ax[1].plot(cntr[:, 1], cntr[:, 0], '-b', lw=3)
ax[1].set_title("Active Contour Image")
plt.show()
```
3. SciPy
SciPy is used for mathematical and scientific computations, but it can also perform multi-dimensional image processing through the submodule scipy.ndimage. It provides functions that operate on n-dimensional NumPy arrays, and at the end of the day images are just that.
SciPy offers commonly used image processing operations such as:
- Filtering and convolution
- Image Segmentation
- Morphological operations
- Feature Extraction and so on.
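As a small illustration of segmentation with scipy.ndimage, the sketch below thresholds a toy array and labels its connected regions with ndimage.label. The array values and the threshold are made up for the example.

```python
import numpy as np
from scipy import ndimage

# Toy "image" with two bright blobs on a dark background
image = np.array([[0, 9, 9, 0, 0],
                  [0, 9, 0, 0, 7],
                  [0, 0, 0, 7, 7]])

# Threshold, then give each connected bright region its own label
mask = image > 5
labels, count = ndimage.label(mask)

# Pixel count per region (label 0 is the background, so skip it)
sizes = np.bincount(labels.ravel())[1:]
print(count)  # 2
print(sizes)  # [3 3]
```

By default ndimage.label treats pixels as connected only when they share an edge, so diagonal neighbors end up in separate regions unless a different structuring element is passed.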
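Blurring an image with SciPy
```python
from scipy import datasets, ndimage
from matplotlib import pyplot as plt

# scipy.misc.face() was removed in recent SciPy releases;
# scipy.datasets.face() replaces it (it needs the pooch package)
face = datasets.face()

# Blur the two spatial axes only, not the color channels
blurred_face = ndimage.gaussian_filter(face, sigma=(3, 3, 0))

fig, ax = plt.subplots(1, 2, figsize=(16, 8))
ax[0].imshow(face)
ax[0].set_title("Original Image")
ax[0].set_xticks([])
ax[0].set_yticks([])

ax[1].imshow(blurred_face)
ax[1].set_title("Blurred Image")
ax[1].set_xticks([])
ax[1].set_yticks([])
plt.show()
```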
You can find all operations here.
4. PIL
PIL (Python Imaging Library) is an open-source image processing library for Python. Its actively maintained fork, Pillow, provides the PIL module used below. PIL can perform tasks such as reading images, rescaling them, and saving them in different formats.
PIL can be used for image archives, image processing, and image display.
Image enhancement with PIL
For example, let’s increase the contrast of the following image by 30%.
```python
from PIL import Image, ImageEnhance

# Read image
im = Image.open('cat_inpainted.png')
# Display image
im.show()

# A factor of 1.0 returns the original image; 1.3 gives 30% more contrast
enh = ImageEnhance.Contrast(im)
enh.enhance(1.3).show("30% more contrast")
```
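To give a feel for what a contrast factor does, here is a simplified NumPy sketch for a grayscale array: each pixel is pushed away from the mean brightness by the given factor. This follows the same blend-with-a-mean-gray idea that Pillow's contrast enhancer is based on, but it is not Pillow's exact implementation, and the helper `adjust_contrast` is hypothetical.

```python
import numpy as np

def adjust_contrast(gray, factor):
    """Scale each pixel's distance from the mean brightness by `factor`."""
    mean = gray.mean()
    out = mean + factor * (gray.astype(np.float64) - mean)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

gray = np.array([[100, 120],
                 [140, 160]], dtype=np.uint8)
enhanced = adjust_contrast(gray, 1.3)
print(enhanced)
```

A factor above 1.0 stretches dark and bright pixels further apart, a factor of 1.0 leaves the image unchanged, and a factor of 0.0 collapses everything to a flat gray.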
For more information go here.
5. NumPy
An image is essentially an array of pixel values in which each pixel is represented by 1 (grayscale) or 3 (RGB) values. NumPy can therefore easily perform tasks such as cropping, masking, and manipulating pixel values.
For example, to extract the red, green, and blue channels from the following image:
We can use NumPy to isolate each channel in turn by setting the pixel values of the other two channels to zero.
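```python
from PIL import Image
import numpy as np

# Convert to RGB so the array always has exactly 3 channels
im = np.array(Image.open('goku.png').convert('RGB'))

# Keep one channel per copy and zero out the other two
im_R = im.copy()
im_R[:, :, (1, 2)] = 0
im_G = im.copy()
im_G[:, :, (0, 2)] = 0
im_B = im.copy()
im_B[:, :, (0, 1)] = 0

# Place the three single-channel images side by side and save the result
im_RGB = np.concatenate((im_R, im_G, im_B), axis=1)
pil_img = Image.fromarray(im_RGB)
pil_img.save('goku.jpg')
```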
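Cropping, masking, and pixel manipulation can be sketched the same way with plain array indexing. The example below uses a small synthetic image instead of a file, and all shapes and values are arbitrary.

```python
import numpy as np

# Synthetic 4x6 RGB image filled with a uniform gray value
image = np.full((4, 6, 3), 200, dtype=np.uint8)

# Cropping: slice rows 1-2 and columns 2-4
crop = image[1:3, 2:5]

# Masking: keep the left half and black out everything else
keep = np.zeros(image.shape[:2], dtype=bool)
keep[:, :3] = True
masked = image.copy()
masked[~keep] = 0

# Pixel manipulation: brighten, clipping to the valid 0-255 range
brighter = np.clip(image.astype(np.int16) + 100, 0, 255).astype(np.uint8)

print(crop.shape)     # (2, 3, 3)
print(masked[0, :, 0])
print(brighter.max())
```

The cast to a wider integer type before adding avoids the silent wrap-around that uint8 arithmetic would otherwise produce.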
6. Mahotas
Mahotas is another image processing and computer vision library, originally designed for bioimage informatics. It operates on images as NumPy arrays and is implemented in C++ with a smooth Python interface.
One of the most popular uses of Mahotas is template matching. Let’s see how it can be used to find Wally in a crowd.
```python
from pylab import imshow, show
import mahotas
import mahotas.demos
import numpy as np

wally = mahotas.demos.load('Wally')
wfloat = wally.astype(float)
r, g, b = wfloat.transpose((2, 0, 1))
w = wfloat.mean(2)

# Striped pattern that mimics Wally's shirt
pattern = np.ones((24, 16), float)
for i in range(2):
    pattern[i::4] = -1

# Find where the red-minus-mean image matches the pattern best
v = mahotas.convolve(r - w, pattern)
mask = (v == v.max())
mask = mahotas.dilate(mask, np.ones((48, 24)))

# Darken everything outside the matched region
np.subtract(wally, .8 * wally * ~mask[:, :, None], out=wally, casting='unsafe')
imshow(wally)
show()
```
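The Mahotas snippet finds Wally by convolving the image with a striped pattern. A more generic form of template matching, sketched below in plain NumPy, slides a template over the image and keeps the position with the smallest sum of squared differences. This is a swapped-in illustration of the idea, not what Mahotas does internally; the function name and the random test data are made up.

```python
import numpy as np

def match_template_ssd(image, template):
    """Return the (row, col) where the template fits the image best."""
    ih, iw = image.shape
    th, tw = template.shape
    best_score, best_pos = np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = image[r:r + th, c:c + tw]
            # Sum of squared differences: 0 means a perfect match
            score = np.sum((window - template) ** 2)
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

rng = np.random.default_rng(0)
image = rng.random((20, 30))
template = image[5:9, 12:18].copy()  # cut a patch out of the image itself

position = match_template_ssd(image, template)
print(position)  # (5, 12)
```

The double loop is slow on real photos, which is why libraries implement this kind of search with convolutions or FFTs.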
7. ITK
ITK, the Insight Segmentation and Registration Toolkit, is an open-source platform that is widely used for image segmentation and image registration (a process that aligns two or more images so they can be overlaid).
ITK uses the CMake build environment, and the library is implemented in C++ with Python wrappers.
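To illustrate what registration means in its simplest form, the sketch below recovers a pure translation between two images by brute-force search in NumPy. It does not use ITK, which supports far richer transforms, metrics, and optimizers; the function `estimate_shift` and the random test images are assumptions made for this example.

```python
import numpy as np

def estimate_shift(fixed, moving, max_shift=5):
    """Find the integer (dy, dx) shift that best aligns `moving` to `fixed`."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            candidate = np.roll(moving, (dy, dx), axis=(0, 1))
            # Mean squared error between the fixed and the shifted image
            err = np.mean((fixed - candidate) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

rng = np.random.default_rng(1)
fixed = rng.random((32, 32))
# Simulate a misaligned copy: 3 pixels down, 2 pixels left (with wrap-around)
moving = np.roll(fixed, (3, -2), axis=(0, 1))

shift = estimate_shift(fixed, moving)
print(shift)  # (-3, 2)
```

The recovered shift is the inverse of the simulated misalignment: applying it to the moving image overlays it on the fixed one.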
You can check this Jupyter Notebook for learning and research purposes.
8. pgmagick
pgmagick is a GraphicsMagick binding for Python that provides utilities for image operations such as resizing, rotation, sharpening, creating gradient images, and drawing text.
Blurring an image
```python
from pgmagick.api import Image

img = Image('leena.jpeg')
# blur image
img.blur(10, 5)
```
Scaling of an image
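```python
from pgmagick.api import Image

img = Image('leena.png')
# scaling image to 150x100; the second argument names the resampling filter
img.scale((150, 100), 'lanczos')
# save the result under a new name
img.write('leena_scaled.png')
```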
For more info, you can check the curated list of Jupyter Notebooks here.
We have covered the top 8 image processing libraries for machine learning. Hopefully, you now have an idea of which one of those will work best for your project. Best of luck. 🙂