Top Tools to Run a Computer Vision Project

Derrick Mwiti

18th August, 2023

Computer vision has gained tremendous traction thanks to state-of-the-art tools and technologies that seem to be released every day. This technology is being applied in various fields, including autonomous vehicles, robotics, and medical image analysis.

Computer vision is a scientific field that involves using technology to understand images and videos. The goal is to automate vision tasks that would otherwise be done by human beings. Common tasks being automated include, to mention a few:

Image classification: assigning a label to the contents of an image
Object detection: identifying objects in an image and drawing bounding boxes around them
Semantic segmentation: identifying the class of every pixel in an image, which yields a mask over each class of object
Human pose estimation: identifying the pose of a person in an image or a video
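
To make the per-pixel nature of semantic segmentation concrete, here is a minimal sketch in plain Python. It assumes a hypothetical model has already produced one score per class for every pixel; the class names, the scores, and the `scores_to_mask` helper are all invented for illustration. Segmentation then amounts to picking the best-scoring class at each pixel, whereas image classification would produce a single label for the whole image.

```python
# Hypothetical class names, for illustration only.
CLASSES = ["background", "cat", "dog"]


def scores_to_mask(scores):
    """Turn per-pixel class scores into a mask of class indices.

    `scores[row][col]` is a list holding one score per class.
    """
    mask = []
    for row in scores:
        mask_row = []
        for pixel_scores in row:
            # The index of the highest score is the predicted class for this pixel.
            best = max(range(len(pixel_scores)), key=lambda i: pixel_scores[i])
            mask_row.append(best)
        mask.append(mask_row)
    return mask


# A made-up 2x3 "image" with three class scores per pixel.
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
    [[0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.1, 0.2, 0.7]],
]
mask = scores_to_mask(scores)
print(mask)  # [[0, 1, 1], [0, 2, 2]]
```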

Automation of these tasks has been enabled by the latest advances in deep learning and machine learning. Particularly, convolutional neural networks have played a big role in furthering computer vision. These networks are able to detect and extract important features in an image. These features are then used to identify objects and classify images.
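
As a rough illustration of what "extracting features" means, the sketch below slides a small kernel over a tiny image in plain Python. The kernel here is a hand-written vertical-edge detector and the `convolve2d` helper is a hypothetical name for this example; in a real convolutional network the kernel weights are learned during training, and a library performs the computation far more efficiently.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` without padding (cross-correlation, as in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        out_row = []
        for c in range(out_w):
            total = 0
            # Multiply the kernel with the image patch under it and sum the result.
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# 4x5 image: dark on the left (0), bright on the right (9).
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]
# Hand-written vertical-edge detector.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 27, 27], [0, 27, 27]]
```

The output is zero over the flat dark region and large wherever the window covers the dark-to-bright edge, which is the kind of signal later layers build on. Note also that the 4x5 input shrinks to a 2x3 feature map.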

In this article, we are going to take a look at some of the top tools and technologies that have enabled these computer vision applications.

Frameworks and libraries

There are numerous computer vision frameworks and libraries. These tools do different things in the computer vision realm. Each has its advantages and disadvantages. Let’s take a look!

neptune.ai

neptune.ai is a platform that can be used for logging and managing computer vision model metadata. You can use it to log:

Model versions,
Data versions,
Model hyperparameters,
Charts,
and a lot more.

Neptune is hosted on the cloud, so you don’t need any setup, and you can access your computer vision experiments anytime, anywhere. You can organize the computer vision experiments in one place, and collaborate with your team on them. You can invite your teammates to view and work on any computer vision experiment.

Learn more about logging and managing runs in Neptune.

OpenCV

OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning library. It supports Windows, Linux, Android, and macOS.

The library also provides interfaces for CUDA and OpenCL. OpenCV can also be used in C++, Java, MATLAB, and Python. Some of the tasks that can be accomplished using OpenCV include:

Classifying human actions in videos
Object identification
Tracking moving objects
Counting people
Detecting and recognizing faces

OpenCV can also be used for processing images for computer vision. Some of the supported tasks include:

Changing the color spaces
Smoothing images
Blending images using image pyramids
Image segmentation using the watershed algorithm

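For instance, here is how you can compute the Fourier transform of an image and plot its magnitude spectrum.

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

# Read the image as grayscale; cv.imread returns None if the file cannot be loaded
img = cv.imread('f.jpg', cv.IMREAD_GRAYSCALE)
assert img is not None, "could not read f.jpg"
# Compute the 2D discrete Fourier transform (real and imaginary parts in two channels)
dft = cv.dft(np.float32(img), flags=cv.DFT_COMPLEX_OUTPUT)
# Move the zero-frequency component to the center, shifting only the two spatial axes
dft_shift = np.fft.fftshift(dft, axes=(0, 1))
# Log-scale the magnitude so that it can be displayed; adding 1 avoids log(0)
magnitude_spectrum = 20 * np.log(cv.magnitude(dft_shift[:, :, 0], dft_shift[:, :, 1]) + 1)

plt.subplot(121), plt.imshow(img, cmap='gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122), plt.imshow(magnitude_spectrum, cmap='gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])
plt.show()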
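
Changing color spaces, the first image processing task listed earlier, can be illustrated without any image library. The sketch below uses Python's standard `colorsys` module to convert single RGB pixels to HSV, then rescales the result to the ranges OpenCV uses for 8-bit HSV images (hue in 0 to 179, saturation and value in 0 to 255). The `rgb_to_opencv_hsv` helper is a hypothetical name for this example; in a real project you would call `cv.cvtColor` on the whole image instead, keeping in mind that OpenCV loads images in BGR order.

```python
import colorsys


def rgb_to_opencv_hsv(r, g, b):
    """Convert one 8-bit RGB pixel to HSV on OpenCV's 8-bit scale."""
    # colorsys works with floats in the range [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # OpenCV stores hue as degrees / 2 so that it fits in one byte.
    return (round(h * 180) % 180, round(s * 255), round(v * 255))


for name, rgb in [("red", (255, 0, 0)), ("green", (0, 255, 0)), ("blue", (0, 0, 255))]:
    print(name, rgb_to_opencv_hsv(*rgb))
# red (0, 255, 255)
# green (60, 255, 255)
# blue (120, 255, 255)
```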

TensorFlow

TensorFlow is an open-source deep-learning library developed at Google. It makes the building of deep learning models easy and intuitive through the use of Keras as its high-level API.

Some of the networks that can be built using TensorFlow include:

Convolutional neural networks
Fully Connected Networks
Recurrent neural networks
Long Short Term Memory networks
Generative Adversarial Networks

TensorFlow is quite popular among developers because it’s easy to use, and gives you multiple tools for performing various actions. Some of the tools in the ecosystem include:

TensorBoard for visualizing and debugging models
TensorFlow Playground for tinkering with neural networks on a browser
TensorFlow Hub that contains numerous trained computer vision models
TensorFlow Graphics, a library for working with graphics

We have already mentioned that convolutional neural networks (CNNs) are widely used in computer vision tasks. Although you can use ordinary fully connected neural networks, CNNs have proven to perform better on images.

Let’s look at a code snippet for building a simple CNN. In this snippet you can see that:

A network of layers is initialized using the `Sequential` class from Keras
`Conv2D` defines the convolutional layers, each with a 3 by 3 feature detector. Sliding the feature detector over the image produces a slightly smaller feature map that retains the important features
`MaxPooling2D` does pooling. The features obtained above are passed through a pooling layer. Common types of pooling include max pooling and average pooling. Pooling makes the network less sensitive to the exact location of an object in the image
The convolutional and pooling layers are repeated. However, you are free to define your own network architecture. You can also use common network architectures
The `Flatten` layer transforms the output of the pooling layer into a single vector that can be passed to a fully connected layer
`Dense` is that fully connected layer. In this layer, an activation function corresponding to the type of problem is applied
The last layer is responsible for producing the final output of the network. Here it uses softmax to output a probability for each of 10 classes

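import tensorflow as tf

model = tf.keras.Sequential(
    [
        # First convolution + pooling block; inputs are 32x32 RGB images
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),

        # Second convolution + pooling block
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),

        # Classifier head: flatten, one hidden layer, 10-way softmax output
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)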
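
The layer sizes in the model above can be traced by hand, which is a useful check before training. The plain-Python sketch below does that arithmetic, assuming the Keras defaults that `Conv2D` uses no padding and a stride of 1. The `window_output` helper and the `layers` table are hypothetical names for this example, not part of Keras.

```python
def window_output(size, window, stride):
    """Output length when a window slides over `size` positions without padding."""
    return (size - window) // stride + 1


# (name, window size, stride, number of filters or None for pooling)
layers = [
    ("conv1", 3, 1, 32),
    ("pool1", 2, 2, None),
    ("conv2", 3, 1, 64),
    ("pool2", 2, 2, None),
]

size, channels = 32, 3  # matches input_shape=(32, 32, 3)
shapes = {}
for name, window, stride, filters in layers:
    size = window_output(size, window, stride)
    if filters is not None:
        channels = filters  # a convolution sets the channel count; pooling keeps it
    shapes[name] = (size, size, channels)
    print(name, shapes[name])

flattened = size * size * channels
print("flatten", flattened)
# conv1 (30, 30, 32)
# pool1 (15, 15, 32)
# conv2 (13, 13, 64)
# pool2 (6, 6, 64)
# flatten 2304
```

So the `Flatten` layer hands a vector of 2,304 values to the first `Dense` layer. Calling `model.summary()` on the Keras model should report the same shapes.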

When building a network like the one above, you can use Neptune to track the experiment. This matters because it keeps your experiment reproducible. It would be very unfortunate to spend days training a network whose performance you cannot reproduce later.

The TensorFlow and Neptune integration lets you:

Log model hyperparameters on every run
Visualize the model’s learning curves
See the hardware consumption for every run
Log model weights
Log model artifacts, such as images and the model itself

PyTorch

PyTorch is an open-source deep learning library based on the Torch library. It also supports distributed training as well as serving your computer vision models.

Other PyTorch features include:

Deployment on mobile devices
Support for ONNX
C++ front-end
Supported on major cloud platforms
A rich ecosystem of tools and libraries

PyTorch and its ecosystem are also supported by the Neptune machine learning experimentation platform. This PyTorch + Neptune integration lets you:

Log torch tensors
Visualize model loss and metrics
Log torch tensors as images
Log training code
Log model weights

Defining a convolutional neural network is quite similar to the TensorFlow definition with a few cosmetic changes. As you can see, you still have to define convolutional and pooling layers as well as indicate activation functions.

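import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Two convolutional layers with 5x5 kernels: 3 -> 6 -> 16 channels
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Three fully connected layers; 16 * 5 * 5 assumes 32x32 input images
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Convolution -> ReLU -> max pooling, twice
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # Flatten each sample's feature maps into one vector
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Return raw class scores (logits); no softmax is applied here
        x = self.fc3(x)
        return x


net = Net()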
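
It can help to know how many trainable parameters a network like this has. The plain-Python sketch below counts them layer by layer from the constructor arguments above; `conv_params` and `linear_params` are hypothetical helper names for this example. It also relies on an assumption the snippet does not state: the `16 * 5 * 5` input size of `fc1` only works out for 32x32 images (32 to 28 after the first convolution, 14 after pooling, 10 after the second convolution, 5 after pooling).

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Kernel weights plus one bias per output channel, for a square kernel."""
    return (in_channels * kernel_size * kernel_size + 1) * out_channels


def linear_params(in_features, out_features):
    """Weight matrix plus one bias per output feature."""
    return (in_features + 1) * out_features


# Pooling layers have no trainable parameters, so they are not listed.
counts = {
    "conv1": conv_params(3, 6, 5),
    "conv2": conv_params(6, 16, 5),
    "fc1": linear_params(16 * 5 * 5, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
total = sum(counts.values())
print(counts)
print("total", total)  # total 62006
```

With PyTorch installed, `sum(p.numel() for p in net.parameters())` should give the same total for the network defined above.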

If you don’t like writing networks in raw PyTorch code, you can use one of its high-level API libraries. One such package is the `fastai` library.

Fastai

Fastai is a deep learning library that is built on top of PyTorch.

Some of the features of the library include:

A computer vision module that is GPU optimized
A two-way callback system that has access to any part of the data
Easy to get started with when coming from other libraries
Supports cyclical learning rates
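
To give a feel for what a cyclical learning rate looks like, here is a minimal sketch of the triangular policy from the cyclical learning rates literature, written in plain Python. It is not fastai's implementation (fastai exposes its own schedules, for example through `fit_one_cycle`); the `triangular_lr` function and the default values for its parameters are assumptions made for illustration.

```python
import math


def triangular_lr(iteration, base_lr=0.001, max_lr=0.006, step_size=4):
    """Learning rate that ramps linearly between base_lr and max_lr.

    One full cycle (up and back down) takes 2 * step_size iterations.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    # x runs 1 -> 0 -> 1 over a cycle, so (1 - x) runs 0 -> 1 -> 0.
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)


schedule = [triangular_lr(i) for i in range(9)]
for i, lr in enumerate(schedule):
    print(i, round(lr, 5))
# The rate climbs from base_lr at iteration 0 to max_lr at iteration 4,
# then falls back to base_lr at iteration 8.
```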

Fastai metrics can also be logged to Neptune. The integration also lets you monitor and visualize your model’s training process.

Let’s look at an example of using Fastai for image classification. The first step is usually to create a `Learner`. A `Learner` combines a model, a data loader, and a loss function. While defining a `Learner` you can use a pre-trained model and fine-tune it on your dataset. This is referred to as transfer learning and will usually result in better model performance compared to training a model from scratch. Pre-trained models are usually hard to beat because they were trained on millions of images, which are hard for an individual to obtain. Furthermore, a lot of computational resources are required to train a model on millions of images.

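from fastai.vision.all import *
path = untar_data(URLs.PETS)  # sample dataset used in the fastai documentation
files = get_image_files(path/"images")
def label_func(f): return f[0].isupper()  # in this dataset, cat images have capitalized file names
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)  # newer fastai releases name this vision_learner
learn.fine_tune(1)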

You can then use this `Learner` to run predictions.

learn.predict(files[0])

Fastai also provides tools for downloading sample data and augmenting images.

Check how you can leverage Neptune to log and track fastai metrics.

Caffe

Caffe is an open-source deep learning framework developed by Berkeley AI Research (BAIR).

Some of the features of Caffe include:

Ability to easily switch between different backends
Fast library
Can be used for image classification and image segmentation
Supports CNN, RCNN, and LSTM networks

Like other libraries, the creation of networks in Caffe requires the definition of the network layers, the loss function, and the activation functions.

import caffe
from caffe import layers as L, params as P

def lenet(lmdb, batch_size):
    # our version of LeNet: a series of linear and simple nonlinear transformations
    n = caffe.NetSpec()

    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)

    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.fc1 =   L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.fc1, in_place=True)
    n.score = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss =  L.SoftmaxWithLoss(n.score, n.label)

    # return the network definition as a protobuf message
    return n.to_proto()

OpenVINO

OpenVINO (Open Visual Inference and Neural Network Optimization) can be used to make inference with computer vision models faster.

Other features provided by OpenVINO include:

Supports CNN-based deep learning inference on the edge
Ships with optimized computer vision functions for OpenCV and OpenCL

Ready-made solutions for particular tasks

Apart from open-source computer vision tools, there are also ready-made solutions that can be used at various stages of building computer vision applications. These solutions are usually hosted and can be used immediately. They also provide free versions that you can start using right away. Let’s take a look at some of them.

Deci

Deci is a platform that you can use to optimize your computer vision models and increase their performance.

The platform can:

Reduce the model’s latency
Increase the model’s throughput without affecting its accuracy
Optimize models from popular frameworks
Deploy on popular CPU and GPU machines
Benchmark your model on different hardware hosts and cloud providers

The process starts by uploading the model and then selecting the required optimization parameters.

The results obtained after running optimization of YOLO V5 on Deci show that:

The model’s throughput improved from 194.1 FPS to 413.2 FPS, making it 2.1 times faster
The size of the model was reduced by 1.4 times
The latency of the model improved by 2.1 times
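
The 2.1 times figure follows directly from the two throughput numbers quoted above. The short sketch below reproduces that arithmetic and converts each throughput into an average time per frame. That conversion assumes frames are processed one after another, which is a simplification: with batching, measured latency and throughput are not simple reciprocals of each other.

```python
baseline_fps = 194.1   # throughput before optimization, from the results above
optimized_fps = 413.2  # throughput after optimization

speedup = optimized_fps / baseline_fps

# Average time per frame in milliseconds, assuming sequential processing.
baseline_ms = 1000 / baseline_fps
optimized_ms = 1000 / optimized_fps

print(f"speedup: {speedup:.2f}x")
print(f"time per frame: {baseline_ms:.2f} ms -> {optimized_ms:.2f} ms")
# speedup: 2.13x
# time per frame: 5.15 ms -> 2.42 ms
```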

Clarifai

Clarifai offers image and video solutions that are accessible via an API, a device SDK, and on-premises deployment.

Some of the features of the platform include:

Visual search
Image cropper
Visual classification models
Search by image
Video interpolation
Demographic analysis
Aerial surveillance
Data labeling

Supervisely

Supervise.ly provides image annotation tools for computer vision models.

Other features offered on the platform include:

Video labeling
Labeling of 3D scenes
Training and testing of computer vision models

Labelbox

Labelbox is a platform that can be used for labeling data for computer vision projects.

The computer vision items supported are:

Creating bounding boxes, polygons, lines, and keypoints
Image segmentation tasks
Image classification
Video labeling

Segments

Segments is a computer vision platform for labeling and model training.

The platform supports:

Instance segmentation
Semantic segmentation
Polygons
Bounding boxes
Image classification

Nanonets

Nanonets provides solutions for applied computer vision in various industries.

Some of the solutions provided by the platform include:

ID card verification
Application of machine learning and optical character recognition for analysis of various documents

Sentisight

The Sentisight computer vision platform offers the following features:

Smart image labeling tool
Classification labels, bounding boxes, and polygons
Training classification and object detection models
Image similarity search
Pre-trained models

Amazon Rekognition

Amazon Rekognition can be used for image and video analysis without any deep learning experience.

The platform offers:

Ability to recognize thousands of labels
Adding of custom labels
Content moderation
Text identification
Face detection and analysis
Face search and verification
Celebrity recognition

Google Cloud Vision APIs

Google Cloud provides vision APIs with the following features:

Image classification with pre-trained models
Data labeling
Deployment of models on edge devices
Detecting and counting objects
Detecting faces
Content moderation
Celebrity recognition

Fritz

The Fritz platform can be used for building computer vision models without any machine learning experience.

The platform offers the following features:

Image generation from a few image samples
Image labeling
Image classification and object detection models
Image segmentation models
Deployment of trained models on edge devices
Lens studio lenses based on machine learning

Microsoft Computer Vision API

Microsoft also provides a computer vision platform for analyzing images and video content.

Some of the features offered by the service include:

Text extraction
Image understanding
Spatial analysis
Deployment on the cloud and the edge

IBM Watson Visual Recognition

The IBM Watson Visual Recognition service offers the following features:

Export of models to CoreML
Pre-trained classification models
Custom image classification models
Custom model training

ShaipCloud

ShaipCloud is a computer vision platform for labeling sensitive AI training data, whether images or videos.

The platform supports:

Image annotation and labeling: semantic segmentation, keypoint annotation, bounding boxes, 3D cuboids, polygon annotation, landmark annotation, and line segmentation. Use cases: object detection, facial recognition tracking, and image classification
Video annotation and labeling: frame-by-frame labeling, event-based timestamp labeling, and keypoint annotation. Use cases: object and motion tracking, facial recognition, autonomous driving, security surveillance, and video or clip classification

Although ShaipCloud is industry agnostic, its high-throughput services have been used across industry verticals such as automotive, healthcare, financial services, technology, retail, and government.

Final thoughts

In this article, we have covered various tools that you can use in a computer vision project. We explored several open-source tools, as well as various ready-to-use computer vision platforms.

Your choice of tool or platform will depend on your skill set and your budget. For example, the ready-made platforms can be used without prior knowledge of deep learning, but beyond their free versions they are usually paid. The open-source tools are free but require technical knowledge and experience to use.

Whichever platform or tool you choose, the most fundamental thing is to ensure that it solves your problems.