
Top Tools to Run a Computer Vision Project

18th August, 2023

Computer vision has gained tremendous traction thanks to state-of-the-art tools and technologies that seem to be released every day. This technology is being applied in various fields, including autonomous vehicles, robotics, and medical image analysis. 

Computer vision is a scientific field that involves using technology to understand images and videos. The goal is to automate vision tasks that would otherwise be done by human beings. Some of the common tasks that are being automated include:

  • Image classification: assigning a label to the contents of an image
  • Object detection: locating objects in an image and drawing bounding boxes around them
  • Semantic segmentation: identifying the class of every pixel in an image, which yields a mask over each object
  • Human pose estimation: identifying the pose of a person in an image or a video

These are just a few examples.

Automation of these tasks has been enabled by the latest advances in deep learning and machine learning. Particularly, convolutional neural networks have played a big role in furthering computer vision. These networks are able to detect and extract important features in an image. These features are then used to identify objects and classify images. 

In this article, we are going to take a look at some of the top tools and technologies that have enabled these computer vision applications. 

Frameworks and libraries

There are numerous computer vision frameworks and libraries. These tools do different things in the computer vision realm. Each has its advantages and disadvantages. Let’s take a look!  

neptune.ai

neptune.ai is a platform that can be used for logging and managing computer vision model metadata. You can use it to log:

  • Model versions,
  • Data versions,
  • Model hyperparameters,
  • Charts,
  • and a lot more.

Neptune is hosted on the cloud, so you don’t need any setup, and you can access your computer vision experiments anytime, anywhere. You can organize the computer vision experiments in one place, and collaborate with your team on them. You can invite your teammates to view and work on any computer vision experiment.

Learn more about logging and managing runs in Neptune.

OpenCV

OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning library. It supports Windows, Linux, Android, and macOS.

The library also provides interfaces for CUDA and OpenCL, and it can be used from C++, Java, MATLAB, and Python. Some of the tasks that can be accomplished using OpenCV include:

  • Classifying human actions in videos
  • Object identification 
  • Tracking moving objects 
  • Counting people 
  • Detecting and recognizing faces 

OpenCV can also be used for processing images for computer vision. Some of the supported tasks include: 

  • Changing the color spaces
  • Smoothing images 
  • Blending images using image pyramids 
  • Image segmentation using the watershed algorithm 
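
To make two of these operations concrete, here is a minimal pure-Python sketch of a color space change and of image smoothing. It is an illustration of the ideas only, not OpenCV's implementation; in OpenCV the corresponding calls would be along the lines of `cv.cvtColor` and `cv.blur`. The helper names `rgb_to_gray` and `box_blur`, and the tiny 3x3 image, are made up for this example.

```python
import colorsys

def rgb_to_gray(pixel):
    """Convert one (r, g, b) pixel with 0-255 channels to a gray value
    using the common 0.299 / 0.587 / 0.114 luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def box_blur(gray, k=3):
    """Smooth a 2D list of gray values with a k x k mean filter.
    Border pixels average over the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - r), min(h, y + r + 1))
                      for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = round(sum(window) / len(window))
    return out

# Hypothetical 3x3 RGB image: one white pixel surrounded by black pixels.
black, white = (0, 0, 0), (255, 255, 255)
image = [[black] * 3, [black, white, black], [black] * 3]

# Color space change 1: RGB -> grayscale.
gray = [[rgb_to_gray(p) for p in row] for row in image]

# Color space change 2: RGB -> HSV for a pure red pixel (channels in 0..1).
red_hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)

# Smoothing spreads the bright centre pixel over its neighbours.
smoothed = box_blur(gray)
```

After smoothing, the single bright pixel is averaged with its dark neighbours, which is the same effect a blur has on noise in a real image.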

For instance, here is how you can compute the Fourier transform of an image and display its magnitude spectrum.

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

# Load the image as grayscale
img = cv.imread('f.jpg', cv.IMREAD_GRAYSCALE)

# Compute the DFT and move the zero-frequency component to the center
dft = cv.dft(np.float32(img), flags=cv.DFT_COMPLEX_OUTPUT)
dft_shift = np.fft.fftshift(dft, axes=(0, 1))  # shift the spatial axes only

# Log-scaled magnitude of the two-channel (real, imaginary) output
magnitude_spectrum = 20 * np.log(cv.magnitude(dft_shift[:, :, 0], dft_shift[:, :, 1]))

plt.subplot(121), plt.imshow(img, cmap='gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122), plt.imshow(magnitude_spectrum, cmap='gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])
plt.show()
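
If the magnitude spectrum is unfamiliar, a one-dimensional toy case may help. The sketch below is a naive discrete Fourier transform written with the standard library only; it is not how OpenCV computes it, and the 8-sample cosine signal is a made-up input. It shows that a signal containing a single frequency produces peaks only in the matching frequency bins.

```python
import cmath
import math

def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Hypothetical test signal: one full cosine cycle sampled at 8 points.
n = 8
signal = [math.cos(2 * math.pi * t / n) for t in range(n)]

# Magnitude of each complex frequency bin.
magnitudes = [abs(c) for c in dft(signal)]

# 20 * log scaling as in the OpenCV snippet; the +1 is added here to avoid log(0).
log_spectrum = [20 * math.log(m + 1) for m in magnitudes]

# Bins that carry energy: the cosine frequency and its mirror image.
peak_bins = [k for k, m in enumerate(magnitudes) if m > 1e-6]
print(peak_bins)
```

The two-dimensional transform of an image works the same way, with one frequency axis per image axis.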

TensorFlow

TensorFlow is an open-source deep-learning library developed at Google. It makes the building of deep learning models easy and intuitive through the use of Keras as its high-level API. 

Some of the networks that can be built using TensorFlow include:

  • Convolutional neural networks 
  • Fully Connected Networks
  • Recurrent neural networks 
  • Long Short Term Memory networks 
  • Generative Adversarial Networks

TensorFlow is quite popular among developers because it’s easy to use, and gives you multiple tools for performing various actions. Some of the tools in the ecosystem include: 

  • TensorBoard for visualizing and debugging models
  • TensorFlow Playground for tinkering with neural networks in a browser
  • TensorFlow Hub, which hosts numerous trained computer vision models
  • TensorFlow Graphics, a library for working with graphics

We have already mentioned that convolutional neural networks (CNNs) are widely used in computer vision tasks. Although you can use ordinary fully connected neural networks, CNNs have proven to perform better on images.

Let’s look at a code snippet for building a simple convolutional neural network. In this snippet you can see that:

  • A stack of layers is initialized using the `Sequential` class from Keras
  • `Conv2D` defines a convolutional layer with a 3 by 3 feature detector (kernel). The feature detector reduces the size of the image while preserving important features
  • `MaxPooling2D` does pooling. The features obtained above are passed through a pooling layer. Common types of pooling include max pooling and average pooling. Pooling makes the network less sensitive to the exact location of an object in the image
  • The convolutional and pooling layers are repeated. However, you are free to define your own network architecture. You can also use common network architectures
  • The `Flatten` layer transforms the output of the pooling layer into a single vector that can be passed to a fully connected layer
  • `Dense` is that fully connected layer. Its activation function is chosen to suit the type of problem, for example `softmax` for multi-class classification
  • The last layer is responsible for producing the final output of the network

import tensorflow as tf

# Small CNN for 32x32 RGB images and 10 output classes
model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),

        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),

        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),  # one probability per class
    ]
)
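
To see what the convolution and pooling steps described above actually do to an image, here is a minimal pure-Python sketch on a tiny input. It illustrates the mechanism only and is not how Keras implements these layers. The function names, the 6x6 image, and the hand-written edge-detecting kernel are all made up for this example; in a real network the kernel values are learned during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + j][x + i] * kernel[j][i]
                 for j in range(kh) for i in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]

def relu(fmap):
    """Replace negative responses with zero."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [[max(fmap[y * stride + j][x * stride + i]
                 for j in range(size) for i in range(size))
             for x in range(out_w)]
            for y in range(out_h)]

# Hypothetical 6x6 image with a vertical edge: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-written 3x3 vertical-edge detector.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

features = relu(conv2d_valid(image, kernel))  # 6x6 input -> 4x4 feature map
pooled = max_pool(features)                   # 4x4 feature map -> 2x2

print(features[0])  # response is strongest where the window covers the edge
print(pooled)
```

The 3 by 3 kernel shrinks the 6x6 image to 4x4, and the 2x2 pooling halves that again while keeping the strong edge responses.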

When building a network like the one above, you can use Neptune to track the experiment. This is important so that your experiment is reproducible. It would be very unfortunate to spend days training a network whose performance you cannot reproduce later.

The TensorFlow and Neptune integration lets you:

  • Log model hyperparameters on every run 
  • Visualize the model’s learning curves 
  • See the hardware consumption for every run 
  • Log model weights 
  • Log model artifacts, such as images and the model itself

PyTorch

PyTorch is an open-source deep learning library based on the Torch library. It supports distributed training as well as serving your computer vision models.

Other PyTorch features include: 

  • Deployment on mobile devices 
  • Support for ONNX
  • C++ front-end 
  • Supported on major cloud platforms 
  • A rich ecosystem of tools and libraries 

PyTorch and its ecosystem are also supported by the Neptune machine learning experimentation platform. The PyTorch + Neptune integration lets you:

  • Log torch tensors 
  • Visualize model loss and metrics 
  • Log torch tensors as images 
  • Log training code 
  • Log model weights 

Defining a convolutional neural network is quite similar to the TensorFlow definition with a few cosmetic changes. As you can see, you still have to define convolutional and pooling layers as well as indicate activation functions. 

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels, 6 filters, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling with stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 * 5 * 5 assumes 32x32 inputs
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 output classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)                # flatten everything except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                        # raw class scores (logits)
        return x


net = Net()
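
The `16 * 5 * 5` input size of the first linear layer follows from the layer sizes. The sketch below traces it with plain arithmetic, assuming 32x32 input images (the snippet does not state the input size, but this is the size consistent with it). The helper `conv_out` is a made-up name for the standard output-size formula.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling window over a square input."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                  # assumed input: 3 x 32 x 32
size = conv_out(size, kernel=5)            # conv1: 32 -> 28
size = conv_out(size, kernel=2, stride=2)  # pool:  28 -> 14
size = conv_out(size, kernel=5)            # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)  # pool:  10 -> 5

flat_features = 16 * size * size           # 16 channels come out of conv2
print(flat_features)
```

With a different input resolution the flattened size changes, so the first `nn.Linear` layer would need to be adjusted accordingly.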

If you don’t like writing networks in raw PyTorch code, you can use one of its high-level API libraries. One such package is the `fastai` library. 

Fastai

Fastai is a deep learning library that is built on top of PyTorch. 

Some of the features of the library include: 

  • A computer vision module that is GPU optimized 
  • A two-way callback system that has access to any part of the data
  • Easy to get started with when coming from other libraries
  • Supports cyclical learning rates
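
As a rough illustration of what a cyclical learning rate is, here is a sketch of the simple triangular schedule, in which the learning rate rises linearly to a maximum and then falls back, over and over. This is a generic, library-independent example and does not reproduce fastai's own scheduling code; the function name and the numeric values are made up.

```python
import math

def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate.

    The rate climbs from base_lr to max_lr over `step_size` iterations,
    then descends back to base_lr over the next `step_size` iterations.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# One full cycle with hypothetical settings: 4 iterations up, 4 iterations down.
schedule = [triangular_lr(i, base_lr=0.001, max_lr=0.01, step_size=4) for i in range(9)]
for i, lr in enumerate(schedule):
    print(i, round(lr, 5))
```

The schedule starts at the base rate, peaks halfway through the cycle, and returns to the base rate at the end.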

Fastai metrics can also be logged to Neptune. The integration also lets you monitor and visualize your model’s training process. 

Let’s look at an example of using Fastai for image classification. The first step is usually to create a `Learner`. A `Learner` combines a model, a data loader, and a loss function. While defining a `Learner` you can use a pre-trained model and fine-tune it on your dataset. This is referred to as transfer learning and will usually result in better model performance compared to training a model from scratch. Pre-trained models are usually hard to beat because they were trained on millions of images, which are hard for an individual to obtain. Furthermore, a lot of computational resources are required to train a model on millions of images.

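from fastai.vision.all import *
# Assumed example data: the Oxford-IIIT Pet images, where cat file names start with an uppercase letter
path = untar_data(URLs.PETS) / "images"
files = get_image_files(path)
def label_func(fname): return fname[0].isupper()
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224))
learn = vision_learner(dls, resnet34, metrics=error_rate)  # called cnn_learner in older fastai releases
learn.fine_tune(1)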

You can then use this `Learner` to run predictions. 

learn.predict(files[0])

Fastai also provides tools for downloading sample data and augmenting images.

Check how you can leverage Neptune to log and track fastai metrics.

Caffe

Caffe is an open-source deep learning framework developed by Berkeley AI Research (BAIR).

Some of the features of Caffe include: 

  • Ability to easily switch between CPU and GPU backends
  • Fast library 
  • Can be used for image classification and image segmentation 
  • Supports CNN, RCNN, and LSTM networks 

Like other libraries, the creation of networks in Caffe requires the definition of the network layers, the loss function, and the activation functions.

import caffe
from caffe import layers as L, params as P

def lenet(lmdb, batch_size):
    # our version of LeNet: a series of linear and simple nonlinear transformations
    n = caffe.NetSpec()

    # data layer: reads images and labels from an LMDB database and scales pixels to [0, 1]
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)

    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.fc1 =   L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.fc1, in_place=True)
    n.score = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss =  L.SoftmaxWithLoss(n.score, n.label)

    # return the network definition as a protobuf message
    return n.to_proto()

OpenVINO

OpenVINO (Open Visual Inference and Neural Network Optimization) is a toolkit from Intel that can be used to speed up inference with computer vision models.

Other features provided by OpenVINO include: 

  • Supports CNN-based deep learning inference on the edge
  • Ships with optimized computer vision functions for OpenCV and OpenCL

Ready-made solutions for particular tasks

Apart from open-source computer vision tools, there are also ready-made solutions that can be used at various stages of building computer vision applications. These solutions are usually hosted and can be used immediately. They also provide free versions that you can start using right away. Let’s take a look at some of them.  

Deci

Deci is a platform that you can use to optimize your computer vision models and increase their performance. 

With the platform, you can:

  • Reduce the model’s latency
  • Increase the model’s throughput without affecting its accuracy
  • Optimize models from popular frameworks
  • Deploy on popular CPU and GPU machines
  • Benchmark your model on different hardware hosts and cloud providers

To optimize a model on the Deci platform, you upload it and then select the required optimization parameters.

Here are the results reported after running optimization of YOLO V5 on Deci:

  • The model’s throughput improved from 194.1 FPS to 413.2 FPS, making it 2.1 times faster
  • The size of the model was reduced by 1.4 times
  • The latency of the model improved by 2.1 times
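
The throughput figures quoted above can be checked with a few lines of arithmetic. The latency conversion below rests on an assumption that the source does not state: frames are processed one at a time, so per-frame latency is simply the inverse of throughput.

```python
baseline_fps = 194.1   # throughput before optimization, as quoted above
optimized_fps = 413.2  # throughput after optimization, as quoted above

# Ratio of the two throughput figures.
speedup = optimized_fps / baseline_fps

# Assumption: one frame at a time, so latency (ms) = 1000 / frames per second.
baseline_latency_ms = 1000 / baseline_fps
optimized_latency_ms = 1000 / optimized_fps

print(f"speedup: {speedup:.1f}x")
print(f"latency: {baseline_latency_ms:.2f} ms -> {optimized_latency_ms:.2f} ms")
```

Under that assumption, throughput and latency improve by the same factor, which matches the 2.1 times reported for both.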

Clarifai

Clarifai offers image and video solutions that are accessible via an API, a device SDK, and on-premises deployment.

Some of the features of the platform include:

  • Visual search 
  • Image cropper 
  • Visual classification models 
  • Search by image 
  • Video interpolation
  • Demographic analysis 
  • Aerial surveillance
  • Data labeling 

Supervisely

Supervise.ly provides image annotation tools for computer vision models.

Other features offered on the platform include:

  • Video labeling 
  • Labeling of 3D scenes 
  • Training and testing of computer vision models

Labelbox

Labelbox is a platform that can be used for labeling data for computer vision projects.

The supported labeling tasks include:

  • Creating bounding boxes, polygons, lines, and keypoints 
  • Image segmentation tasks 
  • Image classification
  • Video labeling

Segments

Segments is a computer vision platform for labeling and model training. 

The platform supports:

  • Instance segmentation
  • Semantic segmentation
  • Polygons
  • Bounding boxes
  • Image classification

Nanonets

Nanonets provides solutions for applied computer vision in various industries. 

Some of the solutions provided by the platform include: 

  • ID card verification 
  • Application of machine learning and optical character recognition for analysis of various documents

Sentisight

The Sentisight computer vision platform offers the following features: 

  • Smart image labeling tool 
  • Classification labels, bounding boxes, and polygons
  • Training classification and object detection models 
  • Image similarity search 
  • Pre-trained models 

Amazon Rekognition

Amazon Rekognition can be used for image and video analysis without any deep learning experience.

The platform offers:

  • Ability to recognize thousands of labels 
  • Adding of custom labels 
  • Content moderation 
  • Text identification 
  • Face detection and analysis 
  • Face search and verification 
  • Celebrity recognition  

Google Cloud Vision APIs

Google Cloud provides vision APIs with the following features: 

  • Image classification with pre-trained models 
  • Data labeling 
  • Deployment of models on edge devices 
  • Detecting and counting objects
  • Detecting faces
  • Content moderation 
  • Celebrity recognition 

Fritz 

The Fritz platform can be used for building computer vision models without any machine learning experience. 

The platform offers the following features: 

  • Image generation from a few image samples 
  • Image labeling 
  • Image classification and object detection models 
  • Image segmentation models 
  • Deployment of trained models on edge devices 
  • Lens studio lenses based on machine learning 

Microsoft Computer Vision API

Microsoft also provides a computer vision platform for analyzing images and video content.

Some of the features offered by the service include:

  • Text extraction 
  • Image understanding 
  • Spatial analysis 
  • Deployment on the cloud and the edge 

IBM Watson Visual Recognition

The IBM Watson Visual Recognition service offers the following features:

  • Export of models to CoreML
  • Pre-trained classification models
  • Custom image classification models 
  • Custom model training 

ShaipCloud

ShaipCloud is a computer vision platform for labeling sensitive AI training data, whether images or videos.

The platform supports:

  • Image annotation / labeling: semantic segmentation, keypoint annotation, bounding boxes, 3D cuboids, polygon annotation, landmark annotation, line segmentation
    -> Use cases: object detection, facial recognition tracking, image classification
  • Video annotation / labeling: frame-by-frame labeling, event-based timestamp labeling, keypoint annotation
    -> Use cases: object / motion tracking, facial recognition, autonomous driving, security surveillance, video/clip classification
  • Industries served: the platform is industry agnostic, and its high-throughput services have been used across verticals such as automotive, healthcare, financial services, technology, retail, and government

Final thoughts

In this article, we have covered various tools that you can use in a computer vision project. We explored several open-source tools, as well as various ready-to-use computer vision platforms. 

Your choice of tool or platform will depend on your skill set and your budget. For example, the ready-made platforms can be used without prior knowledge of deep learning, but beyond their free versions they’re not free. The open-source tools are free but require technical knowledge and experience to use.

Whichever platform or tool you choose, the most fundamental thing is to ensure that it solves your problems.
