Computer vision has gained tremendous traction thanks to state-of-the-art tools and technologies that seem to be released every day. This technology is being applied in various fields, including autonomous vehicles, robotics, and medical image analysis.
Computer vision is a scientific field that involves using technology to understand images and videos. The goal is to automate vision tasks that would otherwise be done by human beings. Some of the common tasks that are being automated include:
- Image classification: assigning a label to the contents of an image
- Object detection: identifying objects in an image and drawing bounding boxes around them
- Semantic segmentation: identifying the class of every pixel in an image, which yields a mask over each region
- Human pose estimation: identifying the pose of a person in an image or a video
These are just a few examples.
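To make the bounding boxes used in object detection concrete, here is a minimal sketch of intersection over union (IoU), a widely used measure of how well a predicted box matches a labeled one. The `Box` layout and the sample coordinates are assumptions made for illustration; individual tools may store boxes differently.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    # Clamp at zero so that an empty overlap does not produce a negative area.
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    overlap = Box(
        max(a.x_min, b.x_min),
        max(a.y_min, b.y_min),
        min(a.x_max, b.x_max),
        min(a.y_max, b.y_max),
    )
    intersection = area(overlap)
    union = area(a) + area(b) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical labeled box and predicted box that overlap by half.
ground_truth = Box(0, 0, 10, 10)
prediction = Box(5, 0, 15, 10)
overlap_score = iou(ground_truth, prediction)
```

An IoU of 1 means the two boxes coincide, and 0 means they do not overlap at all.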
Automation of these tasks has been enabled by recent advances in machine learning, and in deep learning in particular. Convolutional neural networks (CNNs) have played a big role in furthering computer vision. These networks detect and extract important features in an image, and those features are then used to identify objects and classify images.
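As a rough illustration of what "extracting features" means, the sketch below slides a small kernel over a tiny image using plain Python loops. The kernel here is written by hand to respond to vertical edges, and the image values are made up; in a real convolutional network the kernel values are learned during training.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(
                image[row + i][col + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for col in range(out_w)
        ]
        for row in range(out_h)
    ]


# A 5x5 grayscale image: dark on the left (0), bright on the right (9).
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

# Hand-written kernel that responds to dark-to-bright transitions from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
```

The resulting 3x3 feature map is zero over the flat dark region and large wherever the window covers the edge, which is the kind of signal later layers build on.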
In this article, we are going to take a look at some of the top tools and technologies that have enabled these computer vision applications.
Frameworks and libraries
There are numerous computer vision frameworks and libraries. These tools do different things in the computer vision realm. Each has its advantages and disadvantages. Let’s take a look!
neptune.ai is a platform that can be used for logging and managing computer vision model metadata. You can use it to log:
- Model versions,
- Data versions,
- Model hyperparameters,
- and a lot more.
Neptune is hosted on the cloud, so you don’t need any setup, and you can access your computer vision experiments anytime, anywhere. You can organize the computer vision experiments in one place, and collaborate with your team on them. You can invite your teammates to view and work on any computer vision experiment.
OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning library. It supports Windows, Linux, Android, and macOS. Typical applications include:
- Classifying human actions in videos
- Object identification
- Tracking moving objects
- Counting people
- Detecting and recognizing faces
OpenCV can also be used for processing images for computer vision. Some of the supported tasks include:
- Changing the color spaces
- Smoothing images
- Blending images using image pyramids
- Image segmentation using the watershed algorithm
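For instance, here is how you can perform a Fourier transform on an image.
```python
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

# Load the image as grayscale (flag 0).
img = cv.imread('f.jpg', 0)

# Compute the DFT and move the zero-frequency component to the center.
dft = cv.dft(np.float32(img), flags=cv.DFT_COMPLEX_OUTPUT)
dft_shift = np.fft.fftshift(dft)

# Log-scaled magnitude computed from the two channels of the complex output.
magnitude_spectrum = 20 * np.log(cv.magnitude(dft_shift[:, :, 0], dft_shift[:, :, 1]))

# Show the input and its spectrum side by side, with axis ticks hidden.
plt.subplot(121), plt.imshow(img, cmap='gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122), plt.imshow(magnitude_spectrum, cmap='gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])
plt.show()
```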
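To see what a magnitude spectrum represents, it can help to run a one-dimensional version by hand. The sketch below is a naive transform written only with the standard library, not the OpenCV routine; the sample signal and the `1 +` inside the logarithm are choices made here for illustration.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    return [
        sum(
            value * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, value in enumerate(signal)
        )
        for k in range(n)
    ]


# Eight samples of a cosine that completes exactly two cycles.
samples = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]

magnitudes = [abs(c) for c in dft(samples)]

# Log scaling compresses the dynamic range for display. Adding 1 before
# taking the log is a choice made here to avoid log(0); the snippet above
# applies the log directly.
log_spectrum = [20 * math.log(1 + m) for m in magnitudes]

# Frequency bins that carry a non-negligible share of the signal.
peak_bins = [k for k, m in enumerate(magnitudes) if m > 1e-6]
```

Only bins 2 and 6 are non-zero: bin 2 matches the two cycles in the signal, and bin 6 is its mirror image. In the two-dimensional case, the `fftshift` call moves the low frequencies to the center of the plot.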
TensorFlow is an open-source machine learning framework developed by Google. Some of the networks that can be built using TensorFlow include:
- Convolutional neural networks
- Fully Connected Networks
- Recurrent neural networks
- Long Short Term Memory networks
- Generative Adversarial Networks
TensorFlow is quite popular among developers because it’s easy to use, and gives you multiple tools for performing various actions. Some of the tools in the ecosystem include:
- TensorBoard for visualizing and debugging models
- TensorFlow Playground for tinkering with neural networks in a browser
- TensorFlow Hub, which hosts numerous trained computer vision models
- TensorFlow Graphics, a library for working with graphics
As mentioned earlier, convolutional neural networks (CNNs) are the most widely used networks for computer vision tasks. Although you can use ordinary fully connected neural networks, CNNs have generally proven to perform better on images.
Let’s look at a code snippet for building a simple convolutional neural network. In this snippet you can see that:
- A network of layers is initialized using the `Sequential` class from Keras
- `Conv2D` defines the convolutional layers with a 3 by 3 feature detector. Sliding the feature detector over the image produces a slightly smaller feature map that retains the important features
- `MaxPooling2D` does pooling. The features obtained above are passed through a pooling layer. Common types of pooling include max pooling and average pooling. Pooling makes the network less sensitive to the exact location of an object in the image
- The convolutional and pooling layers are repeated. However, you are free to define your own network architecture. You can also use common network architectures
- The `Flatten` layer transforms the output of the pooling layer to a single column that can be passed to a fully connected layer
- `Dense` is that fully connected layer, here with 100 units and a ReLU activation
- The last layer is responsible for producing the final output of the network. Its activation function depends on the type of problem; here, `softmax` turns the 10 outputs into class probabilities
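```python
import tensorflow as tf

model = tf.keras.Sequential(
    [
        # First convolution and pooling stage on 32x32 RGB inputs.
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        # Second convolution and pooling stage.
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2), strides=2),
        # Flatten the feature maps and classify into 10 classes.
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)
```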
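The arithmetic behind this model can be traced by hand. The sketch below assumes the Keras defaults of no padding and a stride of 1 for `Conv2D`; the helper names are made up for this example and are not part of Keras. It follows the spatial size through each layer and counts the trainable parameters.

```python
def conv_output(size, kernel, stride=1):
    """Spatial size after an unpadded ("valid") convolution or pooling window."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block plus one bias per filter.
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    # One weight per input-unit pair plus one bias per unit.
    return inputs * units + units


size = 32                               # input is 32 x 32 x 3
size = conv_output(size, 3)             # Conv2D(32, 3x3)      -> 30
size = conv_output(size, 2, stride=2)   # MaxPooling2D(2x2, 2) -> 15
size = conv_output(size, 3)             # Conv2D(64, 3x3)      -> 13
size = conv_output(size, 2, stride=2)   # MaxPooling2D(2x2, 2) -> 6
flattened = size * size * 64            # Flatten              -> 2304

params = {
    "conv1": conv_params(3, 3, 32),
    "conv2": conv_params(3, 32, 64),
    "dense1": dense_params(flattened, 100),
    "dense2": dense_params(100, 10),
}
total_params = sum(params.values())
```

Most of the roughly 250,000 parameters sit in the first fully connected layer, while the two convolutional layers together hold fewer than 20,000. Calling `model.summary()` on the model above should report the same breakdown.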
When building a network like the one above, you can use Neptune to track the experiment. This is important so that your experiment is reproducible. It would be very unfortunate to spend days training a network whose performance you cannot reproduce later.
The TensorFlow and Neptune integration lets you:
- Log model hyperparameters on every run
- Visualize the model’s learning curves
- See the hardware consumption for every run
- Log model weights
- Log model artifacts, such as images and the model itself
PyTorch is an open-source deep learning library based on the Torch library. It supports distributed training as well as serving your computer vision models.
Other PyTorch features include:
- Deployment on mobile devices
- Support for ONNX
- C++ front-end
- Supported on major cloud platforms
- A rich ecosystem of tools and libraries
PyTorch and its ecosystem are also supported by the Neptune machine learning experimentation platform. The PyTorch + Neptune integration lets you:
- Log torch tensors
- Visualize model loss and metrics
- Log torch tensors as images
- Log training code
- Log model weights
Defining a convolutional neural network is quite similar to the TensorFlow definition with a few cosmetic changes. As you can see, you still have to define convolutional and pooling layers as well as indicate activation functions.
```python
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)   # 3 input channels, 6 filters, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)    # 2x2 window, stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)
        # 16 channels of 5x5 feature maps, which assumes 32x32 input images.
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)  # flatten everything except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # raw class scores; no softmax is applied here
        return x


net = Net()
```
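One difference from the Keras model is that the `forward` method here returns the raw output of `fc3` rather than softmax probabilities. In PyTorch this is a common pattern, because `nn.CrossEntropyLoss` expects raw scores. If you need probabilities at prediction time, a softmax can be applied afterwards. The sketch below shows the computation in plain Python; the scores are made up, and in practice you would more likely call `torch.softmax` on the output tensor.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing and does not change the result.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes.
logits = [2.0, 1.0, 0.1]
probabilities = softmax(logits)

# The predicted class is the index with the highest probability.
predicted_class = probabilities.index(max(probabilities))
```

Softmax preserves the ordering of the scores, so the predicted class is the same whether you take the largest raw score or the largest probability.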
If you don’t like writing networks in raw PyTorch code, you can use one of its high-level API libraries. One such package is the `fastai` library.
Fastai is a deep learning library that is built on top of PyTorch.
Some of the features of the library include:
- A computer vision module that is GPU optimized
- A two-way callback system that has access to any part of the data
- Easy to get started with when coming from other libraries
- Supports cyclical learning rates
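As an illustration of the idea behind cyclical learning rates, the sketch below computes a simple triangular schedule in which the rate rises linearly to a maximum and then falls back. The function name and the settings are assumptions made for this example; fastai's own schedulers differ in their details.

```python
import math


def triangular_lr(iteration, step_size, base_lr, max_lr):
    """Learning rate at `iteration` for a triangular cyclical schedule."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Position within the current cycle: 1 at the ends, 0 at the peak.
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)


# Hypothetical settings: the rate climbs for 4 iterations, then falls for 4.
schedule = [
    triangular_lr(i, step_size=4, base_lr=0.001, max_lr=0.01)
    for i in range(9)
]
```

The schedule starts at `base_lr`, peaks at `max_lr` after `step_size` iterations, and returns to `base_lr` at the end of the cycle.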
Fastai metrics can also be logged to Neptune. The integration also lets you monitor and visualize your model’s training process.
Let’s look at an example of using Fastai for image classification. The first step is usually to create a `Learner`. A `Learner` combines a model, a data loader, and a loss function. While defining a `Learner` you can use a pre-trained model and fine-tune it on your dataset. This is referred to as transfer learning and will usually result in better model performance than training a model from scratch. Pre-trained models are usually hard to beat because they were trained on millions of images, which are hard for an individual to obtain. Furthermore, training a model on millions of images requires a lot of computational resources.
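```python
from fastai.vision.all import *

# `path`, `files`, and `label_func` (file name -> label) must be defined beforehand.
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224))
# Fine-tune a pre-trained ResNet-34; recent fastai releases name this `vision_learner`.
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
```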
You can then use this `Learner` to run predictions.
Fastai also provides tools for downloading sample data and augmenting images.
Caffe is an open-source deep learning framework developed by Berkeley AI Research (BAIR).
Some of the features of Caffe include:
- Ability to easily switch between different backends
- Fast library
- Can be used for image classification and image segmentation
- Supports CNN, RCNN, and LSTM networks
Like other libraries, the creation of networks in Caffe requires the definition of the network layers, the loss function, and the activation functions.
```python
import caffe
from caffe import layers as L, params as P


def lenet(lmdb, batch_size):
    # our version of LeNet: a series of linear and simple nonlinear transformations
    n = caffe.NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB, source=lmdb,
                             transform_param=dict(scale=1./255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.conv2 = L.Convolution(n.pool1, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    n.fc1 = L.InnerProduct(n.pool2, num_output=500, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.fc1, in_place=True)
    n.score = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.score, n.label)
    # Serialize the network definition to a protobuf message.
    return n.to_proto()
```
OpenVINO (Open Visual Inference and Neural Network Optimization) can be used to speed up inference with computer vision models.
Other features provided by OpenVINO include:
- Supports CNN-based deep learning inference on the edge
- Ships with optimized computer vision functions for OpenCV and OpenCL
Ready-made solutions for particular tasks
Apart from open-source computer vision tools, there are also ready-made solutions that can be used at various stages of building computer vision applications. These solutions are usually hosted and can be used immediately. They also provide free versions that you can start using right away. Let’s take a look at some of them.
Deci is a platform that you can use to optimize your computer vision models and increase their performance.
The platform can:
- Reduce the model’s latency
- Increase the model’s throughput without affecting its accuracy
- Optimize models from popular frameworks
- Deploy on popular CPU and GPU machines
- Benchmark your model on different hardware hosts and cloud providers
The process starts with uploading a model and then selecting the optimization parameters you need.
The following results were obtained after running an optimization of YOLOv5 on Deci:
- The model’s throughput improved from 194.1 FPS to 413.2 FPS, which makes it 2.1 times faster
- The size of the model was reduced by 1.4 times
- The latency of the model improved by 2.1 times
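The speedup figure follows directly from the two throughput numbers. The short sketch below redoes that arithmetic and also converts throughput into an average time per frame, under the assumption that frames are processed one at a time.

```python
baseline_fps = 194.1
optimized_fps = 413.2

# Ratio of the optimized throughput to the baseline throughput.
speedup = optimized_fps / baseline_fps

# Average time per frame, assuming frames are processed one at a time.
# With batching, latency and throughput are not simple reciprocals.
baseline_ms_per_frame = 1000 / baseline_fps
optimized_ms_per_frame = 1000 / optimized_fps
```

The ratio comes out at about 2.13, which matches the reported 2.1 times improvement after rounding.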
Clarifai offers image and video solutions that are accessible via an API or a device SDK, and that can also be deployed on-premises.
Some of the features of the platform include:
- Visual search
- Image cropper
- Visual classification models
- Search by image
- Video interpolation
- Demographic analysis
- Aerial surveillance
- Data labeling
Supervise.ly provides image annotation tools for computer vision models.
Other features offered on the platform include:
- Video labeling
- Labeling of 3D scenes
- Training and testing of computer vision models
Labelbox is a platform that can be used for labeling data for computer vision projects.
The supported computer vision tasks are:
- Creating bounding boxes, polygons, lines, and keypoints
- Image segmentation tasks
- Image classification
- Video labeling
Segments is a computer vision platform for labeling and model training.
The platform supports:
- Instance segmentation
- Semantic segmentation
- Bounding boxes
- Image classification
Nanonets provides solutions for applied computer vision in various industries.
Some of the solutions provided by the platform include:
- ID card verification
- Application of machine learning and optical character recognition for analysis of various documents
The Sentisight computer vision platform offers the following features:
- Smart image labeling tool
- Classification labels, bounding boxes, and polygons
- Training classification and object detection models
- Image similarity search
- Pre-trained models
Amazon Rekognition can be used for image and video analysis without any deep learning experience.
The platform offers:
- Ability to recognize thousands of labels
- Adding of custom labels
- Content moderation
- Text identification
- Face detection and analysis
- Face search and verification
- Celebrity recognition
Google Cloud Vision APIs
Google Cloud provides vision APIs with the following features:
- Image classification with pre-trained models
- Data labeling
- Deployment of models on edge devices
- Detecting and counting objects
- Detecting faces
- Content moderation
- Celebrity recognition
The Fritz platform can be used for building computer vision models without any machine learning experience.
The platform offers the following features:
- Image generation from a few image samples
- Image labeling
- Image classification and object detection models
- Image segmentation models
- Deployment of trained models on edge devices
- Lens studio lenses based on machine learning
Microsoft Computer Vision API
Microsoft also provides a computer vision platform for analyzing images and video content.
Some of the features offered by the service include:
- Text extraction
- Image understanding
- Spatial analysis
- Deployment on the cloud and the edge
IBM Watson Visual Recognition
The IBM Watson Visual Recognition service offers the following features:
- Export of models to CoreML
- Pre-trained classification models
- Custom image classification models
- Custom model training
ShaipCloud is a computer vision platform for labeling sensitive AI training data, whether images or video.
The platform supports:
- Image annotation and labeling: semantic segmentation, keypoint annotation, bounding boxes, 3D cuboids, polygon annotation, landmark annotation, and line segmentation
  - Use cases: object detection, facial recognition tracking, and image classification
- Video annotation and labeling: frame-by-frame labeling, event-based timestamp labeling, and keypoint annotation
  - Use cases: object and motion tracking, facial recognition, autonomous driving, security surveillance, and video or clip classification
- Industries served: the platform is industry agnostic, and its services have been used across verticals such as automotive, healthcare, financial services, technology, retail, and government
In this article, we have covered various tools that you can use in a computer vision project. We explored several open-source tools, as well as various ready-to-use computer vision platforms.
Your choice of tool or platform will depend on your skillset and your budget. For example, the ready-made platforms can be used without prior knowledge of deep learning, but they are typically paid services beyond their free versions. The open-source tools are free but require technical knowledge and experience to use.
Whichever platform or tool you choose, the most fundamental thing is to ensure that it solves your problems.