Best Image Processing Tools Used in Machine Learning

5 min
Vitaliy Lyalin
21st April, 2023

Image processing is a widely used technology, and industry demand for it seems to grow every year. Image processing that uses machine learning appeared in the 1960s as an attempt to simulate the human visual system and automate image analysis. As the technology matured, solutions for specific tasks began to appear.

The rapid acceleration of computer vision in the 2010s, driven by deep learning, open-source projects, and large image databases, only increased the need for image processing tools.

Today, there are many libraries and projects that can help you solve image processing problems with machine learning, or simply improve the processing pipelines in your computer vision projects.

In this article, we give you a list of tools that will improve your computer vision projects divided into:

  • frameworks and libraries
  • datasets 
  • ready-made solutions for particular tasks

Let’s dive in!

Frameworks and libraries

In theory, you could build your image processing application from scratch, just you and your computer. But in reality, it’s way better to stand on the shoulders of giants and use what other people have built and extend or adjust it where needed.

This is where libraries and frameworks come in. It is even more true in image processing, where creating efficient implementations is often a difficult task.

So, let me give you my list of libraries and frameworks that you can use in your image processing projects:

OpenCV

Open-source library of computer vision and image processing algorithms.

Designed and well optimized for real-time computer vision applications.

Built to provide a common, open infrastructure for computer vision applications.

Functionality:

  • Basic data structures
  • Image processing algorithms
  • Basic algorithms for computer vision
  • Input and output of images and videos
  • Human face detection
  • Stereo correspondence search (Full HD)
  • Optical flow
  • Continuous integration system
  • CUDA-optimized architecture
  • Android version
  • Java API
  • Built-in performance testing system
  • Cross-platform
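
As a rough illustration of what "image processing algorithms" covers, the sketch below computes a Sobel gradient magnitude on a tiny grayscale image in pure Python. It is not OpenCV code and does not use OpenCV's API; the function name `sobel_magnitude` and the nested-list image format are assumptions made for this example. Libraries such as OpenCV provide heavily optimized versions of operations like this one.

```python
import math

# Pure-Python Sobel edge detector on a grayscale image stored as a list of rows.
# Illustrative only: real projects would call an optimized library routine.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude for every interior pixel (borders stay 0)."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for ky in range(3):
                for kx in range(3):
                    pixel = image[y + ky - 1][x + kx - 1]
                    gx += SOBEL_X[ky][kx] * pixel
                    gy += SOBEL_Y[ky][kx] * pixel
            result[y][x] = math.hypot(gx, gy)
    return result


# A 5x6 test image: dark left half (0), bright right half (255).
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_magnitude(image)
```

The response is strong only in the two columns next to the dark-to-bright transition and zero in the flat regions, which is the behavior an edge detector is expected to show.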

TensorFlow

Open-source software library for machine learning.

Created to build and train neural networks that automatically detect and classify images with quality approaching human perception.

Functionality:

  • Runs on multiple processors in parallel
  • Computation on multidimensional data arrays (tensors)
  • Optimized for tensor processing units (TPUs)
  • Immediate model iteration
  • Simple debugging
  • Built-in logging system
  • Interactive log visualizer (TensorBoard)

PyTorch

Open-source machine learning platform.

Designed to speed up the path from research prototyping to production deployment.

Functionality:

  • Easy transition to production
  • Distributed training and performance optimization
  • Rich ecosystem of tools and libraries
  • Good support for major cloud platforms
  • Optimization and automatic differentiation modules
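
To give a feel for what "automatic differentiation" means, here is a toy sketch using dual numbers in pure Python. It is deliberately not PyTorch's API: PyTorch's autograd works in reverse mode on tensors, while this example uses the simpler forward mode on scalars. The `Dual` class and the function `f` are hypothetical names introduced only for illustration.

```python
# Toy forward-mode automatic differentiation with dual numbers.
# NOT PyTorch's API; it only illustrates the idea of computing exact
# derivatives alongside values instead of deriving them by hand.

class Dual:
    """A value paired with its derivative with respect to one input."""

    def __init__(self, value, grad=0.0):
        self.value = value
        self.grad = grad

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.grad + other.grad)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.grad * other.value + self.value * other.grad)

    __rmul__ = __mul__


def f(x):
    return x * x + 3 * x + 1


x = Dual(2.0, grad=1.0)  # seed: dx/dx = 1
y = f(x)                 # y.value is f(2), y.grad is f'(2)
```

Here `y.value` is 11.0 and `y.grad` is 7.0, matching f(2) = 11 and f'(x) = 2x + 3 evaluated at x = 2.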

Caffe

A deep learning framework focused on image classification and segmentation.

Functionality:

  • Computation using blobs – multidimensional data arrays used in parallel computing
  • Models and optimization defined by configuration, without hard-coding
  • Easy switching between CPU and GPU
  • High speed of work
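
For batches of images, a blob is conventionally a 4D array with dimensions N x C x H x W (batch size, channels, height, width) stored in row-major order. The sketch below shows how a 4D index maps to a flat memory offset under that layout. The function name `blob_offset` is an assumption made for this example and is not part of Caffe.

```python
# Flat (row-major) offset of element (n, c, h, w) in a 4D blob of shape
# (N, C, H, W): batch size, channels, height, width.

def blob_offset(shape, n, c, h, w):
    num, channels, height, width = shape
    if not (0 <= n < num and 0 <= c < channels
            and 0 <= h < height and 0 <= w < width):
        raise IndexError("index out of range for blob shape")
    return ((n * channels + c) * height + h) * width + w


shape = (2, 3, 4, 5)                    # 2 images, 3 channels, 4x5 pixels
total = 2 * 3 * 4 * 5                   # 120 values in the blob
first = blob_offset(shape, 0, 0, 0, 0)  # offset of the very first value
last = blob_offset(shape, 1, 2, 3, 4)   # offset of the very last value
```

Because the width index varies fastest, neighboring pixels in a row sit next to each other in memory, which suits the kind of parallel computation the list above mentions.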

EmguCV

Cross-platform .NET wrapper for the OpenCV image processing library.

Functionality:

  • Works with .NET-compatible languages: C#, VB, VC++, IronPython, etc.
  • Compatible with Visual Studio, Xamarin Studio, and Unity
  • Runs on Windows, Linux, macOS, iOS, and Android

VXL

A collection of open-source C++ libraries for computer vision.

Functionality:

  • Load, save, and modify images in many common file formats, including very large images
  • Geometry for points, curves and other elementary objects in 1, 2 or 3 dimensions
  • Camera geometry
  • Structure from motion
  • Graphical user interface design
  • Topology
  • 3D images

GDAL

Library for reading and writing raster and vector geospatial data formats.

Functionality:

  • Getting information about raster data
  • Conversion between formats
  • Data reprojection
  • Creation of mosaics from rasters
  • Creation of shapefiles with raster tile index
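
Reprojection means converting coordinates from one coordinate reference system to another. As a minimal sketch of one common case, the code below converts WGS 84 longitude/latitude to Web Mercator using the standard spherical formula. It does not call GDAL; the function name `lonlat_to_web_mercator` is an assumption made for this example, and a real project would let GDAL handle the many other systems and edge cases.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project WGS 84 longitude/latitude (degrees) to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


x0, y0 = lonlat_to_web_mercator(0.0, 0.0)    # equator at the prime meridian
x1, y1 = lonlat_to_web_mercator(180.0, 0.0)  # eastern edge of the map
```

The origin maps to (0, 0) up to floating-point rounding, and longitude 180 maps to roughly 20,037,508 m, half of the Earth's equatorial circumference.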

MIScnn

Framework for 2D/3D Medical Image Segmentation.

Functionality:

  • Creation of segmentation pipelines
  • Preprocessing
  • Data input and output
  • Data augmentation
  • Patch-wise analysis
  • Automatic evaluation
  • Cross validation
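
Cross-validation, the last item above, splits the available samples into k folds and uses each fold once for validation while training on the rest. The following is a minimal pure-Python sketch of that splitting step. It is not MIScnn's API; `k_fold_splits` and the sample names are hypothetical.

```python
def k_fold_splits(sample_ids, k):
    """Yield (train_ids, validation_ids) pairs for k-fold cross-validation."""
    if not 2 <= k <= len(sample_ids):
        raise ValueError("k must be between 2 and the number of samples")
    # Deal samples round-robin into k folds so fold sizes differ by at most one.
    folds = [sample_ids[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, validation


cases = [f"case_{i:02d}" for i in range(10)]  # hypothetical sample names
splits = list(k_fold_splits(cases, k=3))
```

Every sample appears in exactly one validation fold, so each model is evaluated on data it never saw during training. In practice you would usually shuffle the samples before splitting.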

tracking.js

JavaScript library for computer vision.

Functionality:

  • Color tracking
  • Face detection
  • Uses modern HTML5 specifications
  • Lightweight core (~7 KB)

WebGazer

Library for eye tracking.

Uses a webcam to determine the location of visitors’ gaze on the page in real-time (where the person is looking).

Functionality:

  • Self-calibrating model that watches visitors interact with a web page and learns a mapping between eye features and positions on the screen
  • Real-time gaze prediction in most modern browsers
  • Easy integration with just a few lines of JavaScript
  • Multiple gaze prediction models
  • Runs in the browser on the client side, without sending data to a server

Marvin

A framework for working with video and images.

Functionality:

  • Capture video frames
  • Frame processing for video filtering
  • Multi-threaded image processing
  • Support for plugin integration via GUI
  • Feature extraction from image components
  • Generation of fractals
  • Object tracking
  • Motion Detection
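
One simple way to detect motion is frame differencing: compare two consecutive frames and count the pixels that changed noticeably. The sketch below shows the idea in pure Python on tiny grayscale frames. It is not Marvin's implementation or API; the function name `motion_ratio` and both threshold values are assumptions chosen for illustration.

```python
def motion_ratio(prev_frame, next_frame, threshold=25):
    """Fraction of pixels whose grayscale value changed by more than threshold."""
    changed = total = 0
    for prev_row, next_row in zip(prev_frame, next_frame):
        for before, after in zip(prev_row, next_row):
            total += 1
            if abs(after - before) > threshold:
                changed += 1
    return changed / total if total else 0.0


frame_a = [[10] * 4 for _ in range(4)]  # static 4x4 background
frame_b = [row[:] for row in frame_a]
frame_b[1][1] = frame_b[1][2] = 200     # a bright object appears

ratio = motion_ratio(frame_a, frame_b)
motion_detected = ratio > 0.05          # hypothetical sensitivity setting
```

Two of the sixteen pixels change, so the ratio is 0.125 and motion is reported. Real detectors typically add blurring and background modeling to cope with noise and lighting changes.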

Kornia

Library for computer vision in PyTorch.

Functionality:

  • Image transformations
  • Epipolar geometry
  • Depth estimation
  • Low-level image processing, such as filtering and edge detection, directly on tensors
  • Color correction
  • Feature detection
  • Image filtering
  • Edge detection

Datasets

You cannot build machine learning models without data. This is especially true in image processing applications, where adding more labeled data to your training dataset usually brings bigger improvements than state-of-the-art network architectures or training methods.

With that in mind, let me give you a list of image datasets that you can use in your projects:

Diversity in Faces

A dataset designed to reduce the bias of algorithms.

A million labeled face images of people of different nationalities, ages, and genders, annotated with further attributes such as head size, face contrast, nose length, forehead height, and facial proportions, as well as the relationships between them.

FaceForensics

Dataset for recognizing fake photos and videos.

A set of images (over half a million) created using the Face2Face, FaceSwap and DeepFakes methods.

1,000 videos with manipulated faces for each of the manipulation methods.

YouTube-8M Segments

Dataset of YouTube videos with labels localized in time, at the level of video segments.

Approximately 237 thousand labeled segments across 1,000 categories.

SketchTransfer

Dataset for training neural networks to generalize.

The data consists of labeled real-world images and unlabeled sketches.

DroneVehicle

Dataset for counting objects in drone images.

15,532 RGB drone images, each paired with an infrared image.

Object annotations are available for both RGB and infrared images.

The dataset contains oriented bounding boxes and object classes.

In total, 441,642 objects were marked in the dataset for 31,064 images.
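
The figures above can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers quoted in this section; the variable names are illustrative.

```python
rgb_images = 15_532
images_total = rgb_images * 2  # one infrared image per RGB image
annotated_objects = 441_642

# Average number of annotated objects per image (RGB and infrared combined).
objects_per_image = annotated_objects / images_total
```

The doubled RGB count matches the 31,064 images quoted above, and the average works out to a little over 14 annotated objects per image.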

Waymo Open Dataset

Dataset for training self-driving vehicles.

Includes videos of driving with marked objects.

3,000 driving videos totaling 16.7 hours, 600,000 frames, about 25 million 3D bounding boxes, and 22 million 2D bounding boxes.

To avoid overly uniform footage, the recordings were made under varied conditions. The videos differ in weather, lighting, pedestrians, cyclists, and construction sites.

Diversity in the data increases the generalization ability of the models that are trained on it.
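
A quick back-of-the-envelope calculation shows what the quoted Waymo figures imply per video. It uses only the numbers given in this section and assumes, for simplicity, that all videos have the same length.

```python
videos = 3_000
total_hours = 16.7
frames = 600_000

total_seconds = total_hours * 3600
seconds_per_video = total_seconds / videos  # average clip length
frames_per_video = frames / videos          # average frames per clip
frames_per_second = frames / total_seconds  # implied annotation rate
```

Under that assumption, each video is about 20 seconds long and contains 200 frames, which corresponds to roughly 10 frames per second.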

ImageNet-A

A dataset of images that neural networks fail to classify correctly.

In tests, models classified objects from the dataset with an accuracy of 3%.

Contains 7.5 thousand natural adversarial examples: real, unmodified images that act like optical illusions for classifiers.

Designed to study the robustness of neural networks to ambiguous images of objects, which should help improve the generalization ability of models.

Ready-made solutions

Ready-made solutions are open-source repositories and software tools that are built to solve particular, often specialized tasks. 

By using these solutions, you can “outsource” your model building or image processing pipeline to a tool that does it with one(ish) click or one command.

With that in mind, let me give you my list.

MobileNet

A set of computer vision algorithms optimized for mobile devices.

Functionality:

  • Face attribute analysis
  • Landmark recognition (determining a location from its surroundings)
  • Recognition directly on the smartphone
  • Low latency and low power consumption

Fritz

A machine learning platform for iOS and Android developers.

Functionality:

  • Runs directly on mobile devices, no data transfer
  • Porting models to other frameworks and updating models in applications without having to release a new version

Computer Vision Annotation Tool

An interactive tool for annotating photos and videos.

Functionality:

  • Annotation shapes: rectangles, polygons, polylines, points
  • No installation needed
  • Collaborative work
  • Automation of the annotation process
  • Support for various annotation scenarios

3D-BoNet

Segmentation of objects in 3D images.

About 10 times more computationally efficient than other existing approaches to instance segmentation.

End-to-end neural network that takes a 3D image as input and outputs the boundaries of the recognized objects.

Reasoning-RCNN

Object recognition from thousands of categories.

Detection of hard-to-see objects in the image.

An architecture that can be stacked on top of any existing detector.

STEAL

Detection of object boundaries on noisy data.

Increases the precision of annotated object boundaries.

An additional layer and loss function that can be added to any semantic boundary detector.

VQ-VAE-2

Generation of realistic, diverse images.

Addresses some of the drawbacks of using GANs for image generation.

Two-level hierarchy of encoders and decoders.

EDVR

Restoration of video frames.

Restores sharpness when the camera zooms in and recovers the content of blurry frames in a video recording.

The model takes blurred frames as input and outputs restored frames without blur.

CorrFlow

Automatic annotation of videos.

Propagation of annotations from one frame to the entire video.

Based on a self-supervised model.

FUNIT

Replacing objects with others.

Converting object images from one class to another with a minimum amount of training data.

Based on GAN architecture.

Information Maximizing Visual Question Generation

Generation of questions for images.

Given an image and the desired type of answer, the model generates a question.

Based on maximizing mutual information.

Algorithm for visual recognition of an object in parts

Identification of real-world objects from partial images of them.

Based on dividing images into parts and learning how these parts fit together.

Corners for Layout

Layout from a photograph.

Restoring a room layout from a 360° photograph.

End-to-end model.

Speech2Face

Generation of an image of a person’s face from an audio recording of a voice.

Reconstruction of the main physical characteristics of the speaker.

Taking a spectrogram as input, it generates a frontal image of the person’s face with a neutral expression.

The proximity of the object to the camera

Determining how close the subject is to the camera.

Based on a comparison of complete proximity maps.

Mesh R-CNN

Modeling the 3D shape of objects from an image.

3D shape prediction for objects in the input image.

End-to-end model.

DeepView

Reconstruction of a 3D view from a few photos.

From a few input photos, recovers the view from other angles so that the scene can be viewed in 3D.

Based on a sequence of convolutional neural networks.

Increase image resolution up to 8 times

More accurate face images in better quality without distortion.

Based on GAN.

DSNet

Predicting the number of people in the image.

Counting that preserves information from different parts of the image.

End-to-end model.

PIFu

Modeling a 3D human figure.

Reconstructing a 3D model of a clothed person from a single photo.

End-to-end model.

Conclusion

The success, efficiency of execution, and quality of your projects may depend on many factors, but choosing the right tools is one of the most important – it allows you to significantly save time and resources and get the best results.

Knowing the machine learning tools for image processing, you can solve these kinds of problems more easily, quickly, and efficiently.

That said, reading about the best tools is not enough: you still need to do the work yourself. So choose the tools that are best for you and get to work!