MLOps Blog

15 Computer Vision Projects You Can Do Right Now

11 min
11th August, 2023

Computer vision deals with how computers extract meaningful information from images or videos. It has a wide range of applications, including reverse engineering, security inspections, image editing and processing, computer animation, autonomous navigation, and robotics. 

In this article, we’re going to explore 15 great OpenCV projects, from beginner-level to expert-level. For each project, you’ll see the essential guides, source code, and datasets, so you can get straight to work on them if you want.

What is Computer Vision?

Computer vision is about helping machines interpret images and videos. It’s the science of capturing a scene through a digital medium, such as a camera or another sensor, and analyzing it to understand what it shows. It’s a broad discipline that’s useful for machine translation, pattern recognition, robotic positioning, 3D reconstruction, driverless cars, and much more.

The field of computer vision keeps evolving and becoming more impactful thanks to constant technological innovations. As time goes by, it will offer increasingly powerful tools for researchers, businesses, and eventually consumers.

Computer Vision today

Computer vision has become a relatively standard technology in recent years due to the advancement of AI. Many companies use it for product development, sales operations, marketing campaigns, access control, security, and more. 

[Figure: Computer vision today. Source: Author]

Computer vision has plenty of applications in healthcare (including pathology), industrial automation, military use, cybersecurity, automotive engineering, drone navigation—the list goes on.

How does Computer Vision work?

Machine learning finds patterns by learning from its mistakes: a model is fitted to training data, makes predictions, and is corrected when those predictions are wrong. Real-world images are broken down into simple patterns, and the computer recognizes those patterns using a neural network built with many layers.

The first layer takes pixel values and tries to identify edges. The next few layers try to detect simple shapes with the help of those edges. In the end, all of it is put together to understand the image.
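
To make the "first layer finds edges" idea concrete, here is a minimal sketch that slides one hand-written filter over a toy image. It assumes a Sobel kernel as the filter; a real network would learn its own filter weights during training, and the helper name `filter2d` is made up for this illustration.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide `kernel` over `image` (valid region only) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: dark left half (0), bright right half (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge filter (a Sobel kernel), standing in for learned weights
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

response = filter2d(image, sobel_x)
print(response)
```

Every row of the response is 0, 4, 4, 0: the filter stays silent on the flat regions and fires only where dark meets bright.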

[Figure: How computer vision works. Source: Author]

It can take thousands, sometimes millions, of images to train a computer vision application. Sometimes even that’s not enough: some facial recognition applications struggle to detect people with darker skin because they were trained mostly on images of white people. Sometimes the application might not be able to tell the difference between a dog and a bagel. Ultimately, the algorithm will only ever be as good as the data that was used for training it.

OK, enough introduction! Let’s get into the projects.

Computer Vision projects for all experience levels

Beginner level Computer Vision projects 

If you’re new or learning computer vision, these projects will help you learn a lot.

1. Edge & Contour Detection 

If you’re new to computer vision, this project is a great start. CV applications often detect edges first and then collect other information. There are many edge detection algorithms, and the most popular is the Canny edge detector because it’s pretty effective compared to others. It’s also a more complex, multi-stage technique. Below are the steps for Canny edge detection:

  1. Reduce noise and smooth the image,
  2. Calculate the gradient,
  3. Apply non-maximum suppression,
  4. Apply a double threshold,
  5. Link and detect edges with hysteresis.
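
The last two steps are the least intuitive, so here is a rough sketch of them in plain NumPy. It is not how OpenCV implements Canny internally. It assumes you already have a gradient-magnitude array, and the function name `hysteresis` and the threshold values are picked just for the example.

```python
import numpy as np
from collections import deque

def hysteresis(magnitude, low, high):
    """Keep strong pixels (>= high) plus weak pixels (>= low) connected to them."""
    strong = magnitude >= high
    weak = (magnitude >= low) & ~strong
    edges = strong.copy()
    queue = deque(np.argwhere(strong).tolist())
    rows, cols = magnitude.shape
    while queue:
        r, c = queue.popleft()
        # Visit the 8 neighbours and promote weak pixels that touch an edge
        for nr in range(max(r - 1, 0), min(r + 2, rows)):
            for nc in range(max(c - 1, 0), min(c + 2, cols)):
                if weak[nr, nc] and not edges[nr, nc]:
                    edges[nr, nc] = True
                    queue.append([nr, nc])
    return edges

# Toy gradient magnitudes: one strong pixel, one weak neighbour, one isolated weak pixel
magnitude = np.array([
    [0,   0,   0, 0,   0],
    [0, 250, 120, 0,   0],
    [0,   0,   0, 0, 110],
    [0,   0,   0, 0,   0],
], dtype=float)

edges = hysteresis(magnitude, low=100, high=200)
print(edges.astype(int))
```

The weak pixel next to the strong one survives, while the isolated weak pixel is discarded as noise.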

Code for Canny edge detection:

import cv2
import matplotlib.pyplot as plt
# Open the image
img = cv2.imread('dancing-spider.jpg')
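# Apply Canny (thresholds 100 and 200, 3x3 Sobel aperture, L2 gradient norm)
# apertureSize must be passed by keyword: the 4th positional argument is the output array
edges = cv2.Canny(img, 100, 200, apertureSize=3, L2gradient=True)
plt.imsave('dancing-spider-canny.png', edges, cmap='gray', format='png')
plt.imshow(edges, cmap='gray')
plt.show()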

Contours are curves joining all the continuous points along a boundary that have the same color or intensity. For example, a contour can capture the shape of a leaf based on its border. Contours are an important tool for shape and object detection: the contours of an object are the boundary lines that make up its shape. Contours are also called outlines, edges, or structure, for a very good reason: they’re a way to mark changes in depth.

Code to find contours:

import cv2
import numpy as np

# Let's load a simple image with 3 black squares
image = cv2.imread('C://Users//gfg//shapes.jpg')

# Grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Find Canny edges
edged = cv2.Canny(gray, 30, 200)

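# Finding Contours
# OpenCV versions before 3.2 altered the source image in findContours,
# so pass a copy, e.g. edged.copy(), if you use one of those versions.
# The retrieval mode and approximation method below are assumptions
# (the original argument list was cut off).
contours, hierarchy = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

cv2.imshow('Canny Edges After Contouring', edged)

print("Number of Contours found = " + str(len(contours)))

# Draw all contours
# -1 signifies drawing all contours
cv2.drawContours(image, contours, -1, (0, 255, 0), 3)

cv2.imshow('Contours', image)
# Keep the windows open until a key is pressed
cv2.waitKey(0)
cv2.destroyAllWindows()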

Recommended reading & source code: 

2. Colour Detection & Invisibility Cloak

This project is about detecting color in images. You can use it to edit and recognize colors from images or videos. The most popular project that uses the color detection technique is the invisibility cloak. In movies, invisibility effects are shot against a green screen, but here we’ll create the effect by removing the foreground layer. The invisibility cloak process is this:

  1. Capture and store the background frame (just the background),
  2. Detect colors,
  3. Generate a mask,
  4. Generate the final output to create the invisible effect. 
[Figure: Invisibility cloak]
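
The final compositing step is easy to picture with a tiny NumPy sketch. This is only an illustration under simplifying assumptions: the frames are 2x2 toy arrays, the cloak is detected by an exact color match instead of an HSV range, and `apply_cloak` is a made-up helper name.

```python
import numpy as np

def apply_cloak(frame, background, mask):
    """Replace the masked pixels of `frame` with the stored `background`."""
    # mask is (H, W); add a channel axis so it broadcasts over the 3 color channels
    return np.where(mask[..., None], background, frame)

# Step 1: the stored background frame (uniform dark grey here)
background = np.full((2, 2, 3), 50, dtype=np.uint8)

# A later frame in which one pixel is covered by the red cloak (BGR order)
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
frame[0, 0] = (0, 0, 255)

# Steps 2 and 3: detect the cloak color and build the mask
# (exact match for simplicity; a real project would threshold an HSV range)
mask = np.all(frame == (0, 0, 255), axis=-1)

# Step 4: show the background wherever the cloak was detected
output = apply_cloak(frame, background, mask)
print(output[0, 0], output[1, 1])
```

The cloak pixel now shows the background value, and every other pixel is left untouched.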

The color detection works in the HSV (Hue, Saturation, Value) color space. HSV is also one of the three ways that Lightroom lets us change color ranges in photographs. It’s particularly useful for introducing or removing certain colors from an image or scene, such as changing night-time shots to day-time shots (or vice versa).

Hue is the color portion, identified from 0 to 360. Saturation describes the amount of grey in a color, from 0–100%. Reducing this component toward zero introduces more grey and produces a faded effect.

Value (brightness) works in conjunction with saturation. It describes the brightness or intensity of the color, from 0–100%. So 0 is completely black, and 100 is the brightest and reveals the most color.
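
If you want to get a feel for these three numbers, Python’s standard library can convert RGB to HSV for you. The helper below is a small sketch (the name `rgb_to_hsv_degrees` is made up here) that reports hue in degrees and saturation and value as percentages. Keep in mind that OpenCV stores 8-bit HSV differently: hue runs from 0 to 179, and saturation and value from 0 to 255.

```python
import colorsys

def rgb_to_hsv_degrees(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation %, value %)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360), round(s * 100), round(v * 100)

print(rgb_to_hsv_degrees(255, 0, 0))      # pure red
print(rgb_to_hsv_degrees(0, 255, 0))      # pure green
print(rgb_to_hsv_degrees(128, 128, 128))  # mid grey: no saturation
print(rgb_to_hsv_degrees(0, 0, 0))        # black: no value
```

Pure red and pure green differ only in hue (0 versus 120 degrees), while grey and black both have zero saturation and differ only in value.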

Recommended reading & source code: 

3. Text Recognition using OpenCV and Tesseract (OCR)

Here, you use OpenCV and OCR (Optical Character Recognition) on your image to identify each letter and convert them into text. It’s perfect for anyone looking to take information from an image or video and turn it into text-based data. Many apps use OCR, like Google Lens, PDF Scanner, and more.

Ways to detect text from images:

  • Use OpenCV – popular,
  • Use Deep Learning models – the newest method,
  • Use your custom model.

Text Detection using OpenCV

Sample code after processing the image and contour detection:

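# text detection
import cv2
import pytesseract

def contours_text(orig, img, contours):
    """Run OCR on the bounding box of every contour and return the recognized strings."""
    texts = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        # Cropping the text block for giving input to OCR
        # (copied before drawing, so the rectangle border is not fed to the OCR)
        cropped = orig[y:y + h, x:x + w].copy()
        # Drawing a rectangle on copied image
        cv2.rectangle(orig, (x, y), (x + w, y + h), (0, 255, 255), 2)
        # Apply OCR on the cropped image
        config = '-l eng --oem 1 --psm 3'
        texts.append(pytesseract.image_to_string(cropped, config=config))
    return texts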

Text Detection with Tesseract

Tesseract is an open-source OCR engine that can recognize text in 100+ languages, and it’s backed by Google. You can also train it to recognize many other languages.

Code to detect text using tesseract: 

# text recognition
import cv2
import pytesseract
# read image
im = cv2.imread('./testimg.jpg')
# configurations
config = ('-l eng --oem 1 --psm 3')
# pytesseract
text = pytesseract.image_to_string(im, config=config)
# split the result into lines and print the text
text = text.split('\n')
print(text)

Recommended reading & datasets: 

4. Face Recognition with Python and OpenCV

It’s been more than two decades since the American television show CSI: Crime Scene Investigation first aired in 2000. During that time, facial recognition software has become increasingly sophisticated. Present-day software isn’t limited by superficial features like skin or hair color. Instead, it identifies faces based on facial features that are more stable through changes in appearance, like eye shape and distance between eyes. This type of facial recognition is sometimes called “template matching”. You can use OpenCV, deep learning, or a custom database to create facial recognition systems/applications.

Process of detecting a face from an image:

  • Find face locations and encodings,
  • Extract features using face embedding,
  • Recognize faces by comparing those embeddings.
[Figure: Face recognition. Image source: The Times; face recognition: Author]
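
The comparison step boils down to measuring the distance between two embedding vectors. The sketch below shows that idea with made-up 4-dimensional vectors (the face_recognition library produces 128-dimensional encodings). The helper name `is_same_face` is hypothetical; the 0.6 tolerance matches the default used by the library’s `compare_faces`.

```python
import numpy as np

def is_same_face(known, candidate, tolerance=0.6):
    """Return (match, distance) using the Euclidean distance between embeddings."""
    distance = float(np.linalg.norm(np.asarray(known) - np.asarray(candidate)))
    return distance <= tolerance, distance

# Made-up embeddings, purely for illustration
known = [0.10, 0.20, 0.30, 0.40]
same_person = [0.12, 0.18, 0.33, 0.41]
other_person = [0.90, -0.50, 0.10, 0.00]

print(is_same_face(known, same_person))
print(is_same_face(known, other_person))
```

A lower tolerance makes the comparison stricter, which reduces false matches at the cost of missing some true ones.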

Below is the full code for recognizing faces from images:

import cv2
import face_recognition

# load_image_file returns RGB arrays, which is what face_recognition expects
imgMain = face_recognition.load_image_file('ImageBasics/Bryan_Cranst.jpg')
imgTest = face_recognition.load_image_file('ImageBasics/bryan-cranston-el-camino-aaron-paul-1a.jpg')

# Locate and encode the first face in each image
# (indexing with [0] raises IndexError if no face is found)
faceLoc = face_recognition.face_locations(imgMain)[0]
encodeMain = face_recognition.face_encodings(imgMain)[0]
faceLocTest = face_recognition.face_locations(imgTest)[0]
encodeTest = face_recognition.face_encodings(imgTest)[0]

# Compare the two encodings
results = face_recognition.compare_faces([encodeMain], encodeTest)
faceDis = face_recognition.face_distance([encodeMain], encodeTest)
print(results, faceDis)

# OpenCV draws and displays in BGR, so convert before showing the images
imgMain = cv2.cvtColor(imgMain, cv2.COLOR_RGB2BGR)
imgTest = cv2.cvtColor(imgTest, cv2.COLOR_RGB2BGR)

# face_locations returns (top, right, bottom, left)
cv2.rectangle(imgMain, (faceLoc[3], faceLoc[0]), (faceLoc[1], faceLoc[2]), (255, 0, 255), 2)
cv2.rectangle(imgTest, (faceLocTest[3], faceLocTest[0]), (faceLocTest[1], faceLocTest[2]), (255, 0, 255), 2)
cv2.putText(imgTest, f'{results} {round(faceDis[0], 2)}', (50, 50), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 255), 2)

cv2.imshow('Main Image', imgMain)
cv2.imshow('Test Image', imgTest)
# Keep the windows open until a key is pressed
cv2.waitKey(0)

The display step of the loop that recognizes faces from a webcam or live camera:

    # inside the capture loop
    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break  # stop when 'q' is pressed

Recommended reading & datasets: 

5. Object Detection

Object detection is the automatic inference of what an object is, and where it is, in a given image or video frame. It’s used in self-driving cars, tracking, face detection, pose detection, and a lot more. There are 3 major approaches to object detection: using OpenCV, a machine learning-based approach, and a deep learning-based approach.

Below is the full code to detect objects:

import cv2

# Enable camera
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 420)

# Load the cascade file for face detection
# (cv2.data.haarcascades is assumed as the folder; the original prefix was lost)
faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
# If you want to detect any other object, for example eyes, use one more classifier as below:
eyeCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye_tree_eyeglasses.xml")

while True:
    success, img = cap.read()
    if not success:
        break
    imgGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Getting corners around the face
    faces = faceCascade.detectMultiScale(imgGray, 1.3, 5)  # 1.3 = scale factor, 5 = minimum neighbors
    # Drawing bounding box around face
    for (x, y, w, h) in faces:
        img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 3)
    # Detecting eyes
    eyes = eyeCascade.detectMultiScale(imgGray)
    # Drawing bounding box for eyes
    for (ex, ey, ew, eh) in eyes:
        img = cv2.rectangle(img, (ex, ey), (ex + ew, ey + eh), (255, 0, 0), 3)
    cv2.imshow('face_detect', img)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

# Release the camera and close the window
cap.release()
cv2.destroyAllWindows()

Recommended reading & datasets: 

Intermediate level Computer Vision projects 

We’re taking things to the next level with a few intermediate-level projects. These projects will probably be more fun than beginner projects, but also more challenging.

6. Hand Gesture Recognition

In this project, you need to detect hand gestures. After detecting a gesture, we’ll assign a command to it. You can even play games with multiple commands using hand gesture recognition.

How gesture recognition works:

  • Install the PyAutoGUI library – it lets you control the mouse and keyboard programmatically, without any user interaction,
  • Convert the frame into HSV,
  • Find contours,
  • Assign a command to any value – below we used 5 (fingers of the hand) to jump.
Source: Author 

Core of the code to play the dino game with hand gestures:

import math
import cv2
import numpy as np
import pyautogui

# Assumes `frame` (the camera frame), `crop_image` (the hand region) and
# `thresh` (its binary mask) come from the earlier capture and HSV steps.

# Find contours (OpenCV 3 returned three values: image, contours, hierarchy)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# Find contour with maximum area
contour = max(contours, key=cv2.contourArea)
# Create bounding rectangle around the contour
x, y, w, h = cv2.boundingRect(contour)
cv2.rectangle(crop_image, (x, y), (x + w, y + h), (0, 0, 255), 0)
# Find convex hull
hull = cv2.convexHull(contour)
# Draw contour and hull
drawing = np.zeros(crop_image.shape, np.uint8)
cv2.drawContours(drawing, [contour], -1, (0, 255, 0), 0)
cv2.drawContours(drawing, [hull], -1, (0, 0, 255), 0)
# Find convexity defects (this needs hull indices rather than points)
hull = cv2.convexHull(contour, returnPoints=False)
defects = cv2.convexityDefects(contour, hull)
# Use cosine rule to find angle of the far point from the start and end point,
# i.e. the convex points (the finger tips), for all defects
count_defects = 0
if defects is not None:
    for i in range(defects.shape[0]):
        s, e, f, d = defects[i, 0]
        # Convert to plain ints so the OpenCV drawing functions accept the points
        start = tuple(map(int, contour[s][0]))
        end = tuple(map(int, contour[e][0]))
        far = tuple(map(int, contour[f][0]))
        a = math.sqrt((end[0] - start[0]) ** 2 + (end[1] - start[1]) ** 2)
        b = math.sqrt((far[0] - start[0]) ** 2 + (far[1] - start[1]) ** 2)
        c = math.sqrt((end[0] - far[0]) ** 2 + (end[1] - far[1]) ** 2)
        angle = math.degrees(math.acos((b ** 2 + c ** 2 - a ** 2) / (2 * b * c)))
        # If angle <= 90, count the defect and draw a circle at the far point
        if angle <= 90:
            count_defects += 1
            cv2.circle(crop_image, far, 1, [0, 0, 255], -1)
        cv2.line(crop_image, start, end, [0, 255, 0], 2)
# Press SPACE if the condition is met
if count_defects >= 4:
    pyautogui.press('space')
    cv2.putText(frame, "JUMP", (115, 80), cv2.FONT_HERSHEY_SIMPLEX, 2, 2, 2)

Recommended reading & source code: 

7. Human Pose Detection

Many applications use human pose detection to see how a player plays in a specific game (for example – baseball). The ultimate goal is to locate landmarks in the body.  Human pose detection is used in many real-life videos and image-based applications, including physical exercise, sign language detection, dance, yoga, and much more. 

Recommended reading & datasets: 

8. Road Lane Detection in Autonomous Vehicles

If you want to get into self-driving cars, this project will be a good start. You’ll detect lanes, edges of the road, and a lot more. Lane detection works like this:

  • Apply the mask,
  • Do image thresholding (thresholding converts a grayscale image to a binary one by setting each pixel at or above a specified gray level to white and every other pixel to black),
  • Do Hough line transformation (detecting lane lines).
[Figure: Road lane detection. Source: Author]
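
Here is a rough NumPy sketch of those three steps on a toy image, so you can see what the Hough transform is actually voting on. It is a simplified stand-in for OpenCV’s Hough functions. The image, the region of interest, the threshold of 128, and the helper name `hough_lines` are all assumptions made for the example.

```python
import numpy as np

def hough_lines(binary, n_theta=180):
    """Let every foreground pixel vote for the (rho, theta) lines passing through it."""
    rows, cols = binary.shape
    diag = int(np.ceil(np.hypot(rows, cols)))
    thetas = np.deg2rad(np.arange(n_theta))          # one column per whole degree
    accumulator = np.zeros((2 * diag + 1, n_theta), dtype=int)
    ys, xs = np.nonzero(binary)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        accumulator[rhos + diag, np.arange(n_theta)] += 1
    return accumulator, diag

# Toy grayscale frame with a bright vertical "lane marking" at x = 7
gray = np.zeros((100, 100), dtype=np.uint8)
gray[:, 7] = 220
gray[3, 15] = 90                                     # dim noise

# 1. Apply the mask: keep only the lower half of the frame (the road)
roi = np.zeros(gray.shape, dtype=bool)
roi[50:, :] = True
masked = np.where(roi, gray, 0)

# 2. Threshold: bright pixels become foreground
binary = masked > 128

# 3. Hough transform: the accumulator peak is the dominant line
accumulator, diag = hough_lines(binary)
rho_idx, theta_idx = np.unravel_index(np.argmax(accumulator), accumulator.shape)
rho, theta_deg = int(rho_idx) - diag, int(theta_idx)
print(rho, theta_deg, accumulator.max())
```

The peak lands at rho = 7 and theta = 0 degrees, which is exactly the vertical line x = 7, with one vote for each of the 50 lane pixels inside the region of interest.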

Recommended reading & datasets: 

9. Pathology Classification

Computer vision is emerging in healthcare. The amount of data that pathologists analyze in a day can be too much to handle. Luckily, deep learning algorithms can identify patterns in large amounts of data that humans wouldn’t notice otherwise. As more images are entered and categorized into groups, the accuracy of these algorithms becomes better and better over time.

It can detect various diseases in plants, animals, and humans. For this application, the goal is to get the OCT dataset from Kaggle and classify the images into different categories. The dataset has around 85,000 images. Optical coherence tomography (OCT) is an emerging medical technology for performing high-resolution cross-sectional imaging. It uses light waves to look inside a living human body, and it can be used to evaluate thinning skin, broken blood vessels, heart diseases, and many other medical problems.

Over time, it’s gained the trust of doctors around the globe as a quick and effective way of diagnosing patients compared with traditional methods. It can also be used to examine tattoo pigments or assess different layers of a skin graft that’s placed on a burn patient.

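Code that trains the classifier with an occlusion sensitivity callback from the tf-explain library:

import datetime
import tensorflow as tf
from tensorflow import keras
from tf_explain.callbacks.occlusion_sensitivity import OcclusionSensitivityCallback
# Jupyter magic: load the TensorBoard notebook extension
%load_ext tensorboard
# model_TF, fbeta, vis_test and vis_lab are assumed to be defined earlier in the notebook
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
o_callbacks = [OcclusionSensitivityCallback(validation_data=(vis_test, vis_lab), class_index=2, patch_size=4)]
model_TF.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=[fbeta])
# The first argument of fit() was lost in the original listing; vis_test is an assumption
model_TF.fit(vis_test, vis_lab, epochs=10, verbose=1, callbacks=[tensorboard_callback, *o_callbacks])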

Recommended reading & datasets: 

10. Fashion MNIST for Image Classification

One of the most used datasets in machine learning is MNIST, a database of handwritten digits that contains 60,000 training and 10,000 test images of the digits 0 to 9. Inspired by this, Zalando Research created Fashion MNIST, which classifies clothes instead. On the original MNIST digits, even fairly simple models reach accuracies in the 96-99% range; Fashion MNIST is meant to be a harder drop-in replacement.

Fashion MNIST mirrors the MNIST layout: 60,000 training images and 10,000 test images of clothing items from Zalando’s catalog. Each one is a 28x28 grayscale image labeled with one of 10 categories, such as T-shirt/top, trouser, sneaker, or bag.

Recommended reading & datasets: 

Advanced level Computer Vision projects 

Once you’re an expert in computer vision, you can develop projects from your own ideas. Below are a few advanced-level fun projects you can work with if you have enough skills and knowledge. 

11. Image Deblurring using Generative Adversarial Networks

Image deblurring is an interesting technology with plenty of applications. Here, a generative adversarial network (GAN) automatically trains a generative model, like Image DeBlur’s AI algorithm. Before looking into this project, let’s understand what GANs are and how they work.

Generative Adversarial Networks are a deep-learning approach that has shown notable success in various computer vision tasks, such as image super-resolution. However, how best to train these networks remains an open problem. A Generative Adversarial Network can be thought of as two networks competing with one another, just like contestants on game shows like Jeopardy or Survivor. Both parties have tasks and need to come up with strategies based on their opponent’s appearance or moves throughout the game, while also trying not to be eliminated first. There are 3 major steps involved in training for deblurring:

  • Create fake inputs based on noise using the generator,
  • Train the discriminator with both real and fake sets,
  • Train the whole model.
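
To see what each of these steps is trying to optimize, here is a small NumPy sketch of the losses in a plain GAN. It is an illustration only: deblurring GANs typically add other loss terms on top, and the discriminator scores below are made-up numbers rather than outputs of a real model.

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    pred = np.clip(pred, 1e-7, 1 - 1e-7)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def discriminator_loss(d_real, d_fake):
    """The discriminator should score real images as 1 and generated ones as 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """The generator wants its fakes to be scored as real (1)."""
    return bce(d_fake, np.ones_like(d_fake))

# Made-up discriminator scores for a batch of real and generated images
confident = discriminator_loss(np.array([0.9, 0.9]), np.array([0.1, 0.1]))
confused = discriminator_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
print(confident, confused)

# The generator's loss drops as the discriminator is fooled more often
print(generator_loss(np.array([0.1, 0.1])), generator_loss(np.array([0.9, 0.9])))
```

Training alternates between lowering the discriminator loss and lowering the generator loss, which is why the two networks end up pulling against each other.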

Recommended reading & datasets: 

12. Image Transformation 

With this project, you can transform any image into different forms. For example, you can change a real image into a graphical one. This is a creative and fun project to do. Transforming images with the standard GAN method is difficult, so for this project most people use Cycle GAN.

The idea is that you train two competing neural networks against each other. One network, called the “generator,” creates new data samples, while the other network, the “discriminator,” judges whether each sample is real or fake. The generator alters its parameters to try to fool the judge by producing more realistic samples. In this way, both networks improve over the course of training.

Cycle GAN is an extension of the GAN architecture. What Cycle GAN does is create a cycle of generating the input. Let’s say you’re using Google Translate: you translate English to German, open a new tab, copy the German output, and translate German to English. The goal here is to get back the original input you had. Transforming photos into artwork works the same way: an image translated into the artwork style and back again should match the original.
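
The round-trip idea can be written down as a "cycle consistency" loss. The sketch below uses two tiny hand-written functions as stand-ins for the two generators (in a real Cycle GAN they would be neural networks), so all the names here are hypothetical.

```python
import numpy as np

def cycle_consistency_loss(x, forward, backward):
    """Mean absolute difference between x and its round trip backward(forward(x))."""
    return float(np.mean(np.abs(backward(forward(x)) - x)))

# Stand-in for the "photo -> artwork" generator
def to_artwork(x):
    return 2.0 * x + 1.0

# Stand-in for a good "artwork -> photo" generator: the exact inverse
def to_photo_good(y):
    return (y - 1.0) / 2.0

# Stand-in for a poor "artwork -> photo" generator: it forgets the offset
def to_photo_poor(y):
    return y / 2.0

photo = np.array([0.0, 0.5, 1.0])
print(cycle_consistency_loss(photo, to_artwork, to_photo_good))
print(cycle_consistency_loss(photo, to_artwork, to_photo_poor))
```

The good pair brings the photo back unchanged and scores 0.0, while the poor pair is off by 0.5 on every pixel. Training pushes both generators toward the first situation.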

Recommended reading & source code: 

13. Automatic Colorization of Photos using Deep Neural Networks

When it comes to coloring black and white images, machines have historically struggled to do an adequate job. They couldn’t understand the boundary between grey and white, leading to a range of monochromatic hues that seem unrealistic. To overcome this issue, scientists from UC Berkeley, along with colleagues at Microsoft Research, developed an algorithm that automatically colorizes photographs by using deep neural networks.

Deep neural networks are a very promising technique for image classification because they can learn the composition of an image by looking at many pictures. Densely connected convolutional neural networks (CNNs) were used to classify images in this study. CNNs are trained with large amounts of labeled data and output a score corresponding to the associated class label for any input image. They can be thought of as feature detectors that are applied to the original input image.

Colourization is the process of adding color to grayscale images or videos. It can be accomplished by hand, but it’s a tedious process that takes hours or days, depending on the level of detail in the photo. Recently, there’s been an explosion in deep neural networks for image recognition tasks such as facial recognition and text detection. With this rapid advance of deep learning, a Convolutional Neural Network (CNN) can colorize black and white images by predicting what the colors should be on a per-pixel basis. This project helps to colorize old photos. Given enough training data, such a model can even properly predict the color of a familiar object like a Coca-Cola logo.

Recommended reading & guide: 

14. Vehicle Counting and Classification

Nowadays, many places are equipped with surveillance systems that combine AI with cameras, from government organizations to private facilities. These AI-based cameras help in many ways, and one of their main features is counting vehicles, whether they are passing by or entering a particular place. The same technique can be used in many areas like crowd counting, traffic management, number plate recognition, sports, and many more. The process is simple:

  • Frame differencing,
  • Image thresholding,
  • Contour finding,
  • Image dilation.
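
The sketch below strings these four ideas together in plain NumPy on two toy frames. It is a simplified illustration, not a production pipeline: the frames, the threshold of 30, and the helper names `dilate` and `count_blobs` are assumptions, and it dilates before counting so that a vehicle split into fragments is counted once.

```python
import numpy as np

def dilate(mask):
    """3x3 binary dilation: a pixel turns on if it or any neighbour is on."""
    rows, cols = mask.shape
    padded = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for dr in range(3):
        for dc in range(3):
            out |= padded[dr:dr + rows, dc:dc + cols]
    return out

def count_blobs(mask):
    """Count 8-connected foreground regions with an iterative flood fill."""
    rows, cols = mask.shape
    seen = np.zeros_like(mask)
    count = 0
    for r, c in np.argwhere(mask).tolist():
        if seen[r, c]:
            continue
        count += 1
        seen[r, c] = True
        stack = [(r, c)]
        while stack:
            y, x = stack.pop()
            for ny in range(max(y - 1, 0), min(y + 2, rows)):
                for nx in range(max(x - 1, 0), min(x + 2, cols)):
                    if mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
    return count

# Two consecutive toy frames: an empty road, then two moving "vehicles"
prev_frame = np.zeros((10, 12), dtype=np.uint8)
next_frame = prev_frame.copy()
next_frame[1:3, 1:4] = 200     # first vehicle
next_frame[6:9, 7:9] = 180     # second vehicle...
next_frame[6:9, 10] = 180      # ...with a fragment split off by a 1-pixel gap

diff = np.abs(next_frame.astype(int) - prev_frame.astype(int))  # frame differencing
mask = diff > 30                                                # image thresholding

print(count_blobs(mask))           # fragments counted separately
print(count_blobs(dilate(mask)))   # dilation merges the fragments first
```

Without dilation the broken-up vehicle is counted twice (3 blobs in total); after dilation the count drops to the correct 2.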

And finally, vehicle counting:


Recommended reading & datasets: 

15. Vehicle License Plate Scanners

A vehicle license plate scanner is a computer vision application that can be used to identify plates and read their numbers. This technology is used for a variety of purposes, including law enforcement, identifying stolen vehicles, and tracking down fugitives.

A more sophisticated vehicle license plate scanner can scan, read, and identify hundreds, even thousands, of cars per minute, with reported accuracy of up to 99% from distances of up to half a mile in heavy traffic on highways and city streets. This project is very useful in many cases.

The goal is to first detect the license plate and then scan the numbers and text written on it. It’s also referred to as an automatic number plate detection system. The process is simple:

  • Capture image,
  • Search for the number plate,
  • Filter image,
  • Separate the lines using row segmentation,
  • OCR for the numbers and characters.
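
Row segmentation is the least familiar step in that list, so here is a minimal sketch of one common way to do it: look for runs of rows that contain any "ink". The toy plate and the helper name `segment_rows` are made up for this example, and a real plate would first need the filtering step to produce a clean binary image.

```python
import numpy as np

def segment_rows(binary):
    """Return (start, end) row ranges, end exclusive, that contain foreground pixels."""
    has_ink = binary.any(axis=1)
    segments, start = [], None
    for row, ink in enumerate(has_ink):
        if ink and start is None:
            start = row                      # a text line begins
        elif not ink and start is not None:
            segments.append((start, row))    # the text line ends
            start = None
    if start is not None:
        segments.append((start, len(has_ink)))
    return segments

# Toy binary plate with two lines of "characters"
plate = np.zeros((10, 8), dtype=bool)
plate[1:4, 1:7] = True    # first line of text
plate[6:9, 2:6] = True    # second line of text

print(segment_rows(plate))
```

Each returned range can then be cropped out of the plate and passed to the OCR step on its own.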

Recommended reading & datasets: 


And that’s it! Hope you liked the computer vision projects. As a cherry on top, I’ll leave you with several extra projects that you might also be interested in.

Extra projects 

  • Photo Sketching
  • Collage Mosaic Generator
  • Blur the Face
  • Image Segmentation
  • Sudoku Solver
  • Object Tracking
  • Watermarking Images 
  • Image Reverse Search Engine
