Neptune Blog

15 Computer Vision Projects You Can Do Right Now

Harshil Patel

11 min

8th December, 2023

Computer Vision

Computer vision deals with how computers extract meaningful information from images or videos. It has a wide range of applications, including reverse engineering, security inspections, image editing and processing, computer animation, autonomous navigation, and robotics.

In this article, we’re going to explore 15 great OpenCV projects, from beginner level to expert level. For each project, you’ll see the essential guides, source code, and datasets, so you can get straight to work on them if you want.

What is Computer Vision?

Computer vision is about helping machines interpret images and videos. It’s the science of interacting with an object through a digital medium and using sensors to analyze and understand what it sees. It’s a broad discipline that’s useful for machine translation, pattern recognition, robotic positioning, 3D reconstruction, driverless cars, and much more.

The field of computer vision keeps evolving and becoming more impactful thanks to constant technological innovations. As time goes by, it will offer increasingly powerful tools for researchers, businesses, and eventually consumers.

Computer Vision today

Computer vision has become a relatively standard technology in recent years due to the advancement of AI. Many companies use it for product development, sales operations, marketing campaigns, access control, security, and more.

Computer vision has plenty of applications in healthcare (including pathology), industrial automation, military use, cybersecurity, automotive engineering, drone navigation—the list goes on.

How does Computer Vision work?

A machine learning model finds patterns by learning from its mistakes: it is fitted to training data, makes predictions, and is corrected whenever those predictions are wrong. Real-world images are broken down into simple patterns, and the computer recognizes those patterns using a neural network built with many layers.

The first layer takes pixel values and tries to identify edges. The next few layers combine those edges to detect simple shapes. In the end, all of it is put together to understand the image.
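To make the "first layer finds edges" idea concrete, here is a minimal sketch. It assumes NumPy and uses a hand-written 3x3 Sobel-style kernel in place of a learned filter; the `filter2d` helper and the tiny test image are illustrative only and not part of any library.

```python
import numpy as np

def filter2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the filter responses."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Hand-written vertical-edge kernel (Sobel x); a trained network learns such filters itself
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# 5x6 image: dark left half (0), bright right half (1)
image = np.zeros((5, 6), dtype=float)
image[:, 3:] = 1.0

response = filter2d(image, sobel_x)
print(response)
```

The response is zero wherever the window covers a flat region and large where it straddles the dark-to-bright boundary, which is the kind of signal later layers build shapes from.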

Computer vision how it works — *Source: Author*

It can take thousands, sometimes millions, of images to train a computer vision application. Sometimes even that’s not enough: some facial recognition applications fail to detect people with darker skin because they were trained mostly on images of white people. Sometimes the application might not be able to tell the difference between a dog and a bagel. Ultimately, the algorithm will only ever be as good as the data that was used for training it.

OK, enough introduction! Let’s get into the projects.

Beginner level Computer Vision projects

If you’re new or learning computer vision, these projects will help you learn a lot.

1. Edge & Contour Detection

If you’re new to computer vision, this project is a great start. Many CV applications detect edges first and then build on them to extract other information. There are many edge detection algorithms; the most popular is the Canny edge detector, a multi-stage technique that is pretty effective compared to simpler methods. Below are the steps for Canny edge detection:

Reduce noise by smoothing the image,
Calculate the intensity gradient,
Apply non-maximum suppression,
Apply double thresholding,
Link edges by hysteresis.
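The last two steps are the least obvious, so here is a small sketch of double thresholding followed by hysteresis. It assumes NumPy and a ready-made array of gradient magnitudes; `hysteresis_threshold` is a hypothetical helper written for illustration, and OpenCV performs these steps internally when you call `cv2.Canny`.

```python
import numpy as np

def hysteresis_threshold(magnitude, low, high):
    """Keep strong pixels (>= high) plus weak pixels (>= low) connected to them."""
    strong = magnitude >= high
    weak = (magnitude >= low) & ~strong
    edges = strong.copy()
    changed = True
    while changed:
        # Grow the current edge set by one pixel in all 8 directions
        padded = np.pad(edges, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(edges)
        for dr in (0, 1, 2):
            for dc in (0, 1, 2):
                grown |= padded[dr:dr + edges.shape[0], dc:dc + edges.shape[1]]
        # Weak pixels touching the edge set are promoted to edges
        new_edges = edges | (grown & weak)
        changed = bool((new_edges != edges).any())
        edges = new_edges
    return edges

magnitude = np.array([
    [  0,   0,   0,   0,   0],
    [250, 120, 110,   0, 130],
    [  0,   0,   0,   0,   0],
])
edges = hysteresis_threshold(magnitude, low=100, high=200)
print(edges.astype(int))
```

In this toy input, the weak pixels 120 and 110 survive because they chain back to the strong pixel 250, while the isolated weak pixel 130 is discarded as noise.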

Code for Canny edge detection:

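import cv2
import matplotlib.pyplot as plt

# Open the image
img = cv2.imread('dancing-spider.jpg')
# Apply Canny with thresholds 100 and 200. apertureSize must be passed by keyword,
# because the fourth positional argument of cv2.Canny is the output array.
edges = cv2.Canny(img, 100, 200, apertureSize=3, L2gradient=True)
# Save the edge map, then display it
plt.imsave('dancing-spider-canny.png', edges, cmap='gray', format='png')
plt.figure()
plt.title('Spider')
plt.imshow(edges, cmap='gray')
plt.show()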

Contours are curves joining all the continuous points along a boundary that have the same color or intensity. For example, contour detection can recover the shape of a leaf from its border. Contours are an important tool for shape analysis and object detection. They’re also called outlines, for a very good reason: they mark where an object ends and its surroundings begin.

Contour detection - computer vision — *Source*

Code to find contours:

import cv2

# Let's load a simple image with 3 black squares
image = cv2.imread('C://Users//gfg//shapes.jpg')

# Grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Find Canny edges
edged = cv2.Canny(gray, 30, 200)

# Finding Contours
# OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns three values.
# Older OpenCV versions alter the input image, so pass edged.copy() there.
contours, hierarchy = cv2.findContours(edged,
    cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

cv2.imshow('Canny Edges After Contouring', edged)
cv2.waitKey(0)

print("Number of Contours found = " + str(len(contours)))

# Draw all contours
# -1 signifies drawing all contours
cv2.drawContours(image, contours, -1, (0, 255, 0), 3)

cv2.imshow('Contours', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Recommended reading & source code:

2. Colour Detection & Invisibility Cloak

This project is about detecting color in images. You can use it to edit and recognize colors in images or videos. The most popular project that uses the color detection technique is the invisibility cloak. In movies, invisibility effects are shot against a green screen, but here we’ll create the effect by removing the foreground layer. The invisibility cloak process is this:

Capture and store the background frame (just the background),
Detect colors,
Generate a mask,
Generate the final output to create the invisible effect.

It works on HSV (hue, saturation, value). HSV is a color model that separates the color itself from its intensity, which makes it particularly useful for isolating or replacing a specific color in an image or scene. Hue is the color portion, expressed as an angle from 0 to 360 degrees (OpenCV stores it as 0 to 179 in 8-bit images). Saturation describes how much grey the color contains: reducing this component toward zero introduces more grey and produces a faded effect.

Value (brightness) works in conjunction with saturation. It describes the brightness or intensity of the color, from 0–100%. So 0 is completely black, and 100 is the brightest and reveals the most color.
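Here is a minimal sketch of the cloak idea on a tiny synthetic frame: every pixel whose HSV value falls in the cloak color range is replaced by the stored background. It uses Python's standard `colorsys` module instead of OpenCV so it runs without a camera. The helper names and the red thresholds are assumptions chosen for illustration; in a real project you would typically build the mask with `cv2.cvtColor` and `cv2.inRange`.

```python
import colorsys
import numpy as np

def rgb_to_opencv_hsv(r, g, b):
    """Convert 8-bit RGB to OpenCV's 8-bit HSV scale: H in 0-179, S and V in 0-255."""
    h, s, v = colorsys.rgb_to_hsv(int(r) / 255, int(g) / 255, int(b) / 255)
    return round(h * 180) % 180, round(s * 255), round(v * 255)

def is_red(h, s, v):
    # Red wraps around the hue circle, so it needs two hue ranges (thresholds are assumptions)
    return (h <= 10 or h >= 170) and s >= 120 and v >= 70

def cloak(frame, background, is_cloak_color):
    """Replace every pixel that matches the cloak color with the stored background pixel."""
    out = frame.copy()
    for y in range(frame.shape[0]):
        for x in range(frame.shape[1]):
            if is_cloak_color(*rgb_to_opencv_hsv(*frame[y, x])):
                out[y, x] = background[y, x]
    return out

background = np.full((2, 2, 3), (0, 0, 255), dtype=np.uint8)  # stored background: a blue wall
frame = background.copy()
frame[0, 0] = (220, 20, 30)                                   # one pixel covered by the red cloth

result = cloak(frame, background, is_red)
print(np.array_equal(result, background))
```

Because the only red pixel is swapped for the background pixel behind it, the output frame is identical to the empty background, which is exactly the "invisible" effect.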

Recommended reading & source code:

Github Repo – https://github.com/its-harshil/invisible_cloak
Invisibility Cloak using OpenCV – Guide

3. Text Recognition using OpenCV and Tesseract (OCR)

Here, you use OpenCV and OCR (Optical Character Recognition) on your image to identify each letter and convert them into text. It’s perfect for anyone looking to take information from an image or video and turn it into text-based data. Many apps use OCR, like Google Lens, PDF Scanner, and more.

Ways to detect text from images:

Use OpenCV – popular,
Use Deep Learning models – the newest method,
Use your custom model.

Text recognition - computer vision — *Source*

Text Detection using OpenCV

Sample code after processing the image and contour detection:

# text detection
import cv2
import pytesseract

def contours_text(orig, img, contours):
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        # Drawing a rectangle on copied image
        rect = cv2.rectangle(orig, (x, y), (x + w, y + h), (0, 255, 255), 2)
        cv2.imshow('cnt', rect)
        cv2.waitKey()
        # Cropping the text block for giving input to OCR
        cropped = orig[y:y + h, x:x + w]
        # Apply OCR on the cropped image (English, LSTM engine, automatic page segmentation)
        config = ('-l eng --oem 1 --psm 3')
        text = pytesseract.image_to_string(cropped, config=config)
        print(text)

Text Detection with Tesseract

Tesseract is an open-source OCR engine that can recognize text in 100+ languages, and it’s backed by Google. You can also train it to recognize many other languages.

Code to detect text using tesseract:

# text recognition
import cv2
import pytesseract
# read image
im = cv2.imread('./testimg.jpg')
# configurations
config = ('-l eng --oem 1 --psm 3')
# pytesseract
text = pytesseract.image_to_string(im, config=config)
# split the result into lines and print them
text = text.split('\n')
print(text)

Recommended reading & datasets:

4. Face Recognition with Python and OpenCV

It’s been more than two decades since the American television show CSI: Crime Scene Investigation first aired. During that time, facial recognition software has become increasingly sophisticated. Present-day software isn’t limited by superficial features like skin or hair color. Instead, it identifies faces based on facial features that are more stable through changes in appearance, like eye shape and distance between eyes. You can use OpenCV, deep learning, or a custom database to create facial recognition systems and applications.

Process of detecting a face from an image:

Find face locations and encodings,
Extract features using face embedding,
Face recognition, compare those faces.
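The comparison step above boils down to measuring the distance between two embedding vectors. The sketch below assumes NumPy and uses toy 4-dimensional vectors; the two helper functions are written here for illustration and mirror, in simplified form, what a face recognition library does when it compares encodings. The 0.6 tolerance is an assumed default.

```python
import numpy as np

def face_distance(known_encodings, candidate):
    """Euclidean distance between each known encoding and the candidate encoding."""
    return np.linalg.norm(np.asarray(known_encodings) - np.asarray(candidate), axis=1)

def compare_faces(known_encodings, candidate, tolerance=0.6):
    """True for every known face whose distance to the candidate is within the tolerance."""
    return (face_distance(known_encodings, candidate) <= tolerance).tolist()

# Toy 4-dimensional "encodings"; real face embeddings are much longer vectors
known = [[0.1, 0.2, 0.3, 0.4],   # person A
         [0.9, 0.8, 0.1, 0.0]]   # person B
candidate = [0.1, 0.2, 0.3, 0.5]

distances = face_distance(known, candidate)
matches = compare_faces(known, candidate)
print(distances.round(2), matches)
```

A lower distance means the faces are more alike, so the candidate matches person A and not person B. Tightening the tolerance reduces false matches at the cost of missing some true ones.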

Face recognition - computer vision — *Image source: The Times, Face Recognition: Author*

Below is the full code for recognizing faces from images:

import cv2
import face_recognition

# load_image_file returns RGB images, which is what face_recognition expects
imgMain = face_recognition.load_image_file('ImageBasics/Bryan_Cranst.jpg')
imgTest = face_recognition.load_image_file('ImageBasics/bryan-cranston-el-camino-aaron-paul-1a.jpg')

# Locate and encode the first face found in each image
faceLoc = face_recognition.face_locations(imgMain)[0]
encodeMain = face_recognition.face_encodings(imgMain)[0]
faceLocTest = face_recognition.face_locations(imgTest)[0]
encodeTest = face_recognition.face_encodings(imgTest)[0]

# Compare the two encodings: a lower distance means more similar faces
results = face_recognition.compare_faces([encodeMain], encodeTest)
faceDis = face_recognition.face_distance([encodeMain], encodeTest)
print(results, faceDis)

# Convert to BGR, the channel order OpenCV uses for drawing and display
imgMain = cv2.cvtColor(imgMain, cv2.COLOR_RGB2BGR)
imgTest = cv2.cvtColor(imgTest, cv2.COLOR_RGB2BGR)
# face_locations returns (top, right, bottom, left)
cv2.rectangle(imgMain, (faceLoc[3], faceLoc[0]), (faceLoc[1], faceLoc[2]), (255, 0, 255), 2)
cv2.rectangle(imgTest, (faceLocTest[3], faceLocTest[0]), (faceLocTest[1], faceLocTest[2]), (255, 0, 255), 2)
cv2.putText(imgTest, f'{results} {round(faceDis[0], 2)}', (50, 50), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 255), 2)

cv2.imshow('Main Image', imgMain)
cv2.imshow('Test Image', imgTest)
cv2.waitKey(0)
cv2.destroyAllWindows()

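Skeleton of the loop that reads frames from a webcam or live camera (the recognition step goes inside the loop):

import cv2
video_capture = cv2.VideoCapture(0)  # open the default webcam
while True:
    ok, frame = video_capture.read()
    if not ok:
        break
    cv2.imshow("Frame", frame)  # run face recognition on `frame` before showing it
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
video_capture.release()
cv2.destroyAllWindows()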

Recommended reading & datasets:

5. Object Detection

Object detection is the automatic identification and localization of objects in a given image or video frame. It’s used in self-driving cars, tracking, face detection, pose detection, and a lot more. There are 3 major approaches to object detection: classical OpenCV techniques, machine learning-based methods, and deep learning-based methods.

Object detection - computer vision — *Source*

Below is the full code to detect objects (here, faces, using a Haar cascade):

import cv2

# Enable camera
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 420)

# Load the Haar cascade file for face detection
faceCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
# To detect another object, for example eyes, load one more classifier:
# eyeCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye_tree_eyeglasses.xml")

while True:
    success, img = cap.read()
    if not success:
        break
    imgGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Getting corners around the face
    faces = faceCascade.detectMultiScale(imgGray, 1.3, 5)  # 1.3 = scale factor, 5 = minimum neighbors
    # Drawing bounding box around each face
    for (x, y, w, h) in faces:
        img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 3)
    # Detecting eyes (uncomment together with eyeCascade above):
    # eyes = eyeCascade.detectMultiScale(imgGray)
    # for (ex, ey, ew, eh) in eyes:
    #     img = cv2.rectangle(img, (ex, ey), (ex + ew, ey + eh), (255, 0, 0), 3)
    cv2.imshow('face_detect', img)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyWindow('face_detect')

Recommended reading & datasets:

Intermediate level Computer Vision projects

We’re taking things to the next level with a few intermediate-level projects. These projects will probably be more fun than beginner projects, but also more challenging.

6. Hand Gesture Recognition

In this project, you detect hand gestures and then assign a command to each of them. You can even play games with multiple commands using hand gesture recognition.

How gesture recognition works:

Install the PyAutoGUI library – it lets a script control the mouse and keyboard programmatically,
Convert each frame into HSV,
Find contours,
Assign a command to a finger count – below, an open hand (five fingers) triggers a jump.

Source: Author

Core of the code to play the dino game with hand gestures (this excerpt assumes it runs inside the webcam loop, where thresh, crop_image, and frame come from earlier steps):

import math
import cv2
import numpy as np
import pyautogui

# Find contours (OpenCV 3.x returns three values: image, contours, hierarchy)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
try:
    # Find contour with maximum area
    contour = max(contours, key=lambda x: cv2.contourArea(x))
    # Create bounding rectangle around the contour
    x, y, w, h = cv2.boundingRect(contour)
    cv2.rectangle(crop_image, (x, y), (x + w, y + h), (0, 0, 255), 0)
    # Find convex hull
    hull = cv2.convexHull(contour)
    # Draw contour
    drawing = np.zeros(crop_image.shape, np.uint8)
    cv2.drawContours(drawing, [contour], -1, (0, 255, 0), 0)
    cv2.drawContours(drawing, [hull], -1, (0, 0, 255), 0)
    # Find convexity defects
    hull = cv2.convexHull(contour, returnPoints=False)
    defects = cv2.convexityDefects(contour, hull)
    # Use cosine rule to find angle of the far point from the start and end point,
    # i.e. the convex points (the finger tips), for all defects
    count_defects = 0
    for i in range(defects.shape[0]):
        s, e, f, d = defects[i, 0]
        start = tuple(map(int, contour[s][0]))
        end = tuple(map(int, contour[e][0]))
        far = tuple(map(int, contour[f][0]))
        a = math.sqrt((end[0] - start[0]) ** 2 + (end[1] - start[1]) ** 2)
        b = math.sqrt((far[0] - start[0]) ** 2 + (far[1] - start[1]) ** 2)
        c = math.sqrt((end[0] - far[0]) ** 2 + (end[1] - far[1]) ** 2)
        angle = math.degrees(math.acos((b ** 2 + c ** 2 - a ** 2) / (2 * b * c)))
        # If angle <= 90, count it as a gap between fingers and draw a circle at the far point
        if angle <= 90:
            count_defects += 1
            cv2.circle(crop_image, far, 1, [0, 0, 255], -1)
        cv2.line(crop_image, start, end, [0, 255, 0], 2)
    # Press SPACE if at least 4 gaps (an open hand) are found
    if count_defects >= 4:
        pyautogui.press('space')
        cv2.putText(frame, "JUMP", (115, 80), cv2.FONT_HERSHEY_SIMPLEX, 2, 2, 2)
except (ValueError, AttributeError, ZeroDivisionError, cv2.error):
    pass  # no usable hand contour in this frame
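The angle test in the code above is an application of the cosine rule. Here is that step on its own as a small standard-library sketch; `defect_angle` is a hypothetical helper name, and the sample coordinates are made up to show one sharp gap between fingers and one shallow dent.

```python
import math

def defect_angle(start, end, far):
    """Angle in degrees at `far` in the triangle start-far-end, via the cosine rule."""
    a = math.dist(start, end)   # side opposite the angle at `far`
    b = math.dist(start, far)
    c = math.dist(end, far)
    cos_angle = (b ** 2 + c ** 2 - a ** 2) / (2 * b * c)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

# Two fingertips with a deep valley between them: sharp angle, counts as a gap
sharp = defect_angle(start=(0, 0), end=(4, 0), far=(2, 6))
# A shallow dent along the edge of the palm: wide angle, ignored
shallow = defect_angle(start=(0, 0), end=(10, 0), far=(5, 1))
print(round(sharp, 2), round(shallow, 2))
```

Only defects with an angle of 90 degrees or less are counted, so four counted gaps correspond to five raised fingers.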

Recommended reading & source code:

7. Human Pose Detection

Many applications use human pose detection to see how a player plays in a specific game (for example – baseball). The ultimate goal is to locate landmarks in the body. Human pose detection is used in many real-life videos and image-based applications, including physical exercise, sign language detection, dance, yoga, and much more.

Pose detection - computer vision — *Source*

Recommended reading & datasets:

Pose detection 2 - computer vision — *Source*

8. Road Lane Detection in Autonomous Vehicles

If you want to get into self-driving cars, this project will be a good start. You’ll detect lanes, edges of the road, and a lot more. Lane detection works like this:

Apply the mask,
Do image thresholding (thresholding converts a grayscale image to a binary one by setting each pixel at or above a specified gray level to white and every other pixel to black),
Do Hough line transformation (detecting lane lines).
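The three steps above can be sketched on a synthetic frame with NumPy alone. Everything here is a simplified illustration under stated assumptions: the mask simply keeps the lower half of the frame, the threshold of 200 is arbitrary, and `hough_peak` is a bare-bones Hough transform that returns only the strongest line. In a real project you would more likely use `cv2.threshold` and `cv2.HoughLinesP`.

```python
import numpy as np

def region_mask(height, width):
    """Boolean mask that keeps only the lower half of the frame, where the road usually is."""
    mask = np.zeros((height, width), dtype=bool)
    mask[height // 2:, :] = True
    return mask

def hough_peak(binary):
    """Return (rho, theta_degrees) of the line that collects the most votes."""
    height, width = binary.shape
    max_rho = int(np.ceil(np.hypot(height, width)))
    thetas = np.deg2rad(np.arange(180))
    accumulator = np.zeros((len(thetas), 2 * max_rho + 1), dtype=int)
    ys, xs = np.nonzero(binary)
    for x, y in zip(xs, ys):
        # Each lit pixel votes for every line x*cos(theta) + y*sin(theta) = rho through it
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        accumulator[np.arange(len(thetas)), rhos + max_rho] += 1
    theta_idx, rho_idx = np.unravel_index(np.argmax(accumulator), accumulator.shape)
    return int(rho_idx) - max_rho, int(theta_idx)

# Synthetic 20x20 grayscale frame: dark road with a bright vertical lane marking at x = 7
frame = np.full((20, 20), 40, dtype=np.uint8)
frame[:, 7] = 230

masked = np.where(region_mask(*frame.shape), frame, 0)  # 1. apply the mask
binary = masked >= 200                                  # 2. threshold to a binary image
rho, theta = hough_peak(binary)                         # 3. Hough line transform
print(rho, theta)
```

The strongest line comes back as rho = 7 and theta = 0 degrees, which in this parameterization is the vertical line x = 7, the lane marking drawn into the frame.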

Road detection - computer vision — *Source*

Road lane detection - computer vision — *Source: Author*

Recommended reading & datasets:

9. Pathology Classification

Computer vision is emerging in healthcare. The amount of data that pathologists analyze in a day can be too much to handle. Luckily, deep learning algorithms can identify patterns in large amounts of data that humans wouldn’t notice otherwise. As more images are entered and categorized into groups, the accuracy of these algorithms becomes better and better over time.

It can detect various diseases in plants, animals, and humans. For this application, the goal is to take the Kaggle OCT dataset and classify its images into different categories. The dataset has around 85,000 images. Optical coherence tomography (OCT) is an emerging medical technology for performing high-resolution cross-sectional imaging. It uses light waves to look inside a living human body and can be used to evaluate thinning skin, broken blood vessels, heart diseases, and many other medical problems.

Over time, it’s gained the trust of doctors around the globe as a quicker and more effective way of diagnosing patients than traditional methods. It can also be used to examine tattoo pigments or assess different layers of a skin graft that’s placed on a burn patient.

Pathology classification - computer vision — *Source: Kaggle Dataset*

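Code that trains the classifier with a tf-explain occlusion sensitivity callback and TensorBoard logging:

import datetime
import tensorflow as tf
from tensorflow import keras
from tf_explain.callbacks.occlusion_sensitivity import OcclusionSensitivityCallback

# In a Jupyter notebook, run the %load_ext tensorboard magic first to view the logs inline
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
# Occlusion sensitivity maps for class index 2
occlusion_callback = OcclusionSensitivityCallback(validation_data=(vis_test, vis_lab), class_index=2, patch_size=4)
# model_TF, fbeta, vis_test, and vis_lab are assumed to be defined earlier in the project
model_TF.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=[fbeta])
model_TF.fit(vis_test, vis_lab, epochs=10, verbose=1, callbacks=[tensorboard_callback, occlusion_callback])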

Recommended reading & datasets:

10. Fashion MNIST for Image Classification

MNIST is one of the most widely used image datasets: a database of handwritten digits from 0 to 9 with 60,000 training and 10,000 test images. Inspired by it, researchers at Zalando created Fashion MNIST, which contains images of clothing instead of digits. Thanks to the size of the dataset and the many resources available for it, models reach around 99% accuracy on the original MNIST, while Fashion MNIST is designed to be a harder drop-in replacement.

Fashion MNIST follows the same format as MNIST: 60,000 training and 10,000 test images, each a 28x28 grayscale picture of a clothing article from Zalando, labeled with one of 10 categories such as T-shirt/top, trouser, sneaker, or bag.

Fashion mnist - computer vision — *Source*

Recommended reading & datasets:

Advanced level Computer Vision projects

Once you’re an expert in computer vision, you can develop projects from your own ideas. Below are a few advanced-level fun projects you can work with if you have enough skills and knowledge.

11. Image Deblurring using Generative Adversarial Networks

Image deblurring is an interesting technology with plenty of applications. Here, a generative adversarial network (GAN) automatically trains a generative model, like Image DeBlur’s AI algorithm. Before looking into this project, let’s understand what GANs are and how they work.

Generative adversarial networks are a deep-learning approach that has shown strong results in various computer vision tasks, such as image super-resolution. However, how best to train these networks remains an open problem. A GAN can be thought of as two networks competing with one another, just like humans compete against each other on game shows like Jeopardy or Survivor. Both parties have tasks and need to come up with strategies based on their opponent’s moves throughout the game, while also trying not to be eliminated first. There are 3 major steps involved in training for deblurring:

Create fake inputs based on noise using the generator,
Train the discriminator with both real and fake sets,
Train the whole model.
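To see what the two networks are optimizing during these steps, here is a sketch of the standard GAN losses computed from discriminator outputs. It uses only the standard library, the probabilities are made-up numbers, and it assumes the common binary cross-entropy formulation; a specific deblurring model may use a different or additional loss.

```python
import math

def bce(prediction, target):
    """Binary cross-entropy for a single probability strictly between 0 and 1."""
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))

def discriminator_loss(d_real, d_fake):
    """The discriminator should output 1 for real (sharp) images and 0 for generated ones."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    """The generator is rewarded when the discriminator mistakes its output for a real image."""
    return bce(d_fake, 1.0)

# Early in training: the discriminator easily spots the generator's output
print(discriminator_loss(d_real=0.9, d_fake=0.1), generator_loss(d_fake=0.1))
# Later: generated images look more convincing, so the generator's loss drops
print(discriminator_loss(d_real=0.6, d_fake=0.5), generator_loss(d_fake=0.5))
```

The two losses pull in opposite directions: as the generator's loss falls, the discriminator's loss rises, which is the competition described above.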

Image deblurring - computer vision — *Source*

Recommended reading & datasets:

12. Image Transformation

With this project, you can transform any image into different forms. For example, you can change a real image into a graphical one. This is kind of a creative and fun project to do. When we use the standard GAN method, it becomes difficult to transform the images, but for this project, most people use Cycle GAN.

The idea behind a GAN is that you train two competing neural networks against each other. One network, called the “generator,” creates new data samples, while the other network judges whether each sample is real or fake. The generator alters its parameters to try to fool the judge by producing more realistic samples, and both networks keep improving as training continues. CycleGAN is an extension of the GAN architecture. What CycleGAN does is create a cycle that regenerates the input. Let’s say you’re using Google Translate: you translate English to German, open a new tab, copy the German output, and translate it back to English. The goal is to get back the original input you had. Below is an example of how transforming images to artwork works.
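The round-trip idea can be written down as a cycle-consistency loss: translate the input forward, translate it back, and measure how far the result is from the original. The sketch below assumes NumPy, and the three "generators" are toy arithmetic functions invented for illustration; real ones are neural networks.

```python
import numpy as np

def cycle_consistency_loss(x, forward, backward):
    """Mean absolute error between x and its round trip backward(forward(x))."""
    return float(np.mean(np.abs(backward(forward(x)) - x)))

# Toy stand-ins for the two generators (hypothetical)
def photo_to_painting(img):
    return img * 0.5 + 0.2

def good_painting_to_photo(img):   # inverts the mapping above
    return (img - 0.2) / 0.5

def bad_painting_to_photo(img):    # ignores its input entirely
    return np.zeros_like(img)

photo = np.array([[0.2, 0.4], [0.6, 0.8]])
good = cycle_consistency_loss(photo, photo_to_painting, good_painting_to_photo)
bad = cycle_consistency_loss(photo, photo_to_painting, bad_painting_to_photo)
print(good, bad)
```

A pair of generators that undo each other gets a loss near zero, while a generator that throws away the input is penalized, which pushes the networks to preserve the content of the image during translation.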

Image transformation - computer vision — *Source*

Recommended reading & source code:

13. Automatic Colorization of Photos using Deep Neural Networks

When it comes to coloring black and white images, machines long struggled to do an adequate job. They couldn’t work out which color a given shade of grey should become, leading to a range of washed-out hues that seem unrealistic. To overcome this issue, scientists from UC Berkeley, along with colleagues at Microsoft Research, developed a new algorithm that automatically colorizes photographs by using deep neural networks.

Deep neural networks are a very promising technique for image tasks because they can learn the composition of an image by looking at many pictures. Convolutional neural networks (CNNs) are trained with large amounts of labeled data and output a score for each class label for any input image. They can be thought of as feature detectors that are applied to the original input image.

Colorization is the process of adding color to a black and white photo or video. It can be accomplished by hand, but it’s a tedious process that takes hours or days, depending on the level of detail in the photo. With the rapid advance of deep learning in recent years, a CNN can colorize black and white images by predicting what the colors should be on a per-pixel basis. This project helps to colorize old photos. As you can see in the image below, it can even properly predict the color of Coca-Cola, because of the large amount of training data.

Automatic colorization - computer vision — *Source*

Recommended reading & guide:

14. Vehicle Counting and Classification

Nowadays, many places are equipped with surveillance systems that combine AI with cameras, from government organizations to private facilities. These AI-based cameras help in many ways, and one of their main features is counting the vehicles that pass by or enter a particular place. The same approach can be used in many areas, like crowd counting, traffic management, number plate recognition, sports, and many more. The process is simple:

Frame differencing,
Image thresholding,
Image dilation,
Contour finding.
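Here is a NumPy-only sketch of these steps on two synthetic frames. The helper names are made up for this example, and `count_blobs` counts connected regions as a simple stand-in for contour finding (in OpenCV you would typically use `cv2.absdiff`, `cv2.threshold`, `cv2.dilate`, and `cv2.findContours`). The sketch dilates before counting, because dilation merges fragments of the same vehicle.

```python
import numpy as np
from collections import deque

def dilate(binary):
    """3x3 dilation: a pixel turns on if it or any of its 8 neighbours is on."""
    padded = np.pad(binary, 1, mode="constant", constant_values=False)
    out = np.zeros_like(binary)
    h, w = binary.shape
    for dr in range(3):
        for dc in range(3):
            out |= padded[dr:dr + h, dc:dc + w]
    return out

def count_blobs(binary):
    """Count 8-connected regions with a breadth-first search."""
    seen = np.zeros_like(binary)
    h, w = binary.shape
    count = 0
    for r in range(h):
        for c in range(w):
            if binary[r, c] and not seen[r, c]:
                count += 1
                seen[r, c] = True
                queue = deque([(r, c)])
                while queue:
                    cr, cc = queue.popleft()
                    for nr in range(max(cr - 1, 0), min(cr + 2, h)):
                        for nc in range(max(cc - 1, 0), min(cc + 2, w)):
                            if binary[nr, nc] and not seen[nr, nc]:
                                seen[nr, nc] = True
                                queue.append((nr, nc))
    return count

# Two synthetic grayscale frames: an empty road, then the same road with two "vehicles"
previous = np.full((12, 16), 50, dtype=np.uint8)
current = previous.copy()
current[2:5, 2:6] = 200      # vehicle 1
current[2:5, 4] = 52         # a stripe close to the background color splits vehicle 1 in the mask
current[8:11, 10:14] = 180   # vehicle 2

diff = np.abs(current.astype(int) - previous.astype(int))  # frame differencing
mask = diff > 30                                            # image thresholding
print(count_blobs(mask), count_blobs(dilate(mask)))         # regions before and after dilation
```

Without dilation the first vehicle is split into two regions and the count is 3; after dilation the fragments merge and the count is the correct 2.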

And finally, vehicle counting:

Source

Recommended reading & datasets:

15. Vehicle license plate scanners

A vehicle license plate scanner is a computer vision application that finds license plates in images and reads their numbers. This technology is used for a variety of purposes, including law enforcement, identifying stolen vehicles, and tracking down fugitives.

More sophisticated commercial scanners are built to read plates from many vehicles per minute, at a distance, in heavy traffic on highways and city streets.

The goal is to first detect the license plate and then read the numbers and text written on it. Such a system is also referred to as automatic number plate recognition (ANPR). The process is simple:

Capture image,
Search for the number plate,
Filter image,
Separate the lines of text using row segmentation,
OCR for the numbers and characters.
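Row segmentation, the step just before OCR, is usually done with a horizontal projection: find the runs of rows that contain any ink. The sketch below assumes NumPy and an already binarized plate image; `segment_rows` and the synthetic plate are illustrative, not taken from a specific library.

```python
import numpy as np

def segment_rows(binary):
    """Return (start, end) row ranges, end exclusive, of consecutive rows containing ink."""
    has_ink = binary.any(axis=1)
    rows, start = [], None
    for r, ink in enumerate(has_ink):
        if ink and start is None:
            start = r                 # a text line begins
        elif not ink and start is not None:
            rows.append((start, r))   # the text line ended on the previous row
            start = None
    if start is not None:             # text touching the bottom edge
        rows.append((start, len(has_ink)))
    return rows

# Synthetic binarized plate: two lines of "characters" separated by blank rows
plate = np.zeros((10, 12), dtype=bool)
plate[1:4, 2:10] = True    # first line of text
plate[6:9, 3:9] = True     # second line of text

for top, bottom in segment_rows(plate):
    line_image = plate[top:bottom]    # this crop is what would be passed to OCR
    print(top, bottom, line_image.shape)
```

Each returned range is one line of the plate, so two-line plates are split into two crops that can be read separately.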

Plate scanner - computer vision — *Source*

Recommended reading & datasets:

Conclusion

And that’s it! Hope you liked the computer vision projects. As a cherry on top, I’ll leave you with several extra projects that you might also be interested in.

Extra projects

Photo Sketching
Collage Mosaic Generator
Blur the Face
Image Segmentation
Sudoku Solver
Object Tracking
Watermarking Images
Image Reverse Search Engine

Additional research and recommended reading
