Deep Learning Guide: Choosing Your Data Annotation Tool
We all know what data annotation is. It’s a part of any supervised deep learning project, including computer vision. Common computer vision tasks, like image classification, object detection, and segmentation, require annotations for each and every image fed into the model training algorithm.
You simply must get a good tool for image annotation. In this article, we’ll be checking out a few top picks that I’ve worked with throughout my career as a deep learning engineer. Even though they have the same end goal, each annotation tool is quite unique and has individual pros and cons.
To compare them, let’s define a list of criteria that will help you choose a tool that works best for you, your team, and your project.
How to choose the right data annotation tool?
The criteria for choosing the right data annotation tool are efficiency, functionality, formatting, application type, and price. Let’s go through them one by one.
There are a lot of images available to deep learning engineers nowadays. Annotations are manual by nature, so image labeling might eat up a big chunk of time and resources. Look for tools that make manual annotation as time-efficient as possible: a convenient user interface (UI), hotkey support, and other features that save time and improve annotation quality. That’s what efficiency is about.
Labels in computer vision can differ depending on the task you’re working on. In classification, for example, we need a single label (usually an integer number) that explicitly defines a class for a given image.
Object detection is a more advanced task in computer vision. In terms of annotations, for each and every object you need a class label, and a set of coordinates for a bounding box that explicitly states where a given object is located within an image.
Semantic segmentation requires a class label and a pixel-level mask that marks which pixels belong to an object.
So, depending on the problem you’re working on, you should have an annotation tool that provides all the functionality you need. As a rule of thumb, it’s great to have a tool that can annotate images for all kinds of computer vision tasks you might encounter.
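To make the difference between these three kinds of labels concrete, here is a minimal sketch of how they could be represented in Python. The class names, field names, and the tiny 4x4 example image are my own illustration, not the format of any particular annotation tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClassificationLabel:
    """Classification: one integer class id for the whole image."""
    class_id: int


@dataclass
class BoundingBox:
    """Detection: a class id plus box corners in pixel coordinates."""
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass
class SegmentationMask:
    """Segmentation: mask[row][col] holds the class id of that pixel."""
    mask: List[List[int]]


def count_pixels(seg: SegmentationMask, class_id: int) -> int:
    """Count how many pixels in the mask carry the given class id."""
    return sum(row.count(class_id) for row in seg.mask)


# Hypothetical 4x4 image: one object of class 1 on background (class 0).
image_label = ClassificationLabel(class_id=1)
detections = [BoundingBox(class_id=1, x_min=1, y_min=1, x_max=3, y_max=3)]
segmentation = SegmentationMask(mask=[
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
])

print(image_label, detections[0], count_pixels(segmentation, 1))
```

Notice how the amount of information grows from one number per image, to five numbers per object, to one number per pixel. That growth is the reason segmentation labeling is so much more time-consuming than classification.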
Annotations come in different formats: COCO JSONs, Pascal VOC XMLs, TFRecords, text files (csv, txt), image masks, and many others. We can always convert annotations from one format to another, but having a tool that can directly output annotations in your target format is a great way to simplify your data preparation workflow, and free up a lot of time.
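As an illustration of what such a conversion involves, here is a small sketch for bounding boxes. It relies on the commonly used conventions: Pascal VOC stores corner coordinates (xmin, ymin, xmax, ymax) in pixels, COCO stores [x, y, width, height] in pixels, and YOLO stores the box center and size normalized by the image size. The function names are mine, and the sketch ignores details such as rounding or coordinate clipping that a real converter may need.

```python
def voc_to_coco(x_min, y_min, x_max, y_max):
    """Pascal VOC corners -> COCO [x, y, width, height], all in pixels."""
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def voc_to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Pascal VOC corners -> YOLO (cx, cy, w, h), normalized to 0..1."""
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return (cx, cy, w, h)


# Hypothetical box in a 400x500 (width x height) image.
coco_box = voc_to_coco(100, 50, 300, 250)
yolo_box = voc_to_yolo(100, 50, 300, 250, img_w=400, img_h=500)
print(coco_box)
print(yolo_box)
```

The math is trivial, but you have to repeat it for every box in every file, and keep track of image sizes and class ids along the way. That is the work a tool with native export in your target format saves you.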
Are you looking for a web-based annotation app? Or maybe you sometimes work offline but still need to annotate, and would like a desktop app that works both online and offline? These might be important questions in the context of your project.
Some tools come as both desktop apps and web-based apps. Others might be web-based only, so you won’t be able to use them outside of a web browser window. Keep that in mind when looking for your annotation tool.
If you work with sensitive data, consider privacy issues: uploading your data to a 3rd-party web app increases the risk of a data breach. Will you take the risk, or go with a safer, local annotator?
Price always matters. From my personal experience, most engineers in small / medium size teams tend to look for free tools, and that’s what we’ll focus on in this article.
For a fair comparison, we’ll take a look at paid solutions too, to figure out if they’re worth it. We’ll look at the circumstances when paid solutions make sense, and actually generate additional value.
Overview of labeling tools
You won’t see “best” or “worst” in my reviews of each annotation tool. For each of us, the “best” tool is one that meets our individual requirements and circumstances.
I will describe five top annotation tools, and hopefully you’ll be able to choose one for yourself. These tools have proven to perform well, and they’re well known among deep learning engineers. I’ve had a chance to work with each of these tools, and I’m happy to share my experience with you. Let’s jump in!
LabelImg
LabelImg is a free, open-source annotator. It has a Qt graphical interface, so you can install it and use it locally on any operating system. The interface is very simple and intuitive, so the learning curve is gentle.
LabelImg can output annotations in multiple formats, including Pascal VOC XMLs and YOLO’s txts. It can also output CSVs and TFRecords with a few additional steps.
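To give an idea of what those “additional steps” could look like for CSV, here is a minimal sketch that reads a Pascal VOC style XML and flattens it into CSV rows using only the Python standard library. The sample XML is hand-written for illustration and the CSV column names are my own choice, so adjust both to whatever your training pipeline expects.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hand-written sample in the Pascal VOC layout (not real project data).
SAMPLE_XML = """
<annotation>
  <filename>dog.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>470</ymax></bndbox>
  </object>
</annotation>
"""


def voc_xml_to_rows(xml_text):
    """Return one dict per annotated object in a Pascal VOC XML string."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append({
            "filename": filename,
            "width": width,
            "height": height,
            "class": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = voc_xml_to_rows(SAMPLE_XML)
print(rows_to_csv(rows))
```

In a real project you would loop over all XML files in the annotation folder and append their rows to a single CSV.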
LabelImg supports hotkeys that speed up the annotation process and make it more convenient. It also has an image verification feature.
LabelImg has one very important drawback: it only supports bounding boxes for annotations. It’s also worth mentioning that LabelImg is strictly a desktop app, with no browser support. If these limitations are OK for you, LabelImg is a good candidate for an annotator for your project.
For a more detailed review, guided installation, and a walkthrough of the annotation process, I recommend watching this tutorial created by The AI Guy.
VGG Image Annotator (VIA)
VIA is another tool for image annotations that should be on your watch list. It’s a free, open-source solution developed by a team from Oxford University.
In contrast to LabelImg, VGG Image Annotator runs entirely in a browser window. Even though it’s a web-based app, users can work offline in most web browsers. The app fits in a lightweight HTML page.
VIA has a broad range of functionality. You can draw different region shapes around objects: besides bounding boxes, VGG Image Annotator also supports circles, ellipses, polygons, points, and polylines.
VIA can also annotate video frames, audio segments, and video subtitles. If you want a universal, but simple tool, VIA might be a good choice.
It has basic keyboard shortcuts that speed up the annotation process. I personally love how hotkeys work in VIA. It’s extremely convenient and well-organized.
Final annotation files can only be exported in a limited number of formats: COCO JSONs, Pascal VOC XMLs, and CSVs. Converting annotations to other formats requires additional external transformations, so consider that when making a decision.
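Here is a sketch of what one such external transformation could look like: turning a COCO-style JSON into YOLO text lines. It assumes the commonly used COCO layout (images, annotations, and categories lists, with bbox given as [x, y, width, height] in pixels). The sample data is made up, and a file exported from a particular tool may differ in details, so treat this as a starting point rather than a finished converter.

```python
# Made-up sample that follows the commonly used COCO detection layout.
SAMPLE_COCO = {
    "images": [
        {"id": 1, "file_name": "street.jpg", "width": 400, "height": 200},
    ],
    "categories": [
        {"id": 1, "name": "car"},
        {"id": 3, "name": "person"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 3, "bbox": [100, 50, 200, 100]},
    ],
}


def coco_to_yolo_lines(coco):
    """Return {file_name: [yolo_line, ...]} for every annotated image."""
    images = {img["id"]: img for img in coco["images"]}
    # YOLO wants zero-based, contiguous class indices; COCO ids may have gaps,
    # so remap them in ascending id order.
    ordered = sorted(coco["categories"], key=lambda cat: cat["id"])
    class_index = {cat["id"]: i for i, cat in enumerate(ordered)}

    lines = {}
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]
        # Box center and size, normalized by the image dimensions.
        cx = (x + w / 2) / img["width"]
        cy = (y + h / 2) / img["height"]
        nw = w / img["width"]
        nh = h / img["height"]
        cls = class_index[ann["category_id"]]
        line = f"{cls} {cx:.6f} {cy:.6f} {nw:.6f} {nh:.6f}"
        lines.setdefault(img["file_name"], []).append(line)
    return lines


yolo_lines = coco_to_yolo_lines(SAMPLE_COCO)
print(yolo_lines)
```

Each list in the result would typically be written to a text file named after the image, one line per object.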
To try VGG Image Annotator, check out the demonstrations with preloaded data, which cover several use cases.
If you’re curious how an annotation process is performed in VIA, this guided tutorial by BigParticle.Cloud will give you a good overview.
Computer Vision Annotation Tool (CVAT)
CVAT’s user interface (UI) was optimized based on feedback from many professional annotation teams. Because of that, CVAT is very well designed for image and video annotation.
You can start an annotation job from CVAT’s website, and work fully online in a web-based application. CVAT’s website has some limitations, though:
- You can only upload 500 MB of data,
- You can only create 10 tasks per user.
Luckily, you can install it locally and even work offline. Installation is nicely documented, and all operating systems are supported.
Supported shapes include rectangles, polygons, polylines, points, and even cuboids, as well as tags and tracks. Unlike the previous annotators, CVAT supports annotation for semantic segmentation.
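If you wonder how a polygon drawn in an annotator relates to the pixel-level mask that segmentation models need, here is a minimal sketch. It fills a mask by testing the center of every pixel against the polygon with the classic even-odd (ray casting) rule. This is my own illustration in plain Python, not CVAT’s code, and real tools use much faster rasterization routines.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd rule: count how many edges a ray going right would cross."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line y = py?
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(polygon, width, height, class_id=1):
    """Return mask[row][col]: class_id inside the polygon, 0 elsewhere."""
    mask = []
    for row in range(height):
        mask.append([
            class_id if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
            for col in range(width)
        ])
    return mask


# Hypothetical square polygon with (x, y) vertices on a 5x5 image.
square = [(1, 1), (4, 1), (4, 4), (1, 4)]
mask = polygon_to_mask(square, width=5, height=5)
for mask_row in mask:
    print(mask_row)
```

The annotator only has to click a handful of vertices, while the exported mask carries a class id for every pixel.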
The number of supported annotation formats for export is impressive. Formats supported as of March 2021 include:
- Pascal VOC (xmls)
- Segmentation masks for Pascal VOC
- YOLO (txts)
- MS COCO object detection (jsons)
- LabelMe 3.0
- WIDER Face
Teams will find CVAT especially useful because it’s so collaborative. CVAT lets users create annotation tasks and split up the work among other users. Moreover, annotation jobs can be monitored, visualized, and analyzed using Elasticsearch, Logstash, and Kibana. It’s always great to be able to control the labeling process, visualize progress, and manage it based on monitoring results.
Keyboard shortcuts cover most common actions and help a lot in real annotation work.
Automated annotation using pre-trained models is available. Users can select a model from a model zoo, or connect a custom model.
It has some flaws, like limited browser support for CVAT’s client. CVAT was tested and optimized only for Google Chrome, so you may get unstable behavior in other web browsers, although not always. I don’t use Google Chrome, and I see no significant drops in performance, just some minor bugs that don’t bother me.
To get a sense of what CVAT is and its UI, you can try an online demo on CVAT’s website or watch a video of an object annotation process by Nikita Manovich.
Visual Object Tagging Tool (VoTT)
Microsoft has come up with its own solution for data annotation: Visual Object Tagging Tool (VoTT). It’s a free, open-source tool with a very good reputation among data scientists and machine learning engineers.
Microsoft states that “VoTT helps facilitate an end to end machine learning pipeline”. It does so with three main features:
- Its ability to label images or video frames;
- An extensible model for importing data from local or cloud storage providers;
- An extensible model for exporting labeled data to local or cloud storage.
There’s both a web application and a native app. Unlike some competitors, VoTT’s web app runs in any modern web browser. It’s definitely a competitive advantage for teams that are used to a particular browser and don’t want to change it.
On the other hand, VoTT’s web app is not as lightweight as VIA’s. It needs a bit of time and resources to load in a browser window.
Another drawback of VoTT’s web app is that it can’t access the local file system. The dataset needs to be uploaded to cloud storage, which can be inconvenient.
Visual Object Tagging Tool will ask you to specify two connections: one for import (a source connection) and one for export (a target connection). Projects in VoTT are designed as a labeling workflow setup, and require source and target connections to be defined. You can see how VoTT treats and organizes labeling jobs in the official docs. The overall structure is very well designed and organized.
Annotation shapes in VoTT are limited to only two types: polygons and rectangles. However, the library of supported formats for export is quite rich. It includes:
- Generic JSONs;
- Pascal VOC;
- Microsoft Cognitive Toolkit (CNTK);
- Azure Custom Vision Service.
There are several keyboard shortcuts that let users keep one hand on the mouse and the other on the keyboard while annotating. The most common general shortcuts (copying, pasting, redoing) also have full support in VoTT.
To try Visual Object Tagging Tool, go to VoTT’s web app and give it a spin. Guided tutorials are another great source of information about VoTT. This tutorial by Intelec AI is one of my favourites. Consider watching it if you want to know more about VoTT, its UI and features.
Supervisely
I promised to include a paid alternative, and here it is: Supervisely, an end-to-end computer vision lifecycle platform.
Supervisely is not just an annotation tool, it’s a platform for computer vision product development. Functionally, it’s not limited to a single data annotation process. Instead, teams and independent researchers, with or without machine learning expertise, can build deep learning solutions for their needs. All of that is done in a single environment.
In terms of labeling, Supervisely lets you annotate not only images and videos, but also 3D point clouds (3D scenes built by complex sensors like LIDARs and radar sensors), and volumetric slices.
Annotation tools include conventional points, lines, rectangles, and polygons, plus some pixel-level instruments:
- Brushes to draw any shape on a scene using a mouse hold;
- Erasers that remove unwanted pixels.
Instance and semantic segmentation can be sped up with one of the most prominent features of Supervisely, called AI Assisted Labeling. You only have to define the shape of an instance, and a built-in neural net will do the rest of the job, filling in the target pixels.
Images are taken from AI Assisted Labeling web page
Annotation jobs can be managed at different scales. Depending on the team, different roles can be assigned to users. Labeling job progress is transparent and trackable.
Annotated data can be immediately used to train a neural net. You can select a model from a Model Zoo with pretrained models or go with a custom model of your choice. Either way will work.
The Model Zoo is rich in pretrained models. All models from the zoo can be added to an account and used to retrain a new custom model, so you don’t need to worry about the data format that a particular neural net requires. Supervisely does all data preparation and transformation steps for you. You just have to feed in the data.
Trained models can be deployed as an API. Alternatively, model weights and source code can be downloaded to use in any other scenario.
Supervisely has many other cool features that I won’t be able to cover in this article, as we’re focusing on annotation tools. In case you’d like to know more about this platform, there’s an official YouTube channel. I encourage you to browse through their playlists, and watch videos about topics, functionality and features that interest you. You can also look at some use cases if you wish.
In terms of pricing, students and fellow data scientists can use Supervisely at no cost. Companies and enterprises should get in touch to request pricing details. Supervisely states that their service is used by more than 25,000 companies and researchers worldwide, including big names like Mazda, Alibaba Group, and BASF.
To choose a data annotator for a deep learning project, you need to be thorough: there are overwhelmingly many solutions available. Not surprisingly, each tool has different pros and cons. By now, you should have a good sense of how they differ, and what to look for depending on your needs.
We’ve gone over five candidates for consideration, looking at them from five different perspectives: efficiency, functionality, annotation formatting, application type and, of course, pricing.
LabelImg, our first candidate, is a simple and lightweight annotator. It’s extremely intuitive. If you want to avoid unnecessary complexity and your labeling is for object detection tasks, you might be interested in using LabelImg. It will do exactly what you need.
VIA covers some of the drawbacks of LabelImg. You can use a web app, and there’s a broader range of shapes for labeling: not just rectangles, but also circles, ellipses, polygons, points, and polylines.
CVAT, in contrast, supports semantic segmentation. Its collaborative functionality will serve as a good basis for effective team work.
VoTT is the only web-based annotator here that is optimized to work with every modern web browser. It’s backed by Microsoft, and has a very good reputation among data scientists and machine learning engineers.
Supervisely is the only paid candidate we’ve considered. Experienced deep learning engineers will definitely benefit from the automation and rich functionality of Supervisely. The less experienced will enjoy how it simplifies the machine learning workflow.
Find and select the tool that fits your requirements. I hope this article will help you make a good choice.