This article explores OCR (Optical Character Recognition), the technology of extracting data from images. Tools such as Google Cloud Vision, Amazon Textract, and Tesseract are compared. Cloud solutions dominate; Tesseract, although free, may be less effective on handwritten text or low-quality documents.

What is OCR?

Optical Character Recognition (OCR) technologies are methods for extracting data from unstructured, image-type documents. They belong to the large family of Computer Vision algorithms and are all around us in daily life. For example, OCR makes it possible to read credit cards, bank checks, invoices or expense reports automatically.

OCR tools have become very effective and save time while improving quality. As a result, their use is increasing across all economic sectors.

In this article, we compare three OCR tools that we were able to test during missions at Aqsone: Google Cloud Vision, Amazon Textract, and Tesseract.

State of the art

Optical Character Recognition is not a recent subject: the first applications emerged at the beginning of the 20th century. Early OCR use cases aimed, for example, at detecting Morse code, Braille, or typed characters.

Over the years, and with the appearance of the first CPUs, projects on a larger scale emerged in areas such as postal services, customs services, and the army.

In 2005, Hewlett Packard and the University of Nevada released Tesseract OCR as open source, thus expanding the use of these technologies.

Another milestone is the MNIST database, which dates back to the late 1990s and contains 60,000 training images of handwritten digits in grayscale. It is a data set widely used in Machine Learning for Computer Vision topics.

Finally, the last few years have seen the emergence of Cloud OCR models such as Google Cloud Vision or Amazon Textract.

The current state of the art is thus composed mainly of:

  • Paid software dedicated to very specific tasks (expense reports, mail, technical documents). Its performance is hard to estimate and hard to generalize to other tasks.
  • Open source solutions such as Tesseract, which is still quite popular and has the advantage of being completely free while delivering relatively good performance.
  • Cloud OCR models, whose two main players are Google Cloud Vision and Amazon Textract. They offer:
    • better performance on character and language recognition,
    • a variety of complex models,
    • the possibility of training these models on a very large volume of data,
    • the possibility of being coupled with other Natural Language Processing (NLP) components specific to the GCP and AWS environments.

Unless a 100% free solution is required, cloud providers are probably the leaders in the current market.

Comparing OCR solutions

The table below summarizes the comparison of the various OCR solutions. The criteria we put forward are based on the business use cases we have encountered, and have generally been decisive for the selection of one OCR solution over another.

Examples of use:

Conclusion

In conclusion, Amazon Textract and Google Cloud Vision offer very similar capabilities and performance. Their prices are also very similar: the basic OCR feature costs about $1.50 per 1,000 units, aside from volume discounts for large numbers of units.
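To make the pricing concrete, here is a minimal sketch of the arithmetic, assuming the flat rate of $1.50 per 1,000 units quoted above. The function name `estimate_ocr_cost` is hypothetical, and the calculation deliberately ignores volume discounts, so it should be read as an upper-bound estimate rather than an actual quote from either provider.

```python
def estimate_ocr_cost(units: int, price_per_1000: float = 1.5) -> float:
    """Estimate the cost in USD of basic OCR for a number of billed units.

    Assumes a flat rate (default: $1.50 per 1,000 units) with no
    volume discount applied.
    """
    if units < 0:
        raise ValueError("units must be non-negative")
    return units / 1000 * price_per_1000


# Example: 25,000 units at the flat rate.
monthly_cost = estimate_ocr_cost(25_000)
print(f"${monthly_cost:.2f}")  # $37.50
```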

Google Cloud Vision will be particularly easy to use for GCP users and also has the advantage of being able to integrate with other Google Cloud services. However, its configuration can be complex for those new to the platform. Amazon Textract provides a Drag & Drop interface that makes it even easier to use.

In terms of performance, GCV seems to be more effective than Textract at detecting handwritten text. On the other hand, GCV falls short on table extraction, since the content is returned as ordinary text and not in table form.

Tesseract has the advantage of being free and of not requiring any special configuration. It is easy to use, on the command line or via the pytesseract Python library.
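As an illustration of the Python route, the sketch below wraps `pytesseract.image_to_string`, which accepts either a PIL image or a file path. It assumes that the Tesseract binary and the `pytesseract` package are installed. The wrapper `extract_text` and its `engine` parameter are hypothetical names introduced here so that the OCR backend can be swapped out or stubbed. They are not part of pytesseract.

```python
from typing import Callable, Optional


def extract_text(image_path: str, lang: str = "eng",
                 engine: Optional[Callable[..., str]] = None) -> str:
    """Return the text found in an image, without surrounding whitespace."""
    if engine is None:
        # Imported lazily so this module loads even where Tesseract is absent.
        import pytesseract
        engine = pytesseract.image_to_string
    return engine(image_path, lang=lang).strip()


# Usage, with "invoice.png" as a placeholder path:
#     text = extract_text("invoice.png")
# Roughly equivalent command line, writing the result to output.txt:
#     tesseract invoice.png output -l eng
```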

In terms of performance, Tesseract is effective on high resolution data. Its performance is reduced on low quality data or handwritten text. Table extraction is possible, but can be complex to implement and will also be very sensitive to the quality of the documents.
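To give an idea of why table extraction takes extra work, here is a minimal sketch of one possible approach, not necessarily the one used in our tests: grouping detected words into rows by their vertical position. It assumes that word positions are already available as `(text, left, top)` tuples in pixels, for instance derived from `pytesseract.image_to_data`. The function `group_into_rows` and the `tolerance` parameter are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[str, int, int]  # (text, left, top), coordinates in pixels


def group_into_rows(words: List[Word], tolerance: int = 10) -> List[List[str]]:
    """Group word boxes into table rows based on their vertical position."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[2]):
        # A word joins the current row if its top edge is within
        # `tolerance` pixels of the first word of that row.
        if rows and abs(word[2] - rows[-1][0][2]) <= tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Within each row, order the cells from left to right.
    return [[w[0] for w in sorted(row, key=lambda w: w[1])] for row in rows]


words = [("Total", 10, 52), ("Item", 10, 5), ("Price", 200, 8), ("42", 200, 50)]
print(group_into_rows(words))  # [['Item', 'Price'], ['Total', '42']]
```

A fixed pixel tolerance like this one breaks down quickly on skewed or noisy scans, which is one reason the result is so sensitive to document quality.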

It is a solution that remains interesting because of its accessibility and its effectiveness on certain specific data formats.
