Optical Character Recognition (OCR)

4 min read · Feb 4, 2024

Ever since we got smartphones, all sorts of features have been added to make them smarter, and that will keep happening every day, with each version expected to beat the previous one. OCR is one of those features, though it is much older than the smartphone: an early OCR machine named “GISMO” was introduced in 1951, and the term OCR came into the picture in the late 1960s. That machine was used to convert printed messages into machine language. Now, OCR is not limited to printed messages; it can help us pull text out of images and documents too. In fact, most of us have used it very often to copy during our internal exams. Remember using the smartphone under the table to search for the answer during surprise tests? Well, that’s common, but what is not common is the smartphone’s ability to search for it: even if the text is only slightly visible, the words appear very clearly in the search box or on the clipboard. Anyone who has used Google Lens can relate.

Now that I’ve spent my energy on storytelling, let’s get back to what you actually came for: what is OCR, and why is it important to understand how it’s used? OCR is a process that converts an image of text, such as a scanned document or a printed message, into a machine-readable text format. The basic process involves examining the text of a document and translating the characters into code that can be used for data processing. OCR is sometimes also referred to as text recognition.

How does OCR work?

OCR mainly goes through 4 steps to turn our input into machine-readable code. The 4 steps are:

Image Acquisition: Heavy term, I know, but put simply, this step captures the image: the sensor in the device converts light signals into electrical signals, which are then converted into a digital format. In simple terms, when you point your camera at the text you want to search and get it in focus, capturing that visual data is Image Acquisition. The OCR software then analyzes the scanned image and classifies the light areas as background and the dark areas as text.
Pre-Processing: The OCR system first cleans the image and removes errors to prepare it for reading. Cleaning techniques include fixing alignment issues introduced during scanning (DESKEWING), removing digital image spots or smoothing the edges of text images (DESPECKLING), cleaning up boxes and lines in the image, and script recognition for multi-language OCR. This step is crucial, as it improves the quality of the input image and increases the accuracy of text recognition.
Text Recognition: The 3rd step converts the character images into machine-readable text. It relies on two main types of OCR algorithms or software processes:
0) Pattern Matching: This isolates a character image, called a “GLYPH”, and compares it with a similarly stored glyph. It works only if the stored glyph has a similar font and scale to the input glyph. (NOTE: This method works well with scanned images of documents that have been typed in a known font.)
1) Feature Extraction: This method breaks down, or decomposes, the glyph into features such as lines, the number of angled lines, crossed lines, or curves in a character. For example, the capital letter “A” may be stored as two diagonal lines that meet with a horizontal line across the middle. The method then uses these features to find the best match, or nearest neighbor, among its stored glyphs.
Post-Processing: After analysis, the system converts the extracted text data into a computerized file. Some OCR systems can create annotated PDF files that include both the before and after versions of the scanned document.
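
To make the steps above a little less abstract, here is a minimal Python sketch of two of them: classifying dark pixels as text and light pixels as background, and then pattern matching the result against stored glyphs. Everything in it is an assumption for illustration: the 3x3 "images", the threshold of 128, and the names binarize, TEMPLATES, and recognize are made up for this post, and real OCR engines work on far larger images with far more robust methods.

```python
# Toy sketch: binarize a tiny grayscale image, then pattern-match it
# against stored glyph templates. Sizes and threshold are assumptions.

THRESHOLD = 128  # assumed cut-off: pixels darker than this count as text


def binarize(gray):
    """Classify each pixel: 1 = dark (text), 0 = light (background)."""
    return [[1 if px < THRESHOLD else 0 for px in row] for row in gray]


# Hypothetical stored glyphs, same size and "font" as the input.
TEMPLATES = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
}


def match_score(glyph, template):
    """Count the pixels where the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the stored character with the highest pixel agreement."""
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))


# A "scanned" 3x3 image: low values are dark ink, high values are paper.
scanned = [[30, 240, 250],
           [25, 235, 245],
           [20, 35, 40]]

glyph = binarize(scanned)
print(glyph)             # [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
print(recognize(glyph))  # L
```

Notice that the comparison only makes sense because the input and the templates share the same size and shape, which is exactly the font-and-scale limitation of pattern matching mentioned above.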

TYPES OF OCR

OCR technology encompasses various types, each serving specific purposes. Here are some commonly recognized classifications:

0) Simple OCR System: This type works by storing many different font and text image patterns as templates. It mostly uses pattern-matching algorithms to compare text images, character by character, with its internal database. If the system matches text word by word instead, it’s called Optical Word Recognition. Limitation: There are virtually unlimited fonts and handwriting styles, and not every one of them can be captured in a database.

1) Intelligent Character Recognition Software: Modern OCR uses Intelligent Character Recognition (ICR) technology to read text the way humans do. It uses machine learning to train the machine to behave like a human reader. An ML system called a neural network analyzes the text over many levels, processing the image repeatedly. It looks for different image attributes like curves, line intersections, and loops, then combines the results of all these levels of analysis to get the final result.

2) Intelligent Word Recognition: It works on the same principle as ICR, yet processes whole word images instead of first splitting the image into characters.

3) Optical Mark Recognition: Identifies logos, watermarks, and other symbols in a document. It is also widely used to read filled-in marks such as checkboxes and bubbles on forms and answer sheets. Related recognition techniques show up in AI solutions such as scanning and reading number plates and road signs in self-driving cars, and detecting brand logos in social media posts, which helps businesses make better marketing and operational decisions that reduce expenses and improve customer experiences.
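
To make the attribute-based matching behind the smarter types a bit more concrete, here is a small Python sketch of comparing glyphs by their features instead of their raw pixels. To be clear, this is not a neural network: it is only the simpler nearest-neighbor comparison described under Feature Extraction, and the feature counts, the STORED_GLYPHS table, and the function names are hand-written assumptions for illustration. A real system would derive such features from the image itself.

```python
# Toy feature-based matcher. The feature counts below are hand-written
# assumptions; a real system would extract them from pixels.

# Feature order: (diagonal lines, horizontal lines, vertical lines, curves)
STORED_GLYPHS = {
    "A": (2, 1, 0, 0),  # two diagonals that meet, one horizontal bar
    "H": (0, 1, 2, 0),
    "L": (0, 1, 1, 0),
    "O": (0, 0, 0, 1),
    "D": (0, 0, 1, 1),
}


def distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_glyph(features):
    """Return the stored character whose features are closest to the input."""
    return min(STORED_GLYPHS, key=lambda ch: distance(features, STORED_GLYPHS[ch]))


print(nearest_glyph((2, 1, 0, 0)))  # A: exact match for the stored description
print(nearest_glyph((0, 1, 2, 1)))  # H: still closest despite one spurious curve
```

Because the comparison is on stroke counts and not on exact pixels, the same stored description can match a letter in a different font or size, which is what the template database of a simple OCR system cannot do.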

So, Sunday was somewhat productive. Well, I’m glad to complete the Blog for you peeps. I hope you got some insightful information from the Blog.

So, it’s CARL signing off. See you soon on 10th Feb. Arigato:)

Written by Carl Writes