International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering

A monthly Peer-reviewed & Refereed journal

ISSN Online 2321-2004
ISSN Print 2321-5526

Since 2013

Abstract: Optical Character Recognition (OCR) is a key technique for converting scanned images and other visuals into text. Computer vision techniques are applied in the system to enhance the text inside the digitized image.
This preliminary setup captures the invoice's information and converts it into JSON and CSV formats. The model may support prediction based on knowledge engineering and qualitative analysis in the near future.
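As an illustration of the JSON and CSV conversion described above, here is a minimal sketch using only the Python standard library. The field names and values in `invoice` are hypothetical placeholders for whatever the OCR stage extracts, and the helper names `to_json` and `to_csv` are assumptions made for this example, not the authors' implementation.

```python
import csv
import io
import json

# Hypothetical fields standing in for values parsed from OCR output.
invoice = {
    "invoice_number": "INV-0001",
    "date": "2025-01-15",
    "vendor": "Example Supplies",
    "total": "1250.00",
}


def to_json(record: dict) -> str:
    """Serialize one invoice record as a JSON object."""
    return json.dumps(record, indent=2)


def to_csv(records: list[dict]) -> str:
    """Serialize invoice records as CSV text with a header row."""
    buffer = io.StringIO()
    # Column order follows the key order of the first record.
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


json_text = to_json(invoice)
csv_text = to_csv([invoice])
```

Writing to an in-memory buffer keeps the sketch free of file paths; in a real pipeline the same writer could target an open file instead.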
The existing system performs data extraction and nothing more. First, image pre-processing techniques such as black-and-white conversion, inversion, noise removal, grayscale conversion, font thickening, and Canny edge detection are applied to improve the quality of the picture. In the next step, three different OCR engines are used: Keras OCR, Easy OCR, and Tesseract OCR, of which Tesseract OCR gives the most precise result.
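Three of the pre-processing steps named above (grayscale, black-and-white conversion, and inversion) can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it uses plain Python lists instead of an image library so that it runs without dependencies, the threshold of 128 is an assumed default, and the function names are hypothetical. Noise removal, font thickening, and Canny edge detection are not covered here.

```python
def to_grayscale(image):
    """Convert rows of (R, G, B) pixels to luminance with ITU-R BT.601 weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


def binarize(gray, threshold=128):
    """Map each grayscale value to 255 (white) or 0 (black).

    The threshold of 128 is an assumption made for this sketch.
    """
    return [[255 if value >= threshold else 0 for value in row] for row in gray]


def invert(gray):
    """Swap dark and light by reflecting each value around the 0-255 range."""
    return [[255 - value for value in row] for row in gray]


# A tiny 2x2 sample image: light pixels in the left column, dark in the right.
sample = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 200, 200), (30, 30, 30)],
]

gray = to_grayscale(sample)   # [[255, 0], [200, 30]]
black_white = binarize(gray)  # [[255, 0], [255, 0]]
inverted = invert(black_white)  # [[0, 255], [0, 255]]
```

In practice the pre-processed image would then be passed to the OCR engine; a fixed global threshold is the simplest choice and may not suit unevenly lit scans.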

Keywords: Optical Character Recognition (OCR), Computer Vision, Image Pre-processing, Data Extraction, Invoice Processing


DOI: 10.17148/IJIREEICE.2025.13435
