Abstract:
Computers play an important role in automating processes across many industries, and digital data archiving is one such application, in which software is used to improve office workflows. Every day, numerous letters, visiting cards, and documents are received and generated in offices and then stored in physical files and folders. Searching for a document takes a lot of time, and matters get worse when we do not remember the date or heading of the document or letter but only the sender’s name or a single line from its text. There is also a chance that these documents will be misplaced. There is therefore a need to build an application that can scan documents and store them as images, extract the text from these images, and store that text in a database. This allows documents to be searched efficiently and easily by simply typing the sender’s name or the company name.

Keywords:
Optical character recognition (OCR), Digital archive, Scanning, Emgu CV, WIA (Windows Image Acquisition) library, MODI (Microsoft Office Document Imaging) library, Tesseract OCR