Project information
- Category: GIS Development
- Skills Used: OpenCV, Pillow, Pytesseract, Face-detection, OCR
- Learn More at: OCR and Face-detection on Newspaper
Summary:
This project combined Pillow, OpenCV, and Pytesseract to create an application that can search through
a zip file containing scanned newspapers for a word and return all the faces on the page containing the
given word.
First, all the images of newspaper pages from the zipfile were extracted using Zipfile library.
Then, using Pytesseract library, the scanned pages were converted to text.
Using OpenCV, all the faces for a page were identified.
A function was defined to take user defined text input as a search string and search for that text in all
the text generated from OCR to determine the pages on which the text is found. Then all the images from those
pages are pasted using Pillow library functions, for each page and then are displayed.