Sunday 13 September 2020

Extract Text From Image Using Python



We are going to use the pytesseract and Pillow libraries for this project!

Before you start coding, you need to complete three tasks:

1. Click on the link below and install tesseract-OCR

After the setup finishes successfully, take note of the folder where Tesseract was installed, because we need that path in our code.

2. Install pytesseract using the command: pip install pytesseract

3. Install Pillow using the command: pip install pillow
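
If you want to confirm the three tasks worked before moving on, here is a small optional sketch. The function name check_setup is just something made up for this post, and the last check assumes the tesseract program is on your PATH, which is often not the case on Windows (that is why we set the path by hand in the source code).

```python
import importlib.util
import shutil


def check_setup():
    """Report which pieces of the OCR setup can be found on this machine."""
    return {
        # find_spec returns None when a package cannot be imported
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        "PIL": importlib.util.find_spec("PIL") is not None,
        # shutil.which only searches PATH; a Windows install may not be on it
        "tesseract on PATH": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_setup().items():
        print(f"{name}: {'found' if found else 'not found'}")
```

If "tesseract on PATH" shows "not found" but you did install Tesseract, that is fine as long as you set the full path in your code.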


Source Code:

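import pytesseract
from PIL import Image

# Path to the Tesseract executable noted during installation
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Open the image and run OCR on it
text = pytesseract.image_to_string(Image.open('logo.png'))
print("Extracted Data is: \n ", text)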

Note that if you want to use a full path, you need to write a double backslash (\\) instead of a single backslash (\) in your path, because Python treats a single backslash in a normal string as the start of an escape sequence.

For example, if your path looks like this: D:\Python_Program\my_Image.jpg

Then replace each \ with \\

so it looks like this: D:\\Python_Program\\my_Image.jpg


Alternatively, add the letter 'r' before your string to make it a raw string, so the single backslashes are kept as they are
example:     r'C:\Program Files\Tesseract-OCR\tesseract.exe'

and then you are good to go!
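
To see the two ways of writing a path side by side, here is a short sketch you can run. The first path is the example from this post. The second one (new_folder and table.png) is a made-up path, chosen only because \n and \t are real escape sequences and show what goes wrong with single backslashes.

```python
# The same Windows path written two ways (example path from this post).
escaped = 'D:\\Python_Program\\my_Image.jpg'  # doubled backslashes
raw = r'D:\Python_Program\my_Image.jpg'       # raw string prefix

print(escaped == raw)  # True: both spell the identical string
print(raw)

# Hypothetical path showing the problem with single backslashes:
# '\n' becomes a newline and '\t' becomes a tab in a normal string.
broken = 'D:\new_folder\table.png'
fixed = r'D:\new_folder\table.png'

print('\n' in broken)  # True: the path now contains a newline character
print('\n' in fixed)   # False: the raw string kept the backslash and the n
```

Either style works, so pick one and use it for both the Tesseract path and the image path.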


