Getting the bounding box of the recognized words using python-tesseract
Problem description
I am using python-tesseract to extract words from an image. It is a Python wrapper for Tesseract, an OCR engine.
I am using the following code to get the words:
import tesseract
api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_AUTO)
mImgFile = "test.jpg"
mBuffer=open(mImgFile,"rb").read()
result = tesseract.ProcessPagesBuffer(mBuffer,len(mBuffer),api)
print "result(ProcessPagesBuffer)=",result
This returns only the words, not their location, size, or orientation in the image (in other words, the bounding boxes that contain them). Is there a way to get those as well?
Use pytesseract.image_to_data():
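import pytesseract
from pytesseract import Output
import cv2

# Load the image and run OCR, returning the result as a dict of parallel lists.
img = cv2.imread('image.jpg')
d = pytesseract.image_to_data(img, output_type=Output.DICT)

# Draw a green rectangle around every box that was returned.
n_boxes = len(d['level'])
for i in range(n_boxes):
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Show the annotated image until a key is pressed.
cv2.imshow('img', img)
cv2.waitKey(0)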
Among the data returned by pytesseract.image_to_data():

left is the distance from the upper-left corner of the bounding box to the left border of the image.
top is the distance from the upper-left corner of the bounding box to the top border of the image.
width and height are the width and height of the bounding box.
conf is the model's confidence in its prediction for the word within that bounding box. If conf is -1, the corresponding bounding box contains a block of text rather than a single word.
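
Since entries with a conf of -1 describe blocks rather than words, it can help to filter them out before drawing. The following is only a sketch: word_boxes is a hypothetical helper (not part of pytesseract), and the sample dict is hand-written to mimic the shape of the Output.DICT result, not real OCR output. It also assumes conf may arrive as an int, float, or string depending on the pytesseract version, so it is converted with float().

```python
def word_boxes(data, min_conf=0.0):
    """Return (text, (x, y, w, h), conf) for every word-level entry."""
    words = []
    for i in range(len(data["text"])):
        conf = float(data["conf"][i])  # may arrive as int, float, or str
        text = data["text"][i].strip()
        # Skip block-level entries (conf == -1), low-confidence words, and empty text.
        if conf < 0 or conf < min_conf or not text:
            continue
        box = (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
        words.append((text, box, conf))
    return words


# Hand-written sample in the shape of Output.DICT (not real OCR output).
sample = {
    "level": [1, 5, 5],
    "left": [0, 10, 60],
    "top": [0, 12, 12],
    "width": [200, 40, 55],
    "height": [100, 18, 18],
    "conf": ["-1", 96, 41.5],
    "text": ["", "hello", "world"],
}

for text, box, conf in word_boxes(sample, min_conf=50):
    print(text, box, conf)
```

With real data, you would pass the dict d from the snippet above instead of sample, and draw rectangles only for the boxes that come back.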
The bounding boxes returned by pytesseract.image_to_boxes() enclose individual letters, so I believe pytesseract.image_to_data() is what you're looking for.
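
For comparison, here is a rough sketch of reading the per-letter output. It assumes image_to_boxes() returns text in Tesseract's box-file layout, one line per character in the form "char left bottom right top page", with y measured from the bottom edge of the image; check this against your installed version. parse_char_boxes is a hypothetical helper and the sample string is made up.

```python
def parse_char_boxes(box_text, image_height):
    """Parse box-file text into (char, (x, y, w, h)) with a top-left origin."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top = map(int, parts[1:5])
        # Box-file y values are measured up from the bottom edge of the image,
        # so flip them to match the top-left origin used by OpenCV.
        boxes.append((char, (left, image_height - top, right - left, top - bottom)))
    return boxes


# Made-up sample: two characters on a 100-pixel-tall image.
sample_boxes = "h 10 70 20 88 0\ni 22 70 26 88 0"

for char, box in parse_char_boxes(sample_boxes, image_height=100):
    print(char, box)
```

The flipped coordinates can be passed to cv2.rectangle() the same way as the word boxes above.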