Getting the bounding box of the recognized words using python-tesseract

Views: 48 | Published: 2021/12/18 11:11:05 | Tags: python, image-processing, ocr, tesseract, python-tesseract


Problem description

I am using python-tesseract to extract words from an image. python-tesseract is a Python wrapper for Tesseract, an OCR engine.

I am using the following code to get the words:

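import tesseract

# Initialise the engine: tessdata path ".", English language data, default engine mode
api = tesseract.TessBaseAPI()
api.Init(".", "eng", tesseract.OEM_DEFAULT)
# Restrict recognition to digits and lowercase letters
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_AUTO)

# Read the image file as raw bytes and run OCR on the buffer
mImgFile = "test.jpg"
mBuffer = open(mImgFile, "rb").read()
result = tesseract.ProcessPagesBuffer(mBuffer, len(mBuffer), api)
print "result(ProcessPagesBuffer)=", result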

This returns only the words, not their location, size, or orientation in the image (in other words, the bounding boxes that contain them). Is there any way to get those as well?

Solution

Use pytesseract.image_to_data()

import pytesseract
from pytesseract import Output
import cv2

img = cv2.imread('image.jpg')

# One entry per detected element (page, block, paragraph, line, or word)
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    # Draw a green rectangle, 2 px thick, around the element
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Show the annotated image until a key is pressed
cv2.imshow('img', img)
cv2.waitKey(0)

Among the data returned by pytesseract.image_to_data():

left is the distance, in pixels, from the left border of the image to the upper-left corner of the bounding box.
top is the distance, in pixels, from the top border of the image to the upper-left corner of the bounding box.
width and height are the width and height of the bounding box.
conf is the model's confidence in its prediction for the word within that bounding box. A conf of -1 means the entry describes a larger unit of text (a page, block, paragraph, or line) rather than a single word.
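
Since the loop in the answer draws a rectangle for every entry, including pages, blocks, and lines, it may help to keep only the word-level entries. The following is a minimal sketch, not part of the original answer: the helper name word_boxes and the min_conf parameter are made up for illustration, and it assumes that level 5 marks a word, as in Tesseract's TSV output. It runs on a hand-written dict in the same shape as the Output.DICT result, so the numbers are illustrative rather than real OCR output.

```python
def word_boxes(data, min_conf=0.0):
    """Return (text, left, top, width, height, conf) for word-level entries only.

    `data` is expected to have the shape of
    pytesseract.image_to_data(img, output_type=Output.DICT).
    """
    boxes = []
    for i in range(len(data["level"])):
        text = data["text"][i].strip()
        # conf may arrive as int, float, or str depending on the version
        conf = float(data["conf"][i])
        # Assumption: level 5 is the word level; higher-level entries
        # (page, block, paragraph, line) carry conf == -1 and empty text.
        if data["level"][i] != 5 or not text or conf < min_conf:
            continue
        boxes.append((text, data["left"][i], data["top"][i],
                      data["width"][i], data["height"][i], conf))
    return boxes


# Hand-written sample for illustration only (not real OCR output).
sample = {
    "level":  [1, 2, 3, 4, 5, 5],
    "left":   [0, 10, 10, 10, 10, 70],
    "top":    [0, 20, 20, 20, 20, 22],
    "width":  [200, 120, 120, 120, 50, 60],
    "height": [100, 30, 30, 30, 30, 26],
    "conf":   ["-1", "-1", "-1", "-1", "96", "41"],
    "text":   ["", "", "", "", "hello", "world"],
}

words = word_boxes(sample, min_conf=60)
print(words)
```

With real input, the dict d from the answer's code would be passed in place of sample, and each returned tuple could be handed to cv2.rectangle in the same way as before.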

The bounding boxes returned by pytesseract.image_to_boxes() enclose individual letters, so I believe pytesseract.image_to_data() is what you're looking for.
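
For comparison, here is a rough sketch of how the per-letter output could be handled if character boxes are needed after all. It assumes image_to_boxes() returns its default string form, one line per character in the order "char left bottom right top page", with the origin at the bottom-left corner of the image (Tesseract's box-file convention). The helper name parse_char_boxes and the sample string are hypothetical.

```python
def parse_char_boxes(box_text, image_height):
    """Convert box-file lines to (char, left, top, width, height) tuples
    using a top-left origin, to match image_to_data() and OpenCV."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top = (int(p) for p in parts[1:5])
        # Flip the y axis: box-file y values are measured from the bottom edge
        boxes.append((char, left, image_height - top,
                      right - left, top - bottom))
    return boxes


# Hand-written sample for illustration only (not real OCR output).
sample_boxes = "h 10 50 20 80 0\ni 22 50 26 78 0"
chars = parse_char_boxes(sample_boxes, image_height=100)
print(chars)
```

With a real image loaded through cv2.imread, image_height would be img.shape[0].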
