Getting the bounding box of the recognized words using python-tesseract

Problem description

I am using python-tesseract to extract words from an image. It is a Python wrapper for Tesseract, an OCR engine.

I am using the following code for getting the words:

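import tesseract

# Initialise the engine with the English language data.
api = tesseract.TessBaseAPI()
api.Init(".", "eng", tesseract.OEM_DEFAULT)
# Restrict recognition to digits and lowercase letters.
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_AUTO)

# Read the image file into memory and run OCR on the raw bytes.
mImgFile = "test.jpg"
with open(mImgFile, "rb") as f:
    mBuffer = f.read()
result = tesseract.ProcessPagesBuffer(mBuffer, len(mBuffer), api)
print("result(ProcessPagesBuffer)=%s" % result)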

This returns only the words, not their location, size or orientation in the image (in other words, the bounding box containing each of them). Is there any way to get that as well?

Solution

Use pytesseract.image_to_data()

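import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread('image.jpg')

# Output.DICT returns a dict of parallel lists, one entry per detected element.
d = pytesseract.image_to_data(img, output_type=Output.DICT)
n_boxes = len(d['level'])
for i in range(n_boxes):
    # (left, top) is the upper-left corner; width and height are in pixels.
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Show the annotated image until a key is pressed.
cv2.imshow('img', img)
cv2.waitKey(0)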
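
The loop above draws a rectangle for every row in the result, which also covers pages, blocks, paragraphs and lines. Since the question asks about words, here is a minimal sketch that keeps only word-level rows. It assumes the usual Tesseract level numbering (1 page, 2 block, 3 paragraph, 4 line, 5 word); `word_level_boxes` is a hypothetical helper name, not part of pytesseract, and the sample dict is hand-made for illustration.

```python
WORD_LEVEL = 5  # assumed Tesseract numbering: 1 page, 2 block, 3 paragraph, 4 line, 5 word


def word_level_boxes(data):
    """Return (text, x, y, w, h) for each word-level row with non-blank text."""
    boxes = []
    for i, level in enumerate(data["level"]):
        text = data["text"][i].strip()
        if level == WORD_LEVEL and text:
            boxes.append((text, data["left"][i], data["top"][i],
                          data["width"][i], data["height"][i]))
    return boxes


if __name__ == "__main__":
    # Hand-made sample in the same shape as Output.DICT (values are illustrative only).
    sample = {
        "level": [1, 4, 5, 5],
        "left": [0, 10, 10, 60],
        "top": [0, 20, 20, 22],
        "width": [200, 90, 40, 40],
        "height": [100, 15, 15, 12],
        "text": ["", "", "Hello", "world"],
    }
    for text, x, y, w, h in word_level_boxes(sample):
        # Print each word with its upper-left and lower-right corners.
        print(text, (x, y), (x + w, y + h))
```

In the answer's code, the dict `d` could be passed to this helper and the returned boxes drawn with cv2.rectangle() in the same way.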

Among the data returned by pytesseract.image_to_data():

  • left is the horizontal distance, in pixels, from the left border of the image to the upper-left corner of the bounding box.
  • top is the vertical distance, in pixels, from the top border of the image to the upper-left corner of the bounding box.
  • width and height are the width and height of the bounding box.
  • conf is the model's confidence in its prediction for the word within that bounding box. A conf of -1 means the row is not a single word: it describes a larger unit (page, block, paragraph or line) whose bounding box contains a block of text.
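
As a rough illustration of how conf can be used, the sketch below returns the indices of rows whose confidence reaches a threshold. The helper name `confident_rows` and the default threshold of 60 are assumptions made for this example, not values from pytesseract. Because conf may arrive as an int, a float or a numeric string depending on the pytesseract version, it is converted with float() before comparing.

```python
def confident_rows(data, min_conf=60.0):
    """Return indices of rows whose confidence is at least min_conf."""
    keep = []
    for i, conf in enumerate(data["conf"]):
        # Normalise int, float or numeric-string values to float.
        if float(conf) >= min_conf:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Hand-made sample; a conf of -1 marks a non-word row.
    sample = {"conf": ["-1", 95, "42.5", 60]}
    print(confident_rows(sample))
```

Any non-negative threshold also drops the rows with a conf of -1, so the remaining indices all refer to words.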

pytesseract.image_to_boxes() returns bounding boxes around individual characters, so I believe pytesseract.image_to_data() is what you're looking for.
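
For comparison, here is a hedged sketch of reading the character boxes. It assumes image_to_boxes() returns its default string output in Tesseract's box-file layout, one line per character in the form "char left bottom right top page", with the origin at the bottom-left corner of the image. `parse_char_boxes` is a hypothetical helper that converts those lines to the top-left-origin (left, top, width, height) convention used by image_to_data().

```python
def parse_char_boxes(box_text, image_height):
    """Convert box-file lines into (char, left, top, width, height) tuples."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        char = parts[0]
        x1, y1, x2, y2 = (int(v) for v in parts[1:5])
        # Flip the y axis: box files measure from the bottom edge of the image.
        boxes.append((char, x1, image_height - y2, x2 - x1, y2 - y1))
    return boxes


if __name__ == "__main__":
    # Hand-made sample text in the assumed box-file layout.
    sample = "a 10 20 30 50 0\nb 35 20 50 45 0"
    for box in parse_char_boxes(sample, image_height=100):
        print(box)
```

With an image loaded through cv2.imread(), the image height would be img.shape[0].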
