How to get the letter coordinates retrieved by Tesseract OCR


Question

I'm trying to use Tesseract from Python to do a simple job:
- open a picture
- run OCR
- get the string
- get the character coordinates

The last step is the one giving me trouble.

Here is my first attempt:

import tesseract
import glob
import cv2

api = tesseract.TessBaseAPI()
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZéèô%")
api.SetPageSegMode(tesseract.PSM_AUTO)

imagepath = "C:\\Project\\Bob\\"
imagePathList = glob.glob(imagepath + "*.jpg")

for image in imagePathList:
    # Read the image of the current iteration (the original always read imagePathList[10])
    mBuffer = open(image, "rb").read()
    result = tesseract.ProcessPagesBuffer(mBuffer,len(mBuffer),api)
    img = cv2.imread(image)
    cv2.putText(img,result,(20,20), cv2.FONT_HERSHEY_PLAIN, 1.0,(0,255,0))       
    cv2.imshow("Original",img)
    cv2.waitKey()

As my pictures have various layouts, with different words at different positions, I would like to get a bounding box for every character.

I have seen two approaches mentioned: api.getBoxText and hOCR output.

However, I have not found a way to use either of them from Python.

Recommended answer

tesserocr provides the capability to access pretty much all of tesseract's API functionality. Here's an example that might be what you want:

from PIL import Image
from tesserocr import PyTessBaseAPI, RIL

image = Image.open('/usr/src/tesseract/testing/phototest.tif')
with PyTessBaseAPI() as api:
    api.SetImage(image)
    boxes = api.GetComponentImages(RIL.TEXTLINE, True)
    # print() calls with parentheses so the example also runs on Python 3
    print('Found {} textline image components.'.format(len(boxes)))
    for i, (im, box, _, _) in enumerate(boxes):
        # im is a PIL image object
        # box is a dict with x, y, w and h keys
        api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
        ocrResult = api.GetUTF8Text()
        conf = api.MeanTextConf()
        print(u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
              "confidence: {1}, text: {2}".format(i, conf, ocrResult, **box))
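
The example above returns one box per text line, while the question asks for one box per character. A minimal sketch of that variant follows, assuming tesserocr's result iterator (`GetIterator`, `iterate_level`, `RIL.SYMBOL`, `BoundingBox`) behaves as described in its README. The helper names `bbox_to_xywh` and `char_boxes` are hypothetical and chosen only for this illustration, and the sketch has not been checked against every tesserocr version.

```python
def bbox_to_xywh(bbox):
    """Convert a (left, top, right, bottom) tuple to a dict with x, y, w, h keys."""
    left, top, right, bottom = bbox
    return {'x': left, 'y': top, 'w': right - left, 'h': bottom - top}


def char_boxes(image_path):
    """Return a list of (character, box dict, confidence), one per recognized symbol."""
    # Imported lazily so bbox_to_xywh stays usable without tesserocr installed.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    results = []
    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        api.Recognize()
        ri = api.GetIterator()
        level = RIL.SYMBOL  # iterate symbol by symbol instead of line by line
        for r in iterate_level(ri, level):
            bbox = r.BoundingBox(level)  # (left, top, right, bottom), or None
            if bbox is None:
                continue
            results.append((r.GetUTF8Text(level),
                            bbox_to_xywh(bbox),
                            r.Confidence(level)))
    return results
```

Calling `char_boxes('/usr/src/tesseract/testing/phototest.tif')` would then give one entry per character, with boxes in the same x, y, w, h form as the text line example, so they can be passed directly to a drawing call such as `cv2.rectangle`.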

You can also access other API methods such as GetHOCRText and GetBoxText among others.
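
As a rough illustration of the GetBoxText route, the sketch below parses text in Tesseract's box-file format, where each line reads `symbol left bottom right top page` and coordinates are measured from the bottom-left corner of the image. The function name `parse_box_text` and the sample string are made up for this example. With tesserocr the input would presumably come from `api.GetBoxText(0)` after `api.SetImage(image)`, with `image.size[1]` as the height.

```python
def parse_box_text(box_text, image_height):
    """Parse box-file text into per-character dicts with a top-left origin."""
    boxes = []
    for line in box_text.splitlines():
        # Split from the right so the symbol itself is never broken up.
        parts = line.rsplit(' ', 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, left, bottom, right, top, _page = parts
        left, bottom, right, top = int(left), int(bottom), int(right), int(top)
        boxes.append({
            'char': char,
            'x': left,
            # Box files use a bottom-left origin; flip y for image coordinates.
            'y': image_height - top,
            'w': right - left,
            'h': top - bottom,
        })
    return boxes


# Hypothetical sample input for a 100-pixel-high image.
sample = "T 10 70 20 90 0\nh 22 70 30 90 0\n"
boxes = parse_box_text(sample, image_height=100)
for b in boxes:
    print(b)
```

Flipping the y axis matters because OpenCV and PIL both place the origin at the top-left corner, so boxes drawn without the flip would appear mirrored vertically.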

However, right now tesserocr only supports *nix systems, although a user has successfully compiled it on Windows and provided binaries if you'd like to give it a try.

Disclaimer: tesserocr author here.
