How to recognize data, not a filename, using ctypes and tesseract 3.0.2?

Question

I wrote a snippet using ctypes and tesseract 3.0.2, referring to the example:

import ctypes
from PIL import Image


libname = '/opt/tesseract/lib/libtesseract.so.3.0.2'
tesseract = ctypes.cdll.LoadLibrary(libname)
api = tesseract.TessBaseAPICreate()

rc = tesseract.TessBaseAPIInit3(api, "", 'eng')
filename = '/opt/ddl.ddl.exp654.png'

text_out = tesseract.TessBaseAPIProcessPages(api, filename, None, 0)
result_text = ctypes.string_at(text_out)
print result_text

It passes a filename as a parameter. I have no idea which API method to call to pass the raw data instead, something like:

tesseract.TessBaseAPIWhichMethod(api, open(filename).read())


Recommended answer

I can't say for sure, but I don't think you can pass complex Python objects to that specific API; it won't know how to handle them. Your best bet would be to look at a wrapper like http://code.google.com/p/python-tesseract/, which will allow you to use file buffers:

import tesseract
api = tesseract.TessBaseAPI()
api.Init(".","eng",tesseract.OEM_DEFAULT)
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_AUTO)

mImgFile = "eurotext.jpg"
mBuffer=open(mImgFile,"rb").read()
result = tesseract.ProcessPagesBuffer(mBuffer,len(mBuffer),api) #YAY for buffers.
print "result(ProcessPagesBuffer)=",result
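If you would rather stay with ctypes, here is a minimal sketch of one possible route. It assumes the shared library exports TessBaseAPISetImage and TessBaseAPIGetUTF8Text from Tesseract's C API (capi.h); I have not checked this against 3.0.2 specifically. These calls take decoded pixels, not the encoded bytes of a PNG file, so the image has to be decoded first (for example with PIL). The helper names image_layout and recognize_raw are hypothetical.

```python
import ctypes


def image_layout(width, height, bytes_per_pixel):
    """Return (bytes_per_line, total_bytes) for a tightly packed pixel buffer."""
    bytes_per_line = width * bytes_per_pixel
    return bytes_per_line, bytes_per_line * height


def recognize_raw(lib, api, pixels, width, height, bytes_per_pixel):
    """Hand decoded pixel bytes to Tesseract and return the recognized text.

    Assumptions: `lib` is the ctypes-loaded libtesseract and exports
    TessBaseAPISetImage / TessBaseAPIGetUTF8Text, and `api` is a handle
    obtained from TessBaseAPICreate and initialised with TessBaseAPIInit3.
    """
    bytes_per_line, expected = image_layout(width, height, bytes_per_pixel)
    if len(pixels) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(pixels)))

    # Declare the C signatures so pointers are not truncated to 32-bit ints.
    lib.TessBaseAPISetImage.argtypes = [
        ctypes.c_void_p, ctypes.c_char_p,
        ctypes.c_int, ctypes.c_int, ctypes.c_int, ctypes.c_int,
    ]
    lib.TessBaseAPISetImage.restype = None
    lib.TessBaseAPIGetUTF8Text.argtypes = [ctypes.c_void_p]
    lib.TessBaseAPIGetUTF8Text.restype = ctypes.c_void_p

    # ctypes passes a pointer to the bytes object; the explicit sizes tell
    # the C side how much to read, so embedded NUL bytes are not a problem.
    lib.TessBaseAPISetImage(api, pixels, width, height,
                            bytes_per_pixel, bytes_per_line)
    ptr = lib.TessBaseAPIGetUTF8Text(api)
    if not ptr:
        return ""
    # Copy the NUL-terminated result; freeing the C buffer is omitted here.
    return ctypes.string_at(ptr).decode("utf-8")
```

With Pillow, the intended call would be something like img = Image.open(filename).convert("L") followed by recognize_raw(tesseract, api, img.tobytes(), img.size[0], img.size[1], 1). For this to work on 64-bit systems, TessBaseAPICreate.restype should also be set to ctypes.c_void_p before creating api, otherwise the handle may be truncated.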

Edit

http://code.google.com/p/python-tesseract/source/browse/python-tesseract-0.7.4/debian/python-tesseract/usr/share/pyshared/tesseract.py might provide you with the insight that you need.

...

Actually, if you don't mind trying it, what happens when you replace

text_out = tesseract.TessBaseAPIProcessPages(api, filename, None, 0)

with

text_out = tesseract.ProcessPagesBuffer(mBuffer,len(mBuffer),api)
