Annoying python tesseract error: Error opening data file ./tessdata/eng.traineddata

Question

I'm running into an error that is driving me a little crazy. It comes from the Python wrapper for Tesseract, a Python module called tesseract.

Here's the Python code I am trying to run:

import cv2
import tesseract  # the python-tesseract wrapper module

img = cv2.imread(image, 0)  # 0 loads the image as grayscale
api = tesseract.TessBaseAPI()
api.Init(".", "eng", tesseract.OEM_DEFAULT)
api.SetPageSegMode(tesseract.PSM_AUTO)
tesseract.SetCvImage(img, api)
url = api.GetUTF8Text()
conf = api.MeanTextConf()
print('Extracted URL : ' + url)
api.End()

And this is what I get:

Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!

I don't understand why this happens, since I have set the TESSDATA_PREFIX environment variable to the path of my Tesseract installation (with the trailing slash).

When I run Tesseract directly from PowerShell (I'm on Windows 7, by the way) with:

 tesseract.exe .\data\test.tif -psm 7 out

it works like a charm! Calling Tesseract with Popen from my Python script also works fine, but I don't like not being able to grab the OCR'd text directly from stdout. There seems to be no choice other than giving Tesseract an output filename and then opening and reading that file. Dealing with temporary text files just to get the OCR output feels pretty awful...

Help?

Recommended answer

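The first parameter to api.Init should be the value of TESSDATA_PREFIX, that is, the parent directory of your tessdata directory. The error message shows what went wrong: the code passes "." as that parameter, so Tesseract looks for ./tessdata/eng.traineddata relative to the current working directory.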
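As a rough sketch of that advice, the helper below reads TESSDATA_PREFIX from the environment and checks that the language file exists before the value is handed to api.Init. The function name resolve_tessdata_prefix and its env parameter are hypothetical, introduced only for illustration, and the sketch assumes the layout implied by the error message (a tessdata directory directly under the prefix).

```python
import os


def resolve_tessdata_prefix(lang="eng", env=None):
    """Return the directory to pass as the first argument of api.Init.

    Hypothetical helper: it assumes TESSDATA_PREFIX points at the parent
    directory of "tessdata", as the error message in the question asks.
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX", "")
    if not prefix:
        raise RuntimeError("TESSDATA_PREFIX is not set")
    # Make sure the prefix ends with a path separator (the trailing slash
    # mentioned in the question).
    if not prefix.endswith(("/", "\\")):
        prefix += os.sep
    # Fail early, with the full path, if the language data is missing.
    data_file = os.path.join(prefix, "tessdata", lang + ".traineddata")
    if not os.path.isfile(data_file):
        raise FileNotFoundError(data_file)
    return prefix


# Intended use with the wrapper from the question (left as a comment
# because it needs the tesseract module and its data files installed):
#
#     api = tesseract.TessBaseAPI()
#     api.Init(resolve_tessdata_prefix("eng"), "eng", tesseract.OEM_DEFAULT)
```

Checking for the file up front means a wrong path surfaces as a Python exception that names the missing file, instead of only as Tesseract's "Failed loading language" message.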
