tesseract (v3.03) output as PDF


Problem description

Why is this error returned?

root@amd-3700-2gb ~/ocr_test # tesseract -l dan pdf.png out pdf
Tesseract Open Source OCR Engine v3.03 with Leptonica
Error opening data file /usr/local/share/tessdata/osd.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'osd'
Tesseract couldn't load any languages!
Warning: Auto orientation and script detection requested, but osd language failed to load

List of languages

root@amd-3700-2gb ~/ocr_test # tesseract --list-langs
List of available languages (3):
eng
dan
dan-frak

Output as txt

This works fine and writes the text to out.txt

tesseract -l dan pdf.png out

Output PDF

This creates out.pdf but also returns the error mentioned, and the searchable text in the PDF doesn't make sense

tesseract -l dan pdf.png out pdf

Recommended answer

The error message is clear: Tesseract needs the osd.traineddata file, which it looks for in /usr/local/share/tessdata. You can install or download the Orientation & Script Detection data for Tesseract from https://github.com/tesseract-ocr/tessdata.
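To confirm whether the file is actually missing before downloading anything, a small check along these lines may help. This is a minimal sketch: the `check_osd` helper and the `TESSDATA_DIR` variable are hypothetical names introduced here, and the default directory is simply the path printed in the error message, so adjust it if your `TESSDATA_PREFIX` points somewhere else.

```shell
# Directory copied from the error message; override TESSDATA_DIR if yours differs (assumption).
TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata}"

# Hypothetical helper: reports whether osd.traineddata exists in the given directory.
check_osd() {
    if [ -f "$1/osd.traineddata" ]; then
        echo "osd.traineddata found in $1"
    else
        echo "osd.traineddata missing from $1"
        return 1
    fi
}

# If the file is missing, print a reminder of where to get it and where to put it.
check_osd "$TESSDATA_DIR" || echo "Download osd.traineddata from https://github.com/tesseract-ocr/tessdata and copy it to $TESSDATA_DIR"
```

Once the file is in place, `tesseract --list-langs` should list osd next to eng, dan and dan-frak, and rerunning `tesseract -l dan pdf.png out pdf` should no longer print the loading error.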
