Page layout analysis using Tesseract?
Question
Tesseract 3 is able to perform page layout analysis. However, I couldn't find any sample code or documentation on how to use the library for such purposes. I hope someone here can explain how to perform layout analysis on an image and how to parse the resulting data.
Answer
Tesseract can be given a page segmentation mode parameter (-psm), which can take the following values:
- 0 = Orientation and script detection (OSD) only.
- 1 = Automatic page segmentation with OSD.
- 2 = Automatic page segmentation, but no OSD or OCR.
- 3 = Fully automatic page segmentation, but no OSD. (Default)
- 4 = Assume a single column of text of variable sizes.
- 5 = Assume a single uniform block of vertically aligned text.
- 6 = Assume a single uniform block of text.
- 7 = Treat the image as a single text line.
- 8 = Treat the image as a single word.
- 9 = Treat the image as a single word in a circle.
- 10 = Treat the image as a single character.
Example:
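tesseract image.tif image -l eng -psm 0
The second argument is the output base name. Tesseract appends the .txt extension itself, so passing image.txt would produce image.txt.txt.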
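To drive the command from a script, a small Python wrapper can keep the mode table and the flag spelling in one place. This is only a sketch and is not part of the original answer. The names PSM_MODES, build_tesseract_args and run_tesseract are hypothetical, and the sketch assumes the tesseract binary is on PATH. Tesseract 3.x spells the option -psm, while 4.x and later spell it --psm. The tesseract3 switch selects between the two spellings.

```python
import subprocess

# Page segmentation modes 0-10, as listed in the answer.
PSM_MODES = {
    0: "Orientation and script detection (OSD) only.",
    1: "Automatic page segmentation with OSD.",
    2: "Automatic page segmentation, but no OSD or OCR.",
    3: "Fully automatic page segmentation, but no OSD. (Default)",
    4: "Assume a single column of text of variable sizes.",
    5: "Assume a single uniform block of vertically aligned text.",
    6: "Assume a single uniform block of text.",
    7: "Treat the image as a single text line.",
    8: "Treat the image as a single word.",
    9: "Treat the image as a single word in a circle.",
    10: "Treat the image as a single character.",
}


def build_tesseract_args(image, output_base, psm=3, lang="eng", tesseract3=True):
    """Build the argv list for one tesseract run (hypothetical helper).

    output_base carries no extension; tesseract adds it.
    tesseract3=True uses the 3.x spelling "-psm"; pass False for
    4.x and later, which use "--psm".
    """
    if psm not in PSM_MODES:
        raise ValueError(f"unknown page segmentation mode: {psm}")
    psm_flag = "-psm" if tesseract3 else "--psm"
    return ["tesseract", image, output_base, "-l", lang, psm_flag, str(psm)]


def run_tesseract(image, output_base, psm=3, lang="eng", tesseract3=True):
    """Run tesseract (assumed to be on PATH); raises CalledProcessError on failure."""
    args = build_tesseract_args(image, output_base, psm, lang, tesseract3)
    subprocess.run(args, check=True)
```

For example, run_tesseract("image.tif", "image", psm=0) runs tesseract image.tif image -l eng -psm 0.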
However, I am not sure whether the layout analysis can be used on its own (in standalone mode).
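hOCR output is one possible way to get at the layout result from the command line. This is an assumption on my part and is not confirmed by the answer. Passing the hocr config file (for example, tesseract image.tif image -l eng -psm 3 hocr) should make Tesseract write HTML in which the detected blocks, paragraphs and lines carry their bounding boxes in title attributes. It is worth checking that your installed version ships that config. The sketch below extracts those elements with Python's standard html.parser module. The names HocrLayoutParser and parse_hocr_layout are hypothetical. The class names ocr_carea, ocr_par and ocr_line come from the hOCR format, and which of them your Tesseract version actually emits may vary.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" property inside an hOCR title attribute.
_BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")

# hOCR layout classes collected here: block (content area), paragraph, line.
LAYOUT_CLASSES = ("ocr_carea", "ocr_par", "ocr_line")


class HocrLayoutParser(HTMLParser):
    """Collect (class, id, bbox) for hOCR layout elements in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        match = _BBOX_RE.search(attrs.get("title") or "")
        if match is None:
            return  # element carries no bounding box
        for cls in (attrs.get("class") or "").split():
            if cls in LAYOUT_CLASSES:
                bbox = tuple(int(v) for v in match.groups())
                self.elements.append((cls, attrs.get("id"), bbox))
                break


def parse_hocr_layout(hocr_text):
    """Return a list of (class, id, (x0, y0, x1, y1)) tuples from hOCR text."""
    parser = HocrLayoutParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.elements
```

Each bbox follows the hOCR convention of left, top, right and bottom in pixels. Word-level elements (ocrx_word) are skipped here, but they could be collected by adding that class to LAYOUT_CLASSES.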