Does anyone know the meaning of the output of pytesseract's image_to_data and image_to_osd methods?


Question

I'm trying to extract data from an image using pytesseract. The module has image_to_data and image_to_osd methods. These two methods return a lot of information (TextLineOrder, WritingDirection, ScriptDetection, Orientation, etc.) as output.

The image below is the output of the image_to_data method. What do the values in these columns (level, block_num, par_num, line_num, word_num) mean?

The output of image_to_osd looks like the following. What does each term in it mean?

Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 16.47
Script: Latin
Script confidence: 4.00

I referred to the docs, but I did not find any information about these parameters.

Recommended answer

my_image.jpg

For example, if we run image_to_data on my_image.jpg with the code at the end of this answer, we get results like those shown in results.png.

results.png

  • level = 1/2/3/4/5: the level of the current item in the layout hierarchy (1 = page, 2 = block, 3 = paragraph, 4 = line, 5 = word).

  • page_num: the page index of the current item. In most cases an image has only one page.

  • block_num: the block index of the current item. When Tesseract runs OCR on an image, it splits the image into several blocks according to the PSM parameter and some rules. The words in a line are usually in the same block.

  • par_num: the paragraph index of the current item. It is a result of the page layout analysis.

  • line_num: the line index of the current item. It is a result of the page layout analysis.

  • word_num: the index of the word within its line (the counter restarts for each new line, and therefore also for each new block).

  • left/top/width/height: the top-left coordinates and the width and height, in pixels, of the bounding box of the current item.

  • conf: the confidence of the current word, in the range -1 to 100. A value of -1 means there is no text for this item; 100 is the highest value.

  • text: the OCR result for the current word.
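To make the column meanings concrete, here is a minimal sketch that rebuilds text lines from a dictionary shaped like the `Output.DICT` result of image_to_data. The `sample` dictionary and the helper name `words_by_line` are made up for illustration; real output contains more keys (left, top, width, height) and, depending on the pytesseract version, `conf` may arrive as strings or numbers, which is why the sketch converts it with `float`.

```python
from collections import defaultdict

# Hand-made sample shaped like image_to_data(..., output_type=Output.DICT).
# The values are invented for illustration; they are not real OCR output.
sample = {
    "level":     [1, 2, 3, 4, 5, 5, 4, 5],
    "page_num":  [1, 1, 1, 1, 1, 1, 1, 1],
    "block_num": [0, 1, 1, 1, 1, 1, 1, 1],
    "par_num":   [0, 0, 1, 1, 1, 1, 1, 1],
    "line_num":  [0, 0, 0, 1, 1, 1, 2, 2],
    "word_num":  [0, 0, 0, 0, 1, 2, 0, 1],
    "conf":      [-1, -1, -1, -1, 96, 91, -1, 88],
    "text":      ["", "", "", "", "Hello", "world", "", "again"],
}


def words_by_line(data, min_conf=0):
    """Group word-level rows (level == 5) into lines of text.

    Rows are keyed by (page_num, block_num, par_num, line_num), so words
    that share all four indices end up on the same output line.
    """
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        if data["level"][i] != 5:
            continue  # skip page, block, paragraph and line rows
        if float(data["conf"][i]) < min_conf or not text.strip():
            continue  # skip low-confidence or empty words
        key = (
            data["page_num"][i],
            data["block_num"][i],
            data["par_num"][i],
            data["line_num"][i],
        )
        lines[key].append(text)
    # Sorting the keys keeps lines in page/block/paragraph/line order.
    return [" ".join(words) for _, words in sorted(lines.items())]


print(words_by_line(sample))
```

Raising `min_conf` (for example `words_by_line(sample, min_conf=90)`) drops words whose confidence falls below the threshold, which is a common way to discard noisy detections.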

The meaning of the results from image_to_osd:

  • Page number: the page index of the current item. In most cases an image has only one page.

  • Orientation in degrees: the clockwise rotation angle of the text in the current image relative to its normal reading orientation. The value is one of 0, 90, 180, or 270.

  • Rotate: the angle by which the current image must be rotated clockwise to make its text readable. The value is one of 0, 90, 180, or 270. It is the complement of the [Orientation in degrees] value.

  • Orientation confidence: the confidence of the detected [Orientation in degrees] and [Rotate] values. The higher the confidence, the more reliable the detection result, but no explanation of its value range has been found so far.

  • Script: the script (writing system, for example Latin) of the text in the current image.

  • Script confidence: the confidence of the detected script of the text in the current image.
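As an illustration of these fields, the sketch below parses text in the format shown in the question into a dictionary. The helper names `parse_osd` and `expected_rotate` are hypothetical, and the relation `(360 - orientation) % 360` is an assumption about what "complement" means here; the answer does not spell it out, so treat it as something to verify against your own images.

```python
# Sample text copied from the question's image_to_osd output.
osd_text = """Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 16.47
Script: Latin
Script confidence: 4.00
"""


def _convert(value):
    """Turn a field value into int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_osd(text):
    """Parse 'Key: value' lines into a dict, skipping lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = _convert(value.strip())
    return fields


def expected_rotate(orientation):
    """Assumed relation between the two angle fields (not confirmed by the answer)."""
    return (360 - orientation) % 360


osd = parse_osd(osd_text)
print(osd["Script"], osd["Orientation confidence"])
print(expected_rotate(osd["Orientation in degrees"]) == osd["Rotate"])
```

Parsing the text by hand keeps the example independent of any particular pytesseract version; the field names are taken directly from the printed output.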

from pytesseract import Output
import pytesseract
import cv2

image = cv2.imread("my_image.jpg")

# OpenCV loads images in BGR channel order, while pytesseract assumes RGB,
# so convert from BGR to RGB before running OCR.
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Path to the Tesseract executable (adjust for your installation).
pytesseract.pytesseract.tesseract_cmd = r'C:\mypath\tesseract.exe'
# Restrict recognition to digits and treat the image as a single uniform block of text (PSM 6).
custom_config = r'-c tessedit_char_whitelist=0123456789 --psm 6'
results = pytesseract.image_to_data(rgb, output_type=Output.DICT, lang='eng', config=custom_config)
print(results)
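Once a dictionary like `results` is available, the left/top/width/height and conf columns can be combined to pick out word boxes. The following is a rough sketch on an invented `sample` dictionary (not real OCR output); the helper name `word_boxes` and the threshold of 60 are arbitrary choices for illustration.

```python
# Invented sample with the same keys as the image_to_data dictionary.
sample = {
    "level":  [4, 5, 5],
    "left":   [10, 10, 60],
    "top":    [20, 20, 22],
    "width":  [90, 40, 40],
    "height": [15, 15, 12],
    "conf":   [-1, 95, 42],
    "text":   ["", "123", "45"],
}


def word_boxes(data, min_conf=60):
    """Return (text, (x1, y1, x2, y2)) for word rows at or above min_conf."""
    boxes = []
    for i, text in enumerate(data["text"]):
        if data["level"][i] != 5 or not text.strip():
            continue  # keep only non-empty word-level rows
        if float(data["conf"][i]) < min_conf:
            continue  # drop low-confidence words
        x1, y1 = data["left"][i], data["top"][i]
        x2, y2 = x1 + data["width"][i], y1 + data["height"][i]
        boxes.append((text, (x1, y1, x2, y2)))
    return boxes


for text, (x1, y1, x2, y2) in word_boxes(sample):
    # With OpenCV, each box could be drawn on the image with
    # cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2).
    print(text, (x1, y1, x2, y2))
```

Converting width and height into a bottom-right corner gives the two points that drawing functions usually expect.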
