Tesseract混淆了两个数字 [英] Tesseract confuses two numbers

查看：87 发布时间：2020/5/19 19:29:19 ocr tesseract

本文介绍了Tesseract混淆了两个数字的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在编写一个应用程序来扫描图像中的数字.

I'm writing an application to scan numbers from an image.

数字使用OCR-B字体，并且可能还包含+和>字符.

The numbers are using the OCR-B font and may also contain + and > characters.

这是我的原始图片:

即使将字符集限制为提到的字符，使用Tesseract的扫描也不是很好.由于找不到用于Tesseract的OCRB培训文件，因此我决定自己进行培训.

The scans using Tesseract weren't very good, even when limiting the character set to the mentioned characters. As I didn't find any OCRB training files for Tesseract, I decided to train it myself.

我创建了此培训图片，并从中创建了一个盒子文件.盒子文件正确，所有字母正确匹配.

I created this training image and made a box file from it. The box file is correct, all letters are matched correctly.

然后，我执行了此处描述的所有步骤，以创建其他必要的步骤文件.

Then I did all steps described here to create the other necessary files.

使用这个经过新训练的OCR-B tessdata集，我在源图像上获得了不错的结果，但有一个小错误:所有1都误认为8，反之亦然.用于处理图像的命令是

Using this newly trained OCR-B tessdata-set, I get pretty good results on the source image, with one little bug: All 1s are mistaken for 8s and vice-versa. The command used to process the image was

$ tesseract esr2c.tif ocrb-esr2c -l ocrb

，源图像的输出为

0800000001456> 8 00000195731208 8 01050008 023+ 08 0301226> 20

如果交换所有1和8并将其与源图像进行比较，则输出将是正确的(除了最后两个字母，我可以忽略).

If you swap all 1s and 8s and compare it to the source image, the output would be correct (except for the last two letters which I can ignore).

这怎么可能发生?在培训过程中我是否犯了一些错误?我该如何解决?

How could this happen? Did I do some mistake in the training process? How can I fix it?

Tesseract混淆了两个数字 [英] Tesseract confuses two numbers

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

Tesseract混淆了两个数字 [英] Tesseract confuses two numbers

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭