Tesseract thinks my 1's are 7's

Problem description

This seems to be a common problem with OCR. Is there a way to tell Tesseract that my 1's are actually 1's?

Hopefully without changing my 7's into 1's in the process.

Note: these are scanned documents and I have no idea what font was used.

Recommended answer

Tesseract is trainable, so try training it on the font manually. That should solve the problem.

There is another possible solution: add a small validation module that runs after Tesseract. Double-check every recognized 1 and 7 with an intensity-based method. For example, find corners (feature points) on the glyph, apply KLT (Kanade-Lucas-Tomasi) tracking against a 1 template and a 7 template, and see which one gives more successful tracks. This method is costly, but since you only run it against two small templates, I don't think it will hurt performance much.
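The answer proposes KLT tracking against 1 and 7 templates. A full KLT setup needs an image library, so the sketch below swaps in a simpler intensity-based check instead: normalized cross-correlation (NCC) between the glyph and each template, keeping whichever template correlates better. It assumes the glyph Tesseract labeled 1 or 7 has already been cropped, binarized, and resized to the template size; the 5x7 templates and the names `to_grid`, `ncc`, and `verify_digit` are hypothetical.

```python
from math import sqrt
from typing import List, Sequence

Grid = List[List[float]]

# Hypothetical 5x7 templates ("#" = ink). Real templates would be cut
# from clean samples of the scanned documents' own digits.
TEMPLATE_ONE = [
    "..#..",
    ".##..",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
    ".###.",
]
TEMPLATE_SEVEN = [
    "#####",
    "....#",
    "...#.",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]


def to_grid(rows: Sequence[str]) -> Grid:
    """Turn an ASCII bitmap into intensities: 1.0 for ink, 0.0 for paper."""
    return [[1.0 if ch == "#" else 0.0 for ch in row] for row in rows]


def ncc(a: Grid, b: Grid) -> float:
    """Normalized cross-correlation of two equally sized grids, in [-1, 1]."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("glyph and template must have the same size")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a blank (or solid) patch carries no shape information
    return cov / sqrt(var_x * var_y)


TEMPLATES = {"1": to_grid(TEMPLATE_ONE), "7": to_grid(TEMPLATE_SEVEN)}


def verify_digit(glyph: Grid, ocr_char: str) -> str:
    """Re-check a character Tesseract read as 1 or 7; pass others through."""
    if ocr_char not in TEMPLATES:
        return ocr_char
    scores = {label: ncc(glyph, tpl) for label, tpl in TEMPLATES.items()}
    return max(scores, key=scores.get)
```

For example, `verify_digit(to_grid(TEMPLATE_SEVEN), "1")` returns `"7"`. Only characters already read as 1 or 7 are re-checked, which keeps the extra cost to two small comparisons per suspect digit, in line with the answer's performance argument.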

If neither solution is possible, try to fix it in post-processing. For example, if the field is a student's age, it is unlikely to be 78, so it is probably 18, and so on. This is a poor workaround rather than a real solution, but when nothing else is possible you have to do something like it.
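The age example can be sketched as a range check over 1/7 swaps: keep the OCR output if it is already plausible, otherwise try every way of swapping 1s and 7s and accept a reading only if exactly one fits. This assumes the field's valid range is known in advance (the 5 to 25 range below is a made-up example), and the names `readings` and `resolve_number` are hypothetical.

```python
from itertools import product
from typing import List, Optional

# Characters Tesseract confuses with each other in this case.
CONFUSABLE = {"1": "17", "7": "71"}


def readings(text: str) -> List[str]:
    """Every string obtainable by swapping any 1 <-> 7; the original comes first."""
    options = [CONFUSABLE.get(ch, ch) for ch in text]
    return ["".join(combo) for combo in product(*options)]


def resolve_number(text: str, low: int, high: int) -> Optional[str]:
    """Pick a reading of `text` whose value lies in [low, high].

    Returns the OCR output unchanged if it is already plausible, the single
    plausible alternative if there is exactly one, and None when no reading
    fits or several do (the field should then go to manual review).
    """
    def plausible(s: str) -> bool:
        return s.isdigit() and low <= int(s) <= high

    if plausible(text):
        return text
    # readings(text)[0] is the original text, already rejected above.
    alternatives = [r for r in readings(text)[1:] if plausible(r)]
    return alternatives[0] if len(alternatives) == 1 else None
```

With this range, `resolve_number("78", 5, 25)` returns `"18"`, while `resolve_number("77", 5, 25)` returns `None` because both 17 and 11 fit. Refusing to guess in ambiguous cases reflects the answer's warning that this is a workaround, not a fix.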