Software to Improve OCR Results Based on Output from Multiple OCR Software Packages

Question

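Is there an existing piece of commercial or academic software that can:

  • superimpose the results of multiple OCR software packages (Abbyy FineReader, Adobe Acrobat Professional, ReadIris, etc.);
  • provide fully automated improvements based on the knowledge accumulated from multiple sources;
  • allow additional external tools to be configured at runtime (dictionaries, batch Web/local corpus queries, etc.)?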

Note: I already have in-house solutions for visualizing results from single sources, so if no such software is available, I would not mind developing my own :) Inquiries about cooperation would then also be most welcome!
(source: sourceforge.net)

Recommended answer

The idea of using voting between several OCR engines is not new. The thing is that it does not really work. It probably would work if the engines were simple classifiers, orthogonal by their nature: then you could combine their votes and improve the results. But they are all very complicated pieces of software that use a quite similar set of well-known approaches with small variations, probably combining them in different ways, and some implementations are better while others are worse. Since they are so similar, their mistakes are not independent, and independence is exactly what voting relies on.
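
To make the "simple, orthogonal classifiers" case concrete, here is a minimal sketch of character-level majority voting in Python. It assumes, hypothetically, that the engines' outputs have already been aligned so that position i in every string refers to the same glyph on the page; real OCR outputs differ in length and would need an alignment step first. The function name `majority_vote` is illustrative only and is not part of any OCR product's API.

```python
from collections import Counter


def majority_vote(outputs: list[str]) -> str:
    """Combine pre-aligned OCR outputs by a per-position majority vote.

    Assumption (hypothetical input): every string has the same length and
    position i in each string refers to the same glyph on the page.
    """
    if not outputs:
        return ""
    length = len(outputs[0])
    if any(len(o) != length for o in outputs):
        raise ValueError("outputs must be pre-aligned to equal length")
    result = []
    for column in zip(*outputs):
        # most_common(1) picks the most frequent character in this column;
        # on a tie, Counter keeps first-seen order, so the earliest engine wins.
        char, _count = Counter(column).most_common(1)[0]
        result.append(char)
    return "".join(result)
```

For example, `majority_vote(["he1lo", "hello", "hel1o"])` returns `"hello"`, because each misread appears in only one engine. If two of the three engines share the same misreading, which is what closely related engines tend to do, the vote simply confirms the error.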

Experience shows that when you combine several OCR technologies, the best decision rule is to rely on the results of the most accurate one and simply ignore the others. In my experience (I work for ABBYY), ABBYY OCR is definitely the most accurate of the ones you mentioned.
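
Relying on "the most accurate one" presupposes measuring accuracy on your own documents. One common way, shown here as a hedged sketch rather than anything the answer prescribes, is to run each engine on a small set of pages with hand-checked ground truth and compare character error rates (total edit distance divided by total reference length). The engine names and sample strings below are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def corpus_cer(hypotheses: list[str], references: list[str]) -> float:
    """Character error rate: total edits divided by total reference characters."""
    if len(hypotheses) != len(references):
        raise ValueError("need one hypothesis per reference")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must not be empty")
    edits = sum(levenshtein(h, r) for h, r in zip(hypotheses, references))
    return edits / total_chars


def most_accurate_engine(outputs: dict[str, list[str]], references: list[str]) -> str:
    """Return the name of the engine with the lowest CER on the ground-truth pages."""
    return min(outputs, key=lambda name: corpus_cer(outputs[name], references))


if __name__ == "__main__":
    refs = ["hello world"]
    outputs = {"engine_a": ["hello world"], "engine_b": ["he11o wor1d"]}
    print(most_accurate_engine(outputs, refs))  # engine_a
```

Once an engine has been selected this way, the decision rule from the answer is simply to use its output and discard the others.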

As far as I know, the only reason to use voting is when 100% accuracy is a requirement and you want to cross-check "suspicious" characters and send them to manual verification. With this approach you increase the number of characters to verify, but reduce the chance of missing a wrong character.
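
The cross-checking use case can be sketched as follows, as an illustration under stated assumptions rather than a production tool: the most accurate engine's text is treated as the primary result, each other engine's text is aligned to it with Python's standard `difflib.SequenceMatcher`, and every primary character that some other engine reads differently is flagged for manual verification. The function name `suspicious_positions` is hypothetical.

```python
from difflib import SequenceMatcher


def suspicious_positions(primary: str, others: list[str]) -> set[int]:
    """Indices in `primary` that at least one other engine disagrees with."""
    flagged: set[int] = set()
    for other in others:
        # autojunk=False disables difflib's popularity heuristic, which could
        # otherwise ignore frequent characters in long texts.
        matcher = SequenceMatcher(None, primary, other, autojunk=False)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
            if tag in ("replace", "delete"):
                # primary[i1:i2] was read differently (or not at all) by `other`
                flagged.update(range(i1, i2))
            elif tag == "insert" and primary:
                # `other` saw extra characters between primary[i1-1] and primary[i1];
                # flag the neighbouring primary characters
                flagged.update(k for k in (i1 - 1, i1) if 0 <= k < len(primary))
    return flagged
```

For `primary = "Invoice 1O23"` (letter O) and a second engine reading `"Invoice 1023"`, only index 9 is flagged. Because the result is a union over all other engines, adding engines can only grow the flagged set, which is the trade-off described above: more characters to verify, fewer missed errors.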
