Software to Improve OCR Results Based on Output from Multiple OCR Software Packages


Question

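Is there existing commercial or academic software that can:

  • overlay the results from multiple OCR packages (Abbyy FineReader, Adobe Acrobat Professional, ReadIris, etc.)
  • provide fully automatic improvements based on the knowledge accumulated from the multiple sources
  • allow additional external tools to be set up at runtime (dictionaries, bulk web/local corpus lookups, etc.)?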

Note: I already have in-house solutions to visualize results from single sources, so if no such software is available, I would not mind developing my own :) Inquiries about cooperation would then also be most welcome!
(source: sourceforge.net)

Recommended answer

The idea of voting between several OCR engines is not new. The problem is that it does not really work. It probably would work if the engines were simple classifiers that were orthogonal by nature: you could then combine their votes and improve the results. But they are all very complicated pieces of software that use a quite similar set of well-known approaches with small variations, probably combined in different ways, and some implementations are better than others.
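To make the voting idea concrete, here is a minimal sketch of character-level majority voting. It assumes the engine outputs have already been aligned to the same length, which real OCR output rarely is, and the function name `majority_vote` is hypothetical. The second call shows the weakness described above: when engines share an error, the vote reproduces it.

```python
from collections import Counter


def majority_vote(outputs):
    """Character-level majority vote over pre-aligned, equal-length strings.

    Assumption: alignment has already been done elsewhere; this sketch
    only performs the vote itself.
    """
    if len({len(s) for s in outputs}) != 1:
        raise ValueError("outputs must be aligned to the same length")
    voted = []
    for chars in zip(*outputs):
        # Pick the most frequent character at this position. On a tie,
        # Counter keeps first-seen order, so the first engine listed wins.
        voted.append(Counter(chars).most_common(1)[0][0])
    return "".join(voted)


# Independent errors are outvoted by the other engines.
print(majority_vote(["he1lo world", "hello wor1d", "hello world"]))

# A shared (correlated) error outvotes the one correct engine.
print(majority_vote(["c1ear", "c1ear", "clear"]))
```

The first call recovers `hello world`, while the second returns `c1ear`, because two engines made the same mistake.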

Experience shows that when you combine several OCR technologies, the best decision rule is to rely on the results of the most accurate one and simply ignore the others. From my experience (I work for ABBYY), ABBYY OCR is definitely the most accurate of the ones you mentioned.

As far as I know, the only reason to use voting is when you want to cross-check "suspicious" characters and send them to manual verification because 100% accuracy is a requirement. With this approach you increase the number of characters to verify, but reduce the chance of missing a wrong character.
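The cross-checking use case can be sketched as follows. This is an illustration under assumptions, not a description of how any particular OCR product does it: the text of the most accurate engine is kept as the result, and a second engine is used only to flag spans for manual review. The function name `flag_suspicious` is hypothetical; the alignment uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def flag_suspicious(primary, secondary):
    """Return (start, end) spans in `primary` where `secondary` disagrees.

    The primary text is never modified; the spans only mark what a
    human should verify. A zero-width span (start == end) marks a
    position where the secondary engine saw extra characters.
    """
    matcher = SequenceMatcher(None, primary, secondary, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((i1, i2))
    return spans


primary = "The qu1ck brown fox"    # output of the engine you trust most
secondary = "The quick brown fox"  # output of a second engine
for start, end in flag_suspicious(primary, secondary):
    print(f"verify characters {start}:{end} -> {primary[start:end]!r}")
```

Adding more secondary engines and taking the union of their flagged spans would increase the number of characters sent to verification, which matches the trade-off described above.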
