Improve OCR accuracy from scanned documents


Problem description

I'm scanning a lot of A3 documents using a standard Brother A3 multifunction printer and then using FineReader Pro to OCR the images.

However, I'm getting a lot of errors in the recognized characters, along with many strange non-alphanumeric characters.

Can someone give me any tips for programmatically improving the OCR accuracy, either by pre-processing the scanned images or by post-processing the recognized text?

Find a sample PDF. It includes some of the sample images that give me the poorest results.

Recommended answer

Do you have a sample image you can post somewhere, so we can quickly tell you what is causing most of your problems? FineReader is one of the better OCR engines out there, so there are definitely reasons why you are getting poor results.

It could be related to poor contrast and threshold settings, image skew, dirty rollers in the scanner, complex or coloured backgrounds, dithered backgrounds, font sizes that are too small, a scanning dpi that is too low, and so on.
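To judge whether "font too small" and "dpi too low" apply to a given scan, it can help to work out how many pixels the scanner actually gives each character. The sketch below relies only on the fact that one typographic point is 1/72 inch. The function name `glyph_height_px` and the 7 pt example are illustrative assumptions, not values taken from the sample PDF.

```python
def glyph_height_px(font_size_pt: float, dpi: int) -> float:
    """Approximate height in pixels of a font's em box at a scan resolution.

    One typographic point is 1/72 inch, so height_px = pt / 72 * dpi.
    Actual letter shapes are smaller than the em box, so treat the
    result as an upper bound on the pixels available per character.
    """
    return font_size_pt / 72.0 * dpi


if __name__ == "__main__":
    # Hypothetical example: small newspaper-style print at several resolutions.
    for dpi in (150, 200, 300, 400):
        height = glyph_height_px(7, dpi)
        print(f"7 pt text at {dpi} dpi -> about {height:.1f} px tall")
```

If the number comes out very low, rescanning at a higher resolution is likely to do more than any software fix applied afterwards.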

After seeing the attached image, there are a few small issues:

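  1. There are lots of dirty specks on the page background. FineReader seems to be doing a reasonable job on the image.
  2. There is some skew, but this is not what is causing the problems.
  3. FineReader is getting confused by the bold, tall Arial-type font used for the column headings.
  4. A big problem seems to be the poor contrast and blurry image at the bottom of the page. This appears to be a scanner issue, but it could be due to a printing issue.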
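Specks like those in item 1 are one of the few things here that pre-processing can address. The following is a minimal sketch, assuming the page has already been binarised into rows of 0/1 values (1 = ink). The names `despeckle` and `min_size` are illustrative and not part of FineReader or any scanner driver.

```python
from collections import deque


def despeckle(img, min_size=3):
    """Return a copy of a binary image with small ink blobs removed.

    img is a list of rows; 1 means ink, 0 means background.
    Connected components (8-connectivity) with fewer than min_size
    pixels are treated as dirt and cleared.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill one component and remember its pixels.
            queue = deque([(y, x)])
            seen[y][x] = True
            blob = []
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components below the size limit are assumed to be specks.
            if len(blob) < min_size:
                for by, bx in blob:
                    out[by][bx] = 0
    return out
```

The size limit needs care: set too high, it will also erase full stops, commas and the dots on "i" and "j", which would make the OCR output worse rather than better.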

The printing is quite poor, and I am guessing it is a scan from a newspaper. Most of your errors are due to scanning issues, so it would be hard to improve the results programmatically.

First, I would try scanning the image in grayscale at a slightly higher resolution and see if that helps. FineReader works well with grayscale images. If you must have a B/W image, check whether the scanner driver includes a setting for dynamic thresholding and turn it on.
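If the driver has no such setting, a similar effect can be approximated in software on the grayscale scan. What a particular driver does internally is not documented here, so the sketch below uses a plain local-mean threshold as a stand-in. The names `adaptive_threshold`, `radius` and `offset` are illustrative assumptions.

```python
def adaptive_threshold(gray, radius=1, offset=10):
    """Binarise a grayscale image with a local-mean threshold.

    gray is a list of rows of 0-255 values. A pixel becomes ink (1)
    when it is darker than the mean of its (2*radius+1)^2 neighbourhood
    by more than offset; otherwise it is background (0).
    """
    h = len(gray)
    w = len(gray[0]) if h else 0
    # Summed-area table with a one-cell zero border for easy window sums.
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        # Clip the window to the image so edge pixels use fewer neighbours.
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            out[y][x] = 1 if gray[y][x] < mean - offset else 0
    return out
```

Because each pixel is compared with its own neighbourhood rather than one page-wide cut-off, a gradual fall-off in brightness towards the bottom of the page has less influence on the result. On a real scan the window would need to be considerably larger than the default `radius=1`, roughly wider than the stroke width of the text, and it cannot recover detail lost to blur.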

Your images would not be an easy task for any OCR engine. You will get better results if you can improve the scanning. Page 3 has a lot of noise in the bottom right corner.

Which version of FineReader are you using? FR10 would probably give better results than previous versions.
