Image processing to improve tesseract OCR accuracy

Views: 87 Published: 2018/7/30 15:36:31 Tags: image-processing ocr tesseract

Problem description

I've been using tesseract to convert documents into text. The quality of the documents varies widely, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that highly pixellated text, such as that generated by fax machines, is especially difficult for tesseract to process; presumably the jagged edges of the characters confound the shape-recognition algorithms.

What sort of image processing techniques would improve the accuracy? I've been using a Gaussian blur to smooth out the pixellated images and have seen some small improvement, but I'm hoping there is a more specific technique that would yield better results. For example, a filter tuned to black and white images that smooths out irregular edges, followed by a filter that increases the contrast to make the characters more distinct.
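To make the idea concrete, here is a minimal sketch of the two-stage pipeline described above: a small Gaussian-style blur to soften jagged edges, followed by a hard threshold that pushes every pixel back to pure black or white. It is pure Python on a list-of-rows grayscale image so that it runs without any dependencies. The function names, the 3x3 kernel and the cutoff of 128 are all assumptions chosen for illustration, not values known to work best with tesseract; in practice an image library would do the same work much faster.

```python
# Illustrative only: pure-Python smoothing + thresholding on a grayscale
# image stored as a list of rows, each pixel an int in 0..255.
# All names and parameter values here are hypothetical.

# 3x3 Gaussian-like kernel; the weights sum to 16.
KERNEL = ((1, 2, 1),
          (2, 4, 2),
          (1, 2, 1))


def gaussian_blur_3x3(img):
    """Return a blurred copy of img, replicating pixels at the borders."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for ky in (-1, 0, 1):
                for kx in (-1, 0, 1):
                    # Clamp coordinates so border pixels reuse their nearest neighbour.
                    yy = min(max(y + ky, 0), height - 1)
                    xx = min(max(x + kx, 0), width - 1)
                    total += KERNEL[ky + 1][kx + 1] * img[yy][xx]
            out[y][x] = total // 16  # normalise by the kernel sum
    return out


def binarize(img, cutoff=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[255 if p >= cutoff else 0 for p in row] for row in img]


def smooth_and_binarize(img, cutoff=128):
    """Blur first to soften jagged edges, then threshold to restore contrast."""
    return binarize(gaussian_blur_3x3(img), cutoff)


if __name__ == "__main__":
    # White page with a single stray black pixel in the middle.
    page = [[255] * 5 for _ in range(5)]
    page[2][2] = 0
    for row in smooth_and_binarize(page):
        print(row)
```

In the small demo at the bottom, the isolated black pixel is averaged with its white neighbours and then falls on the white side of the cutoff, so the speck disappears. The same mechanism rounds off single-pixel notches along character edges, though a cutoff that is too aggressive can also erase thin strokes.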

Any general tips for someone who is a novice at image processing?
