Detect white characters on black background using Tesseract

Problem description

I'm completely new to Tesseract OCR. This problem might be simple but I can't seem to find the answer using Google.

Basically, I have an image that contains two parts: the top part has white text on a black background; the bottom part has black text on a white background.

I ran Tesseract on the image. It correctly recognized all characters in the bottom part, but none in the top part. I am sure the characters in the top part are very clear and should be easy for Tesseract to recognize. The only difference is the black background.

Is there a way to use Tesseract to recognize text on both black and white backgrounds at the same time?

Solution

A paper by T. Kasar, J. Kumar, and A. G. Ramakrishnan describes one solution to the problem: "Font and Background Color Independent Text Binarization". The paper can be found here. There is an implementation of the algorithm by Jason Funk. His implementation can be found here. I have had some success with the algorithm. I think this type of solution is what you are looking for.

You might also find it helpful to review this recently asked question on background removal (OpenCV for OCR: How to compute thresholding levels for gray image OCR) and its answer. You may be able to separate regions of interest by background color and then hand each region to Tesseract for processing. Alternatively, after binarization you could invert the 8x8 pixel regions (described in the answer above) in the black-background portion of the image (or vice versa) to create a uniform background.
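The block-inversion idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method from the linked answer or the paper: the fixed global threshold of 128, the "mostly black means dark background" rule, and the helper names `unify_background` and `ocr` are all made up for this example. The `ocr` helper additionally assumes the `pytesseract` and Pillow packages are installed.

```python
import numpy as np


def unify_background(gray, block=8, threshold=128):
    """Binarize a grayscale image and invert every block that is mostly dark.

    gray: 2-D uint8 array (0 = black, 255 = white).
    Returns a 2-D uint8 array containing only 0 and 255.
    """
    gray = np.asarray(gray, dtype=np.uint8)
    # Global binarization; the fixed threshold is an assumption of this sketch.
    binary = np.where(gray >= threshold, 255, 0).astype(np.uint8)
    out = binary.copy()
    height, width = binary.shape
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = binary[top:top + block, left:left + block]
            # A tile with fewer white than black pixels is assumed to sit
            # on a dark background, so its polarity is flipped.
            if np.count_nonzero(tile) * 2 < tile.size:
                out[top:top + block, left:left + block] = 255 - tile
    return out


def ocr(image_array):
    """Run Tesseract on the array (requires pytesseract and Pillow)."""
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.fromarray(image_array))
```

Something like `ocr(unify_background(gray))` would then give Tesseract dark text on a light background for both halves of the image. A block that falls entirely inside a thick stroke will be flipped the wrong way, so a larger `block` value or a per-region decision may work better on real images.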

Finally, you may find some useful information by searching for solutions to the number plate recognition problem (or license plates). Many number plates (license plates) have background images or lighting artifacts that can interfere with recognition. The more general problem is background removal.
