How to split noise and text from the image for preprocessing of OCR

Question

I am applying OCR to subtitles in TV footage (I am using Tesseract 3.x with C++). I am trying to separate the text from the background as a preprocessing step for OCR.

Here is the original image:

And the preprocessed image:

The OCR result is: Sicemn clone

As the preprocessed image above shows, some "fog" remains around the letters, which prevents the OCR module from doing its job properly.

Is there any way to recognize that "fog" programmatically and remove it, or to apply some image processing that removes or reduces it in the preprocessed image?

Since the preprocessing logic is heavily optimized to handle many different images, I would rather find a way to "clean" the preprocessed image than modify the preprocessing logic (optimizing for this picture could affect the results for other pictures).

Any suggestions are very welcome.

Update

sixela's answer is great and will work in most cases. The case where it does not work is when the background also contains colors similar to the text.

Case where it does not work:

Example result:

The Gaussian filter seems to cause a problem with this type of footage. This implies that different footage may require a different approach.

Recommended answer

I managed to get a clearer (not perfect) image by using morphological operations and thresholding.

Here is how:


  1. I started by converting the original image to greyscale
  2. Applied a Gaussian blur (9x9 kernel) to denoise the greyscale image
  3. Top hat morphological operation (3x3 kernel) to extract the white text
  4. Otsu thresholding method
  5. Dilation
  6. Inverted binary threshold to turn the white text black

I finally obtained the following image:

Which gives, as OCR result, this text: "Since vou don'k"

PS: This result can of course be improved by tweaking the parameters (kernel size, for example), but I hope it can guide you. I used OpenCV in Python to quickly try out these methods.

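import cv2

# Load the input directly as a single-channel greyscale image (flag 0).
image = cv2.imread('./inputImg.png', 0)
if image is None:
    raise FileNotFoundError('./inputImg.png')

# Denoise with a 9x9 Gaussian blur (sigma is derived from the kernel size).
imgBlur = cv2.GaussianBlur(image, (9, 9), 0)

# Top hat with a 3x3 rectangular kernel keeps small bright details (the white text).
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
imgTH = cv2.morphologyEx(imgBlur, cv2.MORPH_TOPHAT, kernel)

# Otsu picks the threshold automatically; the 0 passed as threshold is ignored.
_, imgBin = cv2.threshold(imgTH, 0, 250, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Thicken the strokes, then invert so the text becomes black on a light background.
imgdil = cv2.dilate(imgBin, kernel)
_, imgBin_Inv = cv2.threshold(imgdil, 0, 250, cv2.THRESH_BINARY_INV)

cv2.imshow('original', image)
cv2.imshow('bin', imgBin)
cv2.imshow('dil', imgdil)
cv2.imshow('inv', imgBin_Inv)

cv2.imwrite('./output.png', imgBin_Inv)
cv2.waitKey(0)
cv2.destroyAllWindows()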

After this, I tried the output image on Tesseract with this command:

tesseract output.png stdout
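
If the OCR step is to be run from the same Python script, the command can be invoked through `subprocess`. This is a minimal sketch that assumes the `tesseract` executable is on the PATH; the helper names `build_tesseract_command` and `run_tesseract` are hypothetical, and the `-l eng` language option is an addition that is not part of the command above.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Return the argument list for: tesseract <image> stdout -l <lang>."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_tesseract(image_path, lang="eng"):
    """Run Tesseract on an image file and return the recognized text.

    Assumes the tesseract binary is installed and on the PATH.
    """
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout.strip()
```

Calling `run_tesseract('./output.png')` would then return the recognized text as a string instead of printing it to the terminal.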
