Image processing for OCR with leptonica (inverse color text)

Question

I am trying to process the following image with leptonica so that tesseract can extract its text.

Original image:

Tesseract on the original image yields this:

i s l
D2J1FiiE-l191x1iitmwii9 uhiaiislz-2 Q ~37
Bottom linez
With a little time!
you can learn social media technology
using free online resources-
And if you donity
youlll be at a significant disadvantage
to
other HOn-pFOiiTS-

Not great, especially the top background. So, using leptonica, I applied a background removal algorithm (blur, difference, threshold, invert) to get the following image:

But tesseract doesn't do a good job with it:

@@r-mair lkrm@W lh@w ilr@ mJs@ iklh@ ii@c2lhm1@ll
mm Mime
VWU1 a Mitt-Jle time-
@1m ll@@Wn Om @@@lh1
using free onhne resources-
Andifyoudoni
9110 ate a $0 D
to other non-profrts
I

The main problem, it seems, is that all of the text is now outlined instead of solid. How can I adjust my algorithm, or what can I add, to make the text solid?
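For reference, the four steps named above (blur, difference, threshold, invert) can be sketched in plain Python on a toy image. This is only an illustration under stated assumptions: it does not use leptonica, and the box blur, the `radius` and `thresh` parameters, and the test image are all hypothetical stand-ins for whatever the real pipeline uses.

```python
def box_blur(img, radius):
    """Mean filter over a (2*radius+1) square window, clamped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[j][i] for j in ys for i in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def remove_background(img, radius, thresh):
    """Blur, difference, threshold, invert. Returns 0 for text, 255 for background."""
    blurred = box_blur(img, radius)
    result = []
    for row, blurred_row in zip(img, blurred):
        # Difference: how far each pixel is from its local mean.
        diff = [abs(p - b) for p, b in zip(row, blurred_row)]
        # Threshold picks pixels that stand out; inverting makes them black (0).
        result.append([0 if d > thresh else 255 for d in diff])
    return result


# Hypothetical test image: a bright bar, 5 pixels wide, on a dark background.
image = [[200 if 18 <= x <= 22 else 30 for x in range(41)] for _ in range(3)]

narrow = remove_background(image, radius=1, thresh=50)   # window narrower than the bar
wide = remove_background(image, radius=10, thresh=50)    # window much wider than the bar

print([x for x, v in enumerate(narrow[1]) if v == 0])
print([x for x, v in enumerate(wide[1]) if v == 0])
```

In this toy case the narrow window marks only the columns on either side of each edge of the bar, because the interior of the bar equals its own local mean, while the wide window marks the bar as a solid run. Whether a blur that is too small relative to the stroke width is what causes the outlines in the actual image is an assumption, not something the post confirms.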

Recommended answer

This paper appears to propose a binarization method that solves your problem:

T. Kasar, J. Kumar and A. G. Ramakrishnan. Font and Background Color Independent Text Binarization. (2007)
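The paper's algorithm is not reproduced here. As a rough illustration of what "background color independent" can mean in practice, the sketch below binarizes one text region so that the text always comes out black on white, whether it started as dark-on-light or light-on-dark. Everything in it is an assumption made for the example: the `binarize_region` name, estimating the background from the median of the region's border pixels, and thresholding at the midpoint between background and text levels.

```python
from statistics import median


def binarize_region(region):
    """Binarize one region to black text (0) on white (255), whatever its polarity.

    Assumption: the border pixels of the region are mostly background.
    """
    h, w = len(region), len(region[0])
    border = [region[y][x] for y in range(h) for x in range(w)
              if y in (0, h - 1) or x in (0, w - 1)]
    background = median(border)

    # Treat the pixel level farthest from the background as the text level.
    pixels = [p for row in region for p in row]
    text = max(pixels, key=lambda p: abs(p - background))
    if text == background:
        # Uniform region: nothing to separate, so return all background.
        return [[255] * w for _ in range(h)]

    thresh = (background + text) / 2
    text_is_lighter = text > background
    # A pixel is text when it falls on the text side of the threshold.
    return [[0 if (p > thresh) == text_is_lighter else 255 for p in row]
            for row in region]


# Two hypothetical regions with the same shape and opposite polarity.
light_on_dark = [[30] * 5, [30, 200, 200, 200, 30], [30] * 5]
dark_on_light = [[220] * 5, [220, 40, 40, 40, 220], [220] * 5]

print(binarize_region(light_on_dark))
print(binarize_region(dark_on_light))
```

Both regions produce the same output, which is the property an OCR engine benefits from when a page mixes normal and inverse text. Applying this to a full page would also need a step that finds the regions, which this sketch leaves out.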
