Image processing for OCR with leptonica (inverse color text)
Question
I am trying to process the following image with leptonica to extract text with tesseract.
Original image:
Tesseract on the original image yields this:
i s l
D2J1FiiE-l191x1iitmwii9 uhiaiislz-2 Q ~37
Bottom linez
With a little time!
you can learn social media technology
using free online resources-
And if you donity
youlll be at a significant disadvantage
to
other HOn-pFOiiTS-
Not great, especially the top background. So, using leptonica, I applied a background removal algorithm (blur, difference, threshold, invert) to get the following image:
But tesseract doesn't do a good job with it:
@@r-mair lkrm@W lh@w ilr@ mJs@ iklh@ ii@c2lhm1@ll
mm Mime
VWU1 a Mitt-Jle time-
@1m ll@@Wn Om @@@lh1
using free onhne resources-
Andifyoudoni
9110 ate a $0 D
to other non-profrts
I
The main problem, it seems, is that all of the text is now outlined instead of solid. How can I adjust my algorithm, or what can I add, to make the text solid?
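For reference, the blur, difference, threshold, invert pipeline mentioned above can be sketched roughly as follows. This is a pure-Python illustration on a tiny grayscale grid, not leptonica code; the function names `box_blur` and `remove_background`, the window radius, and the threshold value are assumptions chosen for the example. It suggests one plausible cause of the outlines: when a stroke is wider than the blur window, the stroke's interior matches its own blurred value and drops out.

```python
def box_blur(img, radius):
    """Mean filter with a (2*radius+1) square window, clamped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def remove_background(img, radius=1, thresh=40):
    """Blur, absolute difference, threshold, invert.

    Returns a grid where 0 is ink (black) and 255 is paper (white).
    """
    blurred = box_blur(img, radius)
    out = []
    for row, brow in zip(img, blurred):
        # A pixel counts as ink if it differs enough from its neighbourhood mean.
        out.append([0 if abs(p - b) > thresh else 255 for p, b in zip(row, brow)])
    return out


# Light stroke (value 230), 7 pixels wide, on a dark background (value 30).
row = [30] * 5 + [230] * 7 + [30] * 5
image = [row[:] for _ in range(9)]

result = remove_background(image, radius=1, thresh=40)
middle = "".join("#" if v == 0 else "." for v in result[4])
print(middle)  # prints ....##.....##....
```

Only the two edges of the stroke survive thresholding; the five interior pixels come out as paper, which is the hollow, outlined look described above.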
Recommended answer
It seems that this paper proposes a binarization method which solves your problem:
T Kasar, J Kumar and A G Ramakrishnan. Font and Background Color Independent Text Binarization. (2007)
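As a rough illustration of what polarity-independent binarization can look like, here is a simplified pure-Python sketch. It is not the paper's algorithm and does not use leptonica. The helper names `binarize_region` and `make_region`, the midpoint threshold, and the assumption that a region's border pixels are mostly background are all choices made for this example. The idea is to threshold each text region directly, then decide from its border whether the background is light or dark, so the text always comes out solid black on white.

```python
def binarize_region(region):
    """Threshold one region; return 0 for text and 255 for background."""
    pixels = [p for row in region for p in row]
    # Crude threshold: midpoint between darkest and brightest pixel
    # (an assumption for this sketch; Otsu's method is a common alternative).
    thresh = (min(pixels) + max(pixels)) / 2
    # Assume the pixels along the region's border are mostly background.
    border = (region[0] + region[-1]
              + [row[0] for row in region]
              + [row[-1] for row in region])
    light_border = sum(1 for p in border if p > thresh)
    background_is_light = light_border * 2 >= len(border)
    out = []
    for row in region:
        if background_is_light:
            # Dark text on light background: dark pixels are text.
            out.append([0 if p <= thresh else 255 for p in row])
        else:
            # Light text on dark background: light pixels are text.
            out.append([0 if p > thresh else 255 for p in row])
    return out


def make_region(bg, fg):
    """5x9 test region with a solid 3x5 block of `fg` centred on `bg`."""
    region = [[bg] * 9 for _ in range(5)]
    for y in range(1, 4):
        for x in range(2, 7):
            region[y][x] = fg
    return region


dark_on_light = binarize_region(make_region(bg=220, fg=40))
light_on_dark = binarize_region(make_region(bg=30, fg=230))

for row in light_on_dark:
    print("".join("#" if v == 0 else "." for v in row))
```

Both test regions produce the same solid block, regardless of which shade was the text. On a real image the regions would first have to be located, for example as connected components or text blocks, and that step is not shown here.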