C# OCR can't recognize digits (tesseract 2)


Question

I'm trying to extract digits from the following:

It fails; I get a ~ in return. I'm using Google's tesseract 2 from C# (through an open source C# wrapper), and now I'm wondering: is this image too crappy to be used for OCR?

Because IMHO the digits are perfectly clear.

Do you have any other OCR engine in mind that would nail this down?

Edit

I've also tried Asprise OCR (http://asprise.com/product/ocr/selector.php), but it fails to parse the image too...

Recommended answer

I suggest resizing. I zoomed this page to 200% in IE, took a screenshot, printed it to PDF, and imported it into my program that uses tessnet. Tess nailed it! Unless I read the #s wrong :-)

The confidence was 140, though (under 100 is preferred, in case you wondered). When I tried the original size, I didn't get ~; I got about half the #s right, plus a bunch of letters and other garbage. Not good enough, but better.

t2 (tesseract 2) seems to like images of a certain size.

My program does some preprocessing to get that to work. I suggest using .NET GDI+ to convert the image to 32-bit and to resize it with the interpolation mode High Quality Bicubic. This seems to 'fill in the gaps' a bit.
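As a rough illustration of that preprocessing step, here is a minimal sketch using System.Drawing (the .NET wrapper over GDI+). It is not the answerer's actual program: the class name `OcrPreprocess`, the method name `Upscale`, the white background fill, and the choice of `Format32bppArgb` as the 32-bit format are all assumptions made for this example.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

// Hypothetical helper class, named for this example only.
static class OcrPreprocess
{
    // Returns a new 32-bit bitmap containing 'source' scaled by 'scale',
    // drawn with High Quality Bicubic interpolation.
    public static Bitmap Upscale(Image source, float scale)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (scale <= 0) throw new ArgumentOutOfRangeException("scale");

        int width = Math.Max(1, (int)Math.Round(source.Width * scale));
        int height = Math.Max(1, (int)Math.Round(source.Height * scale));

        // Assumption: 32bpp ARGB is used as the "32 bit" target format.
        var result = new Bitmap(width, height, PixelFormat.Format32bppArgb);

        using (Graphics g = Graphics.FromImage(result))
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.PixelOffsetMode = PixelOffsetMode.HighQuality;
            // Assumption: fill with white so transparent areas do not turn black.
            g.Clear(Color.White);
            g.DrawImage(source, new Rectangle(0, 0, width, height));
        }

        return result;
    }
}
```

Calling `OcrPreprocess.Upscale(image, 2.0f)` would roughly mirror the 200% zoom described above. The resulting bitmap can then be handed to whatever OCR wrapper is in use, and the caller is responsible for disposing it.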

Play with sizes to find what works. I have found that tesseract performs differently when the image is too big or too small.

Both issues are preprocessing. That's easy, and you'd think tesseract would try it itself; however, I know how to resize and interpolate, and I don't know how to OCR! So I am willing to settle.
