OCR fails due to font specifics


Problem description

I have a library of pattern images covering every character of a font (Arial in my case).

I use this library to OCR text from images.

The problem is that characters such as "j", "/", and "t" can overlap their neighbours in the rendered text. When that happens, the extracted character no longer matches its pattern image exactly (up to 3 pixels differ), so recognition fails.

How should I deal with this problem? Is there a better way to compare images? (C#, WinForms app)

I'm using this method for comparison:

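    // Requires: using System; using System.Drawing; using System.Drawing.Imaging;
    //           using System.Runtime.InteropServices;

    // memcmp is not shown in the question; this is assumed to be the usual
    // P/Invoke declaration against the C runtime.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int memcmp(IntPtr b1, IntPtr b2, UIntPtr count);

    // Returns true only if both bitmaps have the same size and identical pixel data.
    public static bool CompareMemCmp(Bitmap b1, Bitmap b2)
    {
        if ((b1 == null) != (b2 == null)) return false;
        if (b1 == null) return true; // both null: treated as equal (avoids a NullReferenceException below)
        if (b1.Size != b2.Size) return false;

        var rect = new Rectangle(new System.Drawing.Point(0, 0), b1.Size);
        var bd1 = b1.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
        var bd2 = b2.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);

        try
        {
            IntPtr bd1scan0 = bd1.Scan0;
            IntPtr bd2scan0 = bd2.Scan0;

            // Total number of bytes in the locked pixel data (assumes a positive stride).
            int stride = bd1.Stride;
            int len = stride * b1.Height;

            return memcmp(bd1scan0, bd2scan0, new UIntPtr((uint)len)) == 0;
        }
        finally
        {
            b1.UnlockBits(bd1);
            b2.UnlockBits(bd2);
        }
    }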

It is extremely fast and reliable, but unfortunately it only reports exact equality, so it cannot produce a match when characters overlap as described above.
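Because the mismatch is reported to be at most 3 pixels, one possible alternative to exact equality is to count the differing pixels and accept a match below a small budget. The following is only a minimal sketch under stated assumptions: it assumes each bitmap's pixels have already been copied into an `int[]` of 32-bit ARGB values (for instance with `LockBits` and `Marshal.Copy`), and the names `TolerantCompare`, `CountDifferentPixels`, and `MatchesWithin` are hypothetical. A tolerance may also confuse glyphs that genuinely differ by only a few pixels, so the budget would need testing against the actual pattern library.

```csharp
using System;

public static class TolerantCompare
{
    // Counts positions at which the two pixel buffers hold different values.
    // Both buffers are assumed to describe images of the same width and height.
    public static int CountDifferentPixels(int[] a, int[] b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));
        if (a.Length != b.Length)
            throw new ArgumentException("Pixel buffers must have the same length.");

        int different = 0;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) different++;
        }
        return different;
    }

    // Returns true when at most maxDifferentPixels pixels differ.
    public static bool MatchesWithin(int[] a, int[] b, int maxDifferentPixels)
    {
        return CountDifferentPixels(a, b) <= maxDifferentPixels;
    }
}
```

With a budget of 3, a pattern that differs from the extracted character in three pixels would still match, while one that differs in four would not. This is slower than `memcmp` because every pixel is inspected individually.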

Solution

You could treat these overlapping character pairs as "characters" in their own right, although there could be an unreasonable number of them. For example, the combination "-j" would be stored as a single pattern and recognized as "-j".
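The idea of storing pairs as patterns could be sketched as a lookup table in which a pattern maps to a string of one or more characters. This is an illustration only, not the answerer's code: the class `PairTemplates`, its methods, and the representation of a pattern as a width plus an `int[]` of pixel values are all assumptions, and how the pair images are produced (for example by rendering each pair in the same font) is left open.

```csharp
using System;
using System.Collections.Generic;

public static class PairTemplates
{
    // Hypothetical lookup key: the width followed by every pixel value,
    // so two patterns share a key only if they are pixel-for-pixel identical.
    public static string Key(int width, int[] pixels)
    {
        return width + ":" + string.Join(",", pixels);
    }

    // Builds the library. An entry's text may hold more than one character,
    // which is how an overlapping pair such as "-j" becomes a single pattern.
    public static Dictionary<string, string> BuildLibrary(
        IEnumerable<(int Width, int[] Pixels, string Text)> templates)
    {
        var library = new Dictionary<string, string>();
        foreach (var t in templates)
        {
            library[Key(t.Width, t.Pixels)] = t.Text;
        }
        return library;
    }

    // Returns the text for an exactly matching pattern, or null if none matches.
    public static string Recognize(
        Dictionary<string, string> library, int width, int[] pixels)
    {
        string text;
        return library.TryGetValue(Key(width, pixels), out text) ? text : null;
    }
}
```

The exact-match comparison stays as strict as before; only the library grows. The cost noted in the answer still applies, since every pair that can overlap in the font needs its own entry.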
