OCR fails due to font specifics
I have a library that contains an image of every character in a font (Arial in my case).
I'm using this library to OCR text from an image.
The problem is that characters such as "j", "/", and "t" can overlap their neighbors, so the segmented characters no longer match the pattern images exactly (up to 3 pixels differ) and OCR fails.
How should I deal with this problem? Is there a better way to compare images? (C#, WinForms app)
I'm using this method for comparison:
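    using System;
    using System.Drawing;
    using System.Drawing.Imaging;
    using System.Runtime.InteropServices;

    // P/Invoke declaration for the C runtime memcmp (assumed; it is not shown in the post).
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int memcmp(IntPtr b1, IntPtr b2, UIntPtr count);

    public static bool CompareMemCmp(Bitmap b1, Bitmap b2)
    {
        // Exactly one bitmap is null: not equal.
        if ((b1 == null) != (b2 == null)) return false;
        // Both null: treat as equal and avoid dereferencing null below.
        if (b1 == null) return true;
        if (b1.Size != b2.Size) return false;

        var rect = new Rectangle(new System.Drawing.Point(0, 0), b1.Size);
        var bd1 = b1.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
        var bd2 = b2.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
        try
        {
            // Compare the raw pixel buffers byte for byte.
            int len = bd1.Stride * b1.Height;
            return memcmp(bd1.Scan0, bd2.Scan0, (UIntPtr)(uint)len) == 0;
        }
        finally
        {
            b1.UnlockBits(bd1);
            b2.UnlockBits(bd2);
        }
    }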
It's extremely fast and reliable, but unfortunately it cannot find a match when characters overlap as described above.
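Since the mismatch is described as at most 3 pixels, one possible alternative to an exact byte comparison is to count differing pixels and accept a match below a threshold. The following is only a sketch of that idea and is not part of the original post: the class `FuzzyBitmapCompare`, its methods, and the `maxDifferentPixels` parameter are hypothetical names, and the code assumes the locked 32bpp data is laid out top-down with contiguous rows.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

public static class FuzzyBitmapCompare
{
    // Copies a bitmap's pixels into an array of 32-bit ARGB values.
    public static int[] GetPixels(Bitmap bmp)
    {
        var rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
        BitmapData data = bmp.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
        try
        {
            // Assumption: rows are contiguous and top-down (stride == Width * 4).
            var pixels = new int[bmp.Width * bmp.Height];
            Marshal.Copy(data.Scan0, pixels, 0, pixels.Length);
            return pixels;
        }
        finally
        {
            bmp.UnlockBits(data);
        }
    }

    // Counts the positions at which the two pixel arrays differ.
    public static int CountDifferences(int[] a, int[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Arrays must have the same length.");
        int differences = 0;
        for (int i = 0; i < a.Length; i++)
            if (a[i] != b[i]) differences++;
        return differences;
    }

    // True when the bitmaps have the same size and differ in at most maxDifferentPixels pixels.
    public static bool CompareWithTolerance(Bitmap b1, Bitmap b2, int maxDifferentPixels)
    {
        if (b1 == null || b2 == null) return b1 == b2; // equal only if both are null
        if (b1.Size != b2.Size) return false;
        return CountDifferences(GetPixels(b1), GetPixels(b2)) <= maxDifferentPixels;
    }
}
```

A call such as `FuzzyBitmapCompare.CompareWithTolerance(glyph, pattern, 3)` would tolerate the 3-pixel difference mentioned above. The trade-off is that small, similar glyphs (for example "." and ",") may then match each other, so the threshold probably needs tuning per font size.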
You could treat these character pairs as "characters" of their own (although there could be an unreasonable number of them), i.e. the "-j" combination would be recognized as a single "-j" character.
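As a rough illustration of that suggestion, the pattern library can be keyed by a string label instead of a single character, so that an overlapping pair such as "-j" gets its own pattern image. This is a minimal sketch under assumptions: `TemplateLibrary`, `Add`, and `Recognize` are hypothetical names, and the comparison function is passed in so that an exact comparer like `CompareMemCmp` can be reused.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps a label (one or more characters) to its pattern image.
public class TemplateLibrary<TImage>
{
    private readonly List<KeyValuePair<string, TImage>> templates =
        new List<KeyValuePair<string, TImage>>();
    private readonly Func<TImage, TImage, bool> matches;

    // matches: returns true when a pattern image and a segment are considered equal.
    public TemplateLibrary(Func<TImage, TImage, bool> matches)
    {
        this.matches = matches;
    }

    // Registers a pattern; the label may be a single character ("j") or a pair ("-j").
    public void Add(string label, TImage template)
    {
        templates.Add(new KeyValuePair<string, TImage>(label, template));
    }

    // Returns the label of the first matching pattern, or null if nothing matches.
    public string Recognize(TImage segment)
    {
        foreach (var entry in templates)
        {
            if (matches(entry.Value, segment)) return entry.Key;
        }
        return null;
    }
}
```

With bitmaps this might be used as `var lib = new TemplateLibrary<Bitmap>(CompareMemCmp); lib.Add("-j", dashJPattern);`, where `dashJPattern` is an assumed pre-rendered image of the overlapping pair. Because `CompareMemCmp` rejects bitmaps of different sizes, a pair pattern would not be confused with a single-character segment.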