OCR: Image to text?
Question

Please read the entire question before marking it as a copy or duplicate.
What I am able to do is as follows:
- Get the image and crop the desired part for OCR.
- Process the image using tesseract and leptonica.
- When the applied document is cropped into chunks, i.e. 1 character per image, it provides 96% accuracy.
- If I don't do that and the document background is white with the text in black, the accuracy is almost the same.
For example, if the input is this photo:

[photo of a number plate]

What I want is to be able to get the same accuracy for this photo without generating blocks.
The code I used to init tesseract and extract text from an image is as below.

For the init of tesseract:
In the .h file:
tesseract::TessBaseAPI *tesseract;
uint32_t *pixels;
In the .m file:
tesseract = new tesseract::TessBaseAPI();
tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");
tesseract->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
tesseract->SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1");
tesseract->SetVariable("language_model_penalty_non_dict_word", "1"); // note: the variable name must not contain a trailing space
tesseract->SetVariable("tessedit_flip_0O", "1");
tesseract->SetVariable("tessedit_single_match", "0");
tesseract->SetVariable("textord_noise_normratio", "5");
tesseract->SetVariable("matcher_avg_noise_size", "22");
tesseract->SetVariable("image_default_resolution", "450");
tesseract->SetVariable("editor_image_text_color", "40");
tesseract->SetVariable("textord_projection_scale", "0.25");
tesseract->SetVariable("tessedit_minimal_rejection", "1");
tesseract->SetVariable("tessedit_zero_kelvin_rejection", "1");
To get the text from the image:
- (void)processOcrAt:(UIImage *)image
{
[self setTesseractImage:image];
tesseract->Recognize(NULL);
char* utf8Text = tesseract->GetUTF8Text();
int conf = tesseract->MeanTextConf();
NSArray *arr = [[NSArray alloc]initWithObjects:[NSString stringWithUTF8String:utf8Text],[NSString stringWithFormat:@"%d%@",conf,@"%"], nil];
[self performSelectorOnMainThread:@selector(ocrProcessingFinished:)
withObject:arr
waitUntilDone:YES];
delete[] utf8Text; // GetUTF8Text() allocates with new[], so it must be freed with delete[], not free()
}
- (void)ocrProcessingFinished:(NSArray *)result // must match the selector used above
{
UIAlertView *alt = [[UIAlertView alloc]initWithTitle:@"Data" message:[result objectAtIndex:0] delegate:self cancelButtonTitle:nil otherButtonTitles:@"OK", nil];
[alt show];
}
But I don't get proper output for the number plate image: it is either null or it gives some garbage data for the image.

If I use the first image, i.e. a white background with black text, the output is 89 to 95% accurate.

Please help me. Any suggestion will be appreciated.
Update
Thanks to @jcesar for providing the link and also to @konstantin pribluda for providing valuable information and guidance.

I am able to convert images into proper black and white form (almost), so the recognition is better for all images :)

Need help with proper binarization of images. Any idea will be appreciated.
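For the binarization step asked about here, one standard technique (a common suggestion, not something from the original post) is Otsu's method, which picks the global threshold that maximizes the between-class variance of the grayscale histogram. A minimal sketch in plain C++, operating on a raw 8-bit grayscale buffer:

```cpp
#include <cstdint>
#include <vector>

// Otsu's method: choose the threshold that maximizes between-class variance.
// Input: 8-bit grayscale pixels. Output: threshold in [0, 255].
int otsuThreshold(const std::vector<uint8_t>& gray) {
    // Build the 256-bin histogram.
    double hist[256] = {0};
    for (uint8_t v : gray) hist[v] += 1.0;

    const double total = static_cast<double>(gray.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * hist[i];

    double sumBack = 0.0, weightBack = 0.0;
    double bestVar = -1.0;
    int bestThreshold = 0;

    for (int t = 0; t < 256; ++t) {
        weightBack += hist[t];                  // pixels at or below t
        if (weightBack == 0) continue;
        double weightFore = total - weightBack; // pixels above t
        if (weightFore == 0) break;

        sumBack += t * hist[t];
        double meanBack = sumBack / weightBack;
        double meanFore = (sumAll - sumBack) / weightFore;

        // Between-class variance for this split.
        double betweenVar =
            weightBack * weightFore * (meanBack - meanFore) * (meanBack - meanFore);
        if (betweenVar > bestVar) {
            bestVar = betweenVar;
            bestThreshold = t;
        }
    }
    return bestThreshold;
}

// Binarize in place: background to white (255), text to black (0).
void binarize(std::vector<uint8_t>& gray) {
    int t = otsuThreshold(gray);
    for (uint8_t& v : gray) v = (v > t) ? 255 : 0;
}
```

On a bimodal image (dark characters on a bright plate) this tends to find the valley between the two intensity modes automatically, instead of relying on hand-tuned threshold constants.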
Accepted answer
Hi all, thanks for your replies. From all of those replies I am able to draw the conclusion below:
- I need to get only one cropped image block with the number plate contained in it.
- From that plate, find out the portion with the numbers using the data I got from the method provided here.
- Then convert the image data to almost black and white using the RGB data found through the above method.
- Then convert the data to an image using the method provided here.

The above 4 steps are combined into one method as below:
-(void)getRGBAsFromImage:(UIImage *)image
{
    NSInteger count = (image.size.width * image.size.height);

    // First get the image into your data buffer
    CGImageRef imageRef = [image CGImage];
    NSUInteger width = CGImageGetWidth(imageRef);
    NSUInteger height = CGImageGetHeight(imageRef);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    unsigned char *rawData = (unsigned char *)calloc(height * width * 4, sizeof(unsigned char));
    NSUInteger bytesPerPixel = 4;
    NSUInteger bytesPerRow = bytesPerPixel * width;
    NSUInteger bitsPerComponent = 8;
    CGContextRef context = CGBitmapContextCreate(rawData, width, height,
                                                 bitsPerComponent, bytesPerRow, colorSpace,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
    CGContextRelease(context);

    // Now rawData contains the image data in the RGBA8888 pixel format.
    int byteIndex = 0;
    for (int ii = 0; ii < count; ++ii)
    {
        CGFloat red   = rawData[byteIndex] * 1.0;
        CGFloat green = rawData[byteIndex + 1] * 1.0;
        CGFloat blue  = rawData[byteIndex + 2] * 1.0;
        CGFloat alpha = rawData[byteIndex + 3] * 1.0;
        // Logging every pixel slows this loop down enormously; enable only for debugging.
        // NSLog(@"red %f \t green %f \t blue %f \t alpha %f", red, green, blue, alpha);
        if (red > Required_Value_of_red || green > Required_Value_of_green || blue > Required_Value_of_blue) // all values are between 0 and 255
        {
            // Set all components to 255 to get a white background.
            red = 255.0;
            green = 255.0;
            blue = 255.0;
            alpha = 255.0;
        }
        rawData[byteIndex] = red;
        rawData[byteIndex + 1] = green;
        rawData[byteIndex + 2] = blue;
        rawData[byteIndex + 3] = alpha;
        byteIndex += 4;
    }

    colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef bitmapContext = CGBitmapContextCreate(rawData,
                                                       width,
                                                       height,
                                                       8,         // bitsPerComponent
                                                       4 * width, // bytesPerRow
                                                       colorSpace,
                                                       kCGImageAlphaNoneSkipLast);
    CGColorSpaceRelease(colorSpace);
    CGImageRef cgImage = CGBitmapContextCreateImage(bitmapContext);
    UIImage *img = [UIImage imageWithCGImage:cgImage];
    // use img for further OCR processing
    CGImageRelease(cgImage);          // img retains what it needs
    CGContextRelease(bitmapContext);  // release the context to avoid leaking it
    free(rawData);
}
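The per-pixel thresholding at the heart of this method can be isolated as plain C++. The Required_Value_of_* thresholds are placeholders in the original answer; the parameters below simply stand in for them, and the technique is the same component-wise comparison:

```cpp
#include <cstdint>
#include <vector>

// Threshold an RGBA8888 buffer in place: any pixel whose red, green, or blue
// component exceeds its threshold becomes opaque white; others are untouched.
// The three thresholds correspond to the post's Required_Value_of_* placeholders.
void thresholdToWhite(std::vector<uint8_t>& rgba,
                      uint8_t redThresh, uint8_t greenThresh, uint8_t blueThresh) {
    for (std::size_t i = 0; i + 3 < rgba.size(); i += 4) {
        if (rgba[i] > redThresh || rgba[i + 1] > greenThresh || rgba[i + 2] > blueThresh) {
            rgba[i] = rgba[i + 1] = rgba[i + 2] = rgba[i + 3] = 255;
        }
    }
}
```

This makes the time cost easy to see: one pass over width × height pixels with constant work per pixel, so the expensive part of the original method is the per-pixel logging, not the comparison itself.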
Note:
The only drawbacks of this method are the time it consumes and choosing the RGB threshold values that decide which pixels convert to white and which to black.
Update:
CGImageRef imageRef = [plate CGImage];
CIContext *context = [CIContext contextWithOptions:nil]; // 1
CIImage *ciImage = [CIImage imageWithCGImage:imageRef]; // 2
CIFilter *filter = [CIFilter filterWithName:@"CIColorMonochrome" keysAndValues:@"inputImage", ciImage, @"inputColor", [CIColor colorWithRed:1.f green:1.f blue:1.f alpha:1.0f], @"inputIntensity", [NSNumber numberWithFloat:1.f], nil]; // 3
CIImage *ciResult = [filter valueForKey:kCIOutputImageKey]; // 4
CGImageRef cgImage = [context createCGImage:ciResult fromRect:[ciResult extent]];
UIImage *img = [UIImage imageWithCGImage:cgImage];
CGImageRelease(cgImage); // createCGImage:fromRect: follows the Create rule, so release it here
Just replace the above method's (getRGBAsFromImage:) code with this one; the result is the same, but the time taken is only 0.1 to 0.3 seconds.
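Conceptually, what the CIColorMonochrome pass with a white inputColor does is collapse each pixel to a single intensity. Apple does not document the filter's exact formula, so the following plain C++ sketch is only an approximation of that idea, using the standard Rec. 709 luma weights:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Approximate a white-color monochrome pass: replace R, G, B with the
// Rec. 709 luma of the pixel. Alpha is left untouched.
void toMonochrome(std::vector<uint8_t>& rgba) {
    for (std::size_t i = 0; i + 3 < rgba.size(); i += 4) {
        double luma = 0.2126 * rgba[i] + 0.7152 * rgba[i + 1] + 0.0722 * rgba[i + 2];
        rgba[i] = rgba[i + 1] = rgba[i + 2] = static_cast<uint8_t>(std::lround(luma));
    }
}
```

The speedup over the manual loop comes from Core Image running the filter on the GPU, not from a cheaper per-pixel formula.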