OCR: Image to text?


Problem description

Please read the entire question before marking it as a copy or duplicate.

What I am able to do so far:

  1. Get the image and crop the desired part for OCR.
  2. Process the image using tesseract and leptonica.
  3. When the applied document is cropped into chunks, i.e. 1 character per image, it gives 96% accuracy.
  4. If I don't do that and the document background is white with black text, it gives almost the same accuracy.

For example, if the input is this photo:

[number-plate photo omitted]

What I want is to get the same accuracy for this photo without generating blocks.

The code I used to initialize tesseract and extract text from the image is below.

For the init of tesseract:

In the .h file:

tesseract::TessBaseAPI *tesseract;
uint32_t *pixels;

In the .m file:

tesseract = new tesseract::TessBaseAPI();
tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");
tesseract->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
tesseract->SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1");
tesseract->SetVariable("language_model_penalty_non_dict_word", "1"); // note: the original had a stray trailing space in the variable name
tesseract->SetVariable("tessedit_flip_0O", "1");
tesseract->SetVariable("tessedit_single_match", "0");
tesseract->SetVariable("textord_noise_normratio", "5");
tesseract->SetVariable("matcher_avg_noise_size", "22");
tesseract->SetVariable("image_default_resolution", "450");
tesseract->SetVariable("editor_image_text_color", "40");
tesseract->SetVariable("textord_projection_scale", "0.25");
tesseract->SetVariable("tessedit_minimal_rejection", "1");
tesseract->SetVariable("tessedit_zero_kelvin_rejection", "1");

To get the text from the image:

- (void)processOcrAt:(UIImage *)image
{
    [self setTesseractImage:image];

    tesseract->Recognize(NULL);
    char* utf8Text = tesseract->GetUTF8Text();
    int conf = tesseract->MeanTextConf();

    NSArray *arr = [[NSArray alloc]initWithObjects:[NSString stringWithUTF8String:utf8Text],[NSString stringWithFormat:@"%d%@",conf,@"%"], nil];

    [self performSelectorOnMainThread:@selector(ocrProcessingFinished:)
                           withObject:arr
                        waitUntilDone:YES];
    free(utf8Text);
}

- (void)ocrProcessingFinished:(NSArray *)result // name must match @selector(ocrProcessingFinished:) above
{
    UIAlertView *alt = [[UIAlertView alloc] initWithTitle:@"Data" message:[result objectAtIndex:0] delegate:self cancelButtonTitle:nil otherButtonTitles:@"OK", nil];
    [alt show];
}

But I don't get proper output for the number-plate image: either it is null, or it gives some garbage data.

And if I use the first image, i.e. white background with black text, then the output is 89 to 95% accurate.

Please help me. Any suggestion will be appreciated.

UPDATE

Thanks to @jcesar for providing the link, and also to @konstantin pribluda for the valuable information and guidance.

I am able to convert the images into a proper black-and-white form (almost), and so the recognition is better for all images :)

I need help with proper binarization of the images. Any idea will be appreciated.

Answer

Hi all, thanks for your replies. From all of those replies I was able to reach the following conclusion:

  1. I need to get only one cropped image block with the number plate contained in it.
  2. From that plate, find the portion containing the number, using the data I got from the method provided here.
  3. Then convert the image data to almost black and white, using the RGB data found through the above method.
  4. Then convert the data back into an image using the method provided here.

The above 4 steps are combined into one method, as below:

-(void)getRGBAsFromImage:(UIImage*)image
{
    NSInteger count = (image.size.width * image.size.height);
    // First get the image into your data buffer
    CGImageRef imageRef = [image CGImage];
    NSUInteger width = CGImageGetWidth(imageRef);
    NSUInteger height = CGImageGetHeight(imageRef);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    unsigned char *rawData = (unsigned char*) calloc(height * width * 4, sizeof(unsigned char));
    NSUInteger bytesPerPixel = 4;
    NSUInteger bytesPerRow = bytesPerPixel * width;
    NSUInteger bitsPerComponent = 8;
    CGContextRef context = CGBitmapContextCreate(rawData, width, height,
                                                 bitsPerComponent, bytesPerRow, colorSpace,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);

    CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
    CGContextRelease(context);

    // Now your rawData contains the image data in the RGBA8888 pixel format.
    int byteIndex = 0;
    for (int ii = 0 ; ii < count ; ++ii)
    {
        CGFloat red   = (rawData[byteIndex]     * 1.0) ;
        CGFloat green = (rawData[byteIndex + 1] * 1.0) ;
        CGFloat blue  = (rawData[byteIndex + 2] * 1.0) ;
        CGFloat alpha = (rawData[byteIndex + 3] * 1.0) ;

        // NSLog here runs once per pixel and dominates the method's run time; keep it disabled outside of debugging
        // NSLog(@"red %f \t green %f \t blue %f \t alpha %f", red, green, blue, alpha);
        if (red > Required_Value_of_red || green > Required_Value_of_green || blue > Required_Value_of_blue) // all values are between 0 and 255
        {
            // above the threshold: set every channel to 255 for a white background
            red = green = blue = alpha = 255.0;
        }
        else
        {
            // below the threshold: force the pixel to black so the text stands out
            // (this branch is implied by the description but was missing from the original)
            red = green = blue = 0.0;
            alpha = 255.0;
        }
        rawData[byteIndex] = red;
        rawData[byteIndex + 1] = green;
        rawData[byteIndex + 2] = blue;
        rawData[byteIndex + 3] = alpha;

        byteIndex += 4;
    }

    colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef bitmapContext = CGBitmapContextCreate(
                                                       rawData,
                                                       width,
                                                       height,
                                                       8, // bitsPerComponent
                                                       4*width, // bytesPerRow
                                                       colorSpace,
                                                       kCGImageAlphaNoneSkipLast);

    CFRelease(colorSpace);

    CGImageRef cgImage = CGBitmapContextCreateImage(bitmapContext);
    CGContextRelease(bitmapContext); // release the context to avoid a leak

    UIImage *img = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage); // CGBitmapContextCreateImage follows the Create rule

    //use the img for further use of ocr

    free(rawData);
}
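For reference, the per-pixel thresholding step above can be sketched in portable C, assuming an RGBA8888 buffer; the three threshold parameters stand in for the Required_Value_of_* placeholders (their actual values depend on your images):

```c
#include <stdint.h>
#include <stddef.h>

/* Binarize an RGBA8888 buffer in place: any pixel with a channel above
 * its threshold becomes white, everything else becomes black.
 * The thresholds play the role of the Required_Value_of_* placeholders. */
static void binarize_rgba(uint8_t *buf, size_t pixel_count,
                          uint8_t thr_r, uint8_t thr_g, uint8_t thr_b)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        uint8_t *p = buf + i * 4;
        uint8_t v = (p[0] > thr_r || p[1] > thr_g || p[2] > thr_b) ? 255 : 0;
        p[0] = p[1] = p[2] = v;   /* white or black */
        p[3] = 255;               /* opaque alpha   */
    }
}
```

A single pass like this is O(width * height) and avoids per-pixel logging, which is what made the Objective-C version slow.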

NOTE:

The only drawback of this method is the time it consumes, and deciding which RGB values should be turned white and which black.
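One standard way around hand-picking those cut-off values is Otsu's method, which chooses a gray threshold automatically from the image histogram. This is not part of the original answer, just a sketch of the usual algorithm in C:

```c
#include <stdint.h>
#include <stddef.h>

/* Otsu's method: pick the gray threshold that maximizes the between-class
 * variance of a 256-bin histogram. Pixels <= the returned value are
 * "background", pixels above it are "foreground" (or vice versa). */
static int otsu_threshold(const uint8_t *gray, size_t n)
{
    double hist[256] = {0};
    for (size_t i = 0; i < n; ++i) hist[gray[i]] += 1.0;

    double total = (double)n, sum_all = 0.0;
    for (int t = 0; t < 256; ++t) sum_all += t * hist[t];

    double w_bg = 0.0, sum_bg = 0.0, best_var = -1.0;
    int best_t = 0;
    for (int t = 0; t < 256; ++t) {
        w_bg += hist[t];                 /* background pixel count */
        if (w_bg == 0.0) continue;
        double w_fg = total - w_bg;      /* foreground pixel count */
        if (w_fg == 0.0) break;
        sum_bg += t * hist[t];
        double mean_bg = sum_bg / w_bg;
        double mean_fg = (sum_all - sum_bg) / w_fg;
        double var = w_bg * w_fg * (mean_bg - mean_fg) * (mean_bg - mean_fg);
        if (var > best_var) { best_var = var; best_t = t; }
    }
    return best_t;
}
```

For a number plate with dark glyphs on a bright background, the histogram is roughly bimodal, which is exactly the case Otsu handles well.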

UPDATE:

    CGImageRef imageRef = [plate CGImage];
    CIContext *context = [CIContext contextWithOptions:nil]; // 1
    CIImage *ciImage = [CIImage imageWithCGImage:imageRef]; // 2
    CIFilter *filter = [CIFilter filterWithName:@"CIColorMonochrome" keysAndValues:
                        @"inputImage", ciImage,
                        @"inputColor", [CIColor colorWithRed:1.f green:1.f blue:1.f alpha:1.f],
                        @"inputIntensity", [NSNumber numberWithFloat:1.f], nil]; // 3
    CIImage *ciResult = [filter valueForKey:kCIOutputImageKey]; // 4
    CGImageRef cgImage = [context createCGImage:ciResult fromRect:[ciResult extent]];
    UIImage *img = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage); // createCGImage: follows the Create rule, so balance it with a release

Just replace the above method's (getRGBAsFromImage:) code with this one; the result is the same, but the time taken is only 0.1 to 0.3 seconds.
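For intuition, CIColorMonochrome with a white inputColor and intensity 1.0 amounts to reducing each pixel to a gray value. A portable C sketch of that reduction is below; the Rec. 709 luma weights are an assumption, since Apple does not document the filter's exact formula:

```c
#include <stdint.h>
#include <stddef.h>

/* Reduce an RGBA8888 buffer to grayscale in place, roughly what
 * CIColorMonochrome with a white inputColor and intensity 1.0 produces.
 * Rec. 709 luma weights (0.2126, 0.7152, 0.0722) are an assumption. */
static void to_grayscale(uint8_t *buf, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        uint8_t *p = buf + i * 4;
        uint8_t y = (uint8_t)(0.2126 * p[0] + 0.7152 * p[1] + 0.0722 * p[2] + 0.5);
        p[0] = p[1] = p[2] = y;   /* alpha channel p[3] is left untouched */
    }
}
```

The Core Image version is much faster on device mainly because it runs on the GPU rather than touching every pixel on the CPU.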
