Tesseract OCR won't recognize division symbol "÷"

Problem description

I am using Tesseract in iOS 8 for an OCR-based app, but it incorrectly converts the division symbol "÷" in the image to a plus sign "+".

For example, this image always converts to the text string "8+4+4". It should be "8+4÷4".

I've tried using different trained data language files ("eng+equ", "ita"), adding "÷" to the whitelist, setting the ocr_engine variable to cube, converting the image to grayscale or black and white, and upscaling the image by factors of 2 and 4.

Everything I've tried always returns a plus sign "+" instead of the division symbol "÷".

I tried using only the "equ" trained data file, and that DOES return the division symbol correctly, but all other characters are then garbage.

I've been looking into this (Google, Stack Overflow) for several days and cannot figure it out.

How do I get Tesseract to include and recognize the division symbol "÷"?

Update:

The best I have been able to do is to set the AVCaptureSession preset to high:

AVCaptureSession *session = [[AVCaptureSession alloc] init];
session.sessionPreset = AVCaptureSessionPresetHigh;

The dimensions of the captured image above are then 676 × 405 pixels. I then use the Tesseract OCR UIImage category to binarize the image (the image is named 'source'):

// Binarize the source image to improve contrast (using the UIImage category provided by TesseractOCR)
UIImage *blackAndWhiteImage = [source blackAndWhite];
[self.tesseract setImage:blackAndWhiteImage];

This will usually convert the division symbol to the text "-1-", but I've also seen "-:-" and other digits and uppercase characters between the minus signs.

I can check for that pattern in the returned text. But then it is impossible to know whether the returned text "8-1-2" is a true subtraction or "maybe" a division.
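The check described above could be sketched roughly as follows. This is only an illustration in plain C (which also compiles inside an Objective-C file), not anything provided by Tesseract: the function name find_division_candidates and the rule "a minus sign, one other character, a minus sign" are assumptions based on the "-1-" and "-:-" outputs mentioned in the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Tesseract): scans OCR output for a
 * "-X-" triple, where X is any single byte other than '-' or a space.
 * The start offset of each match is stored in `offsets` (at most `max`
 * entries are written). Returns the total number of matches found,
 * which may be larger than `max`. */
size_t find_division_candidates(const char *text, size_t *offsets, size_t max)
{
    size_t count = 0;
    size_t len = strlen(text);
    size_t i = 0;

    while (i + 2 < len) {
        if (text[i] == '-' && text[i + 2] == '-' &&
            text[i + 1] != '-' && text[i + 1] != ' ') {
            if (count < max) {
                offsets[count] = i;   /* remember where the triple starts */
            }
            count++;
            i += 3;                   /* skip past the whole triple */
        } else {
            i++;
        }
    }
    return count;
}
```

Called on "8+4-1-4", this reports one candidate starting at offset 3. It also flags a genuine "8-1-2", so it can only mark a span as a possible division; it cannot resolve the ambiguity described above.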

Recommended answer

Train the OCR engine with different fonts.

Here is the tool for training the engine. Have a look at this as well.

Or you can use jTessBoxEditor.
