How to Create a Traineddata File for Tesseract 4.1.0


Question

I want to recognise the characters on number plates. How do I train tesseract-ocr for number plates on Ubuntu 16.04? I am not familiar with training, so please help me create a 'traineddata' file for recognizing number plates.

I have 1000 images of number plates.

Please look into it. Any help would be appreciated.

So I tried the following command:

tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox

tesseract eng.arial.plate3655.png eng.arial.plate3655 batch.nochop makebox

But it gave an error:

Tesseract Open Source OCR Engine v4.1.0-rc1-56-g7fbd with Leptonica
Error, cannot read input file eng.arial.plate3655.png: No such file or directory
Error during processing.

After that I tried:

tesseract plate4.png eng.arial.plate4 batch.nochop makebox

It works, but only for some plates. Now, in Step 2, I am getting an error.

Screenshots are attached.

Plate 4 image used for training

Step 1 and Step 2 displayed in the terminal

Files generated after step 1 and step 2

Content of the files generated after step 1 and step 2

Recommended answer

Creating a .traineddata file for Tesseract 4

{*Note: After installing tesseract, open cmd and do the following.}

Step 1: Make box files for the images that we want to train on

Syntax:

tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox

Example:

tesseract own.arial.exp0.jpg own.arial.exp0 batch.nochop makebox
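With many plate images, every file has to follow the [langname].[fontname].[expN] pattern before this command can find it; Tesseract reports "cannot read input file" when the file named on the command line does not exist. The sketch below is one possible way to generate the Step 1 commands in bulk. The helper name makebox_commands, the language name own, the font name arial, and the image names are assumptions for illustration, and the helper only prints the commands so they can be reviewed first. It assumes file names without spaces.

```shell
# Print a copy + makebox command for each image, numbering them exp0, exp1, ...
# makebox_commands is a hypothetical helper, not part of Tesseract.
makebox_commands() {
    lang=$1
    font=$2
    shift 2
    n=0
    for img in "$@"; do
        ext=${img##*.}                  # file extension of the source image
        base="$lang.$font.exp$n"        # name required by the training tools
        printf 'cp %s %s.%s && tesseract %s.%s %s batch.nochop makebox\n' \
            "$img" "$base" "$ext" "$base" "$ext" "$base"
        n=$((n + 1))
    done
}

# Hypothetical image names taken from the question.
makebox_commands own arial plate3655.png plate4.png
```

Once the printed commands look right, they can be piped to sh to run them.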

{*Note: After making the box files, we have to correct any wrongly identified characters in them.}
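Each line of a box file holds one character followed by its left, bottom, right, and top pixel coordinates and a page number. Correcting a box file therefore means changing only the first field of the affected line. A minimal sketch, using made-up box contents in a temporary directory (the coordinates and the misread character are assumptions, not real makebox output):

```shell
# Work in a scratch directory so no real training data is touched.
workdir=$(mktemp -d)
box="$workdir/own.arial.exp0.box"

# Hypothetical makebox output in which the digit 0 was read as the letter O.
cat > "$box" <<'EOF'
K 10 20 40 80 0
A 45 20 75 80 0
O 80 20 110 80 0
1 115 20 145 80 0
EOF

# Replace the character on line 3 only; the coordinates stay untouched.
awk -v line=3 -v char=0 'NR == line { $1 = char } { print }' "$box" > "$box.fixed"
mv "$box.fixed" "$box"

cat "$box"
```

For more than a handful of corrections, a graphical box editor is usually more practical than editing by line number.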

Step 2: Create the .tr file (combining the image file and the box file)

Syntax:

tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] box.train

Example:

tesseract own.arial.exp0.jpg own.arial.exp0 box.train

Step 3: Extract the character set from the box files (the output of this command is the unicharset file)

Syntax:

unicharset_extractor [langname].[fontname].[expN].box

Example:

unicharset_extractor own.arial.exp0.box
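Before extracting the character set, it can help to check which characters the box files actually cover, since a letter or digit that never appears in them cannot be learned. The sketch below only reads the first column of the box files; it does not produce a unicharset file. The helper name box_chars and the sample box contents are assumptions for illustration.

```shell
# List the distinct characters found in one or more box files on a single line.
# box_chars is a hypothetical helper, not part of Tesseract.
box_chars() {
    cut -d ' ' -f 1 "$@" | LC_ALL=C sort -u | tr -d '\n'
    echo
}

workdir=$(mktemp -d)

# Hypothetical box file for a plate fragment reading KA01.
cat > "$workdir/own.arial.exp0.box" <<'EOF'
K 10 20 40 80 0
A 45 20 75 80 0
0 80 20 110 80 0
1 115 20 145 80 0
EOF

box_chars "$workdir/own.arial.exp0.box"
```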

Step 4: Create a font_properties file based on our needs.

Syntax:

echo "[fontname] [italic (0 or 1)] [bold (0 or 1)] [monospace (0 or 1)] [serif (0 or 1)] [fraktur (0 or 1)]" > font_properties

Example:

echo "arial 0 0 1 0 0" > font_properties
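The font name in font_properties is expected to match the [fontname] part of the training file names, so a mismatch is worth catching before Step 5. A minimal sketch of such a check; font_of and has_font_entry are hypothetical helpers written for this example, and the check assumes font names without regular-expression characters.

```shell
# Extract the [fontname] part of a training file name (own.arial.exp0.tr -> arial).
font_of() {
    name=${1##*/}        # drop any directory prefix
    rest=${name#*.}      # drop "[langname]."
    printf '%s\n' "${rest%%.*}"
}

# Succeed only if the font_properties file ($2) has an entry for the font of $1.
has_font_entry() {
    grep -q "^$(font_of "$1") " "$2"
}

workdir=$(mktemp -d)
echo "arial 0 0 1 0 0" > "$workdir/font_properties"

if has_font_entry own.arial.exp0.tr "$workdir/font_properties"; then
    echo "font_properties covers own.arial.exp0.tr"
else
    echo "no font_properties entry for own.arial.exp0.tr"
fi
```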

Step 5: Train the data.

Syntax:

mftraining -F font_properties -U unicharset -O [langname].unicharset [langname].[fontname].[expN].tr

Example:

mftraining -F font_properties -U unicharset -O own.unicharset own.arial.exp0.tr

Step 6: Run character normalization training (cntraining).

Syntax:

cntraining [langname].[fontname].[expN].tr

Example:

cntraining own.arial.exp0.tr

{*Note: After step 5 and step 6, four files have been created (shapetable, inttemp, pffmtable, normproto).}

Step 7: Rename the four files (shapetable, inttemp, pffmtable, normproto) to ([langname].shapetable, [langname].inttemp, [langname].pffmtable, [langname].normproto)

Syntax:

rename filename1 filename2

Example:

    rename shapetable own.shapetable
    rename inttemp own.inttemp
    rename pffmtable own.pffmtable
    rename normproto own.normproto
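The rename filename1 filename2 form is Windows cmd syntax. On Ubuntu, as in the question, mv does the same job (the rename utility there typically expects a Perl expression instead). A minimal sketch of the same step with mv, run against empty stand-in files in a temporary directory rather than real training output:

```shell
workdir=$(mktemp -d)
lang=own

# Empty stand-ins for the four files produced by steps 5 and 6.
for f in shapetable inttemp pffmtable normproto; do
    : > "$workdir/$f"
done

# Prefix each file with the language name, as Step 8 expects.
for f in shapetable inttemp pffmtable normproto; do
    mv "$workdir/$f" "$workdir/$lang.$f"
done

ls "$workdir"
```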

Step 8: Create the .traineddata file

Syntax:

combine_tessdata [langname].

Example:

combine_tessdata own.

{*Note: I use only one image (exp0) to create the traineddata. If you want to train on more than one image, you can do so by adding exp1, exp2, ..., expN.}
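When training on several images, steps 3, 5, and 6 accept all the box or .tr files on one command line. The sketch below shows how that might look with shell globs. train_all is a hypothetical helper, and by default it only prints the commands (a dry run); setting RUN to an empty value is assumed to execute them for real. The demo uses empty stand-in files in a temporary directory.

```shell
# Run (or, by default, just print) steps 3, 5 and 6 for every file of one language.
# train_all is a hypothetical helper; call it as `RUN= train_all own` to execute.
train_all() {
    lang=$1
    run=${RUN-echo}      # "echo" unless RUN is set, even to an empty value
    $run unicharset_extractor "$lang".*.box
    $run mftraining -F font_properties -U unicharset -O "$lang.unicharset" "$lang".*.tr
    $run cntraining "$lang".*.tr
}

workdir=$(mktemp -d)

# Empty stand-ins for the box and .tr files of two training images.
for n in 0 1; do
    : > "$workdir/own.arial.exp$n.box"
    : > "$workdir/own.arial.exp$n.tr"
done

# Dry run inside the scratch directory: prints the three commands.
(cd "$workdir" && train_all own)
```

The file produced this way contains only legacy-engine data, so with Tesseract 4 it would typically be loaded with -l own --oem 0.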
